Introduce node rebroadcast module (p2p)

https://github.com/bitcoin/bitcoin/pull/21061

Host: glozow  -  PR author: amitiuttarwar

The PR branch HEAD was 038f751 at the time of this review club meeting.

Notes

  • Hiding links between wallet addresses and IP addresses is a key part of Bitcoin privacy. Many techniques exist to help users obfuscate their IP address when submitting their own transactions, and various P2P changes have been proposed with the goal of hiding transaction origins.

  • Beyond initial broadcast, rebroadcast behavior can also leak information. If a node rebroadcasts its own wallet transactions differently from transactions received from its peers, adversaries can use this information to infer transaction origins even if the initial broadcast revealed nothing. We have discussed rebroadcast in previous review clubs, #16698 and #18038.

  • The rebroadcast project’s goal is to improve privacy by making node rebroadcast behavior for wallet transactions indistinguishable from that of other peers’ transactions.

  • #21061 adds a TxRebroadcast module responsible for selecting transactions to be rebroadcast and keeping track of how many times each transaction has been rebroadcast. After each block, the module uses the miner and other heuristics to select transactions from the mempool that it believes “should” have been included in the block and reannounces them (disabled by default for now).

  • Rebroadcasts happen once per new block. The set of transactions to be rebroadcast is calculated as follows:

    • The node regularly estimates the minimum feerate for transactions to be included in the next block, m_cached_fee_rate.

    • When a new block arrives, the transactions included in the block are removed from the mempool. The node then uses BlockAssembler to calculate which transactions (with a total weight up to 3/4 of the block maximum) from the mempool are more than 30 minutes old and have a minimum feerate of m_cached_fee_rate. This results in a set of transactions that our node would have included in the last block.

    • The rebroadcast attempt tracker, m_attempt_tracker, tracks how many times and how recently we’ve attempted to rebroadcast a transaction so that we don’t spam the network with re-announcements.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. In what scenarios might a user want to rebroadcast their transaction? Why shouldn’t each wallet just be solely responsible for rebroadcasting its own transactions?

  3. How does the rebroadcast module decide which transactions to rebroadcast (TxRebroadcastHandler::GetRebroadcastTransactions())?

  4. In what scenarios would a miner include different transactions from our BlockAssembler? More specifically, when might the miner exclude a transaction, and when might it include a transaction yours doesn’t?

  5. Why might we want to keep a transaction in our rebroadcast attempt tracker even after removing it from our mempool? (Hint: what happens if we expire a transaction from our mempool and then our peer rebroadcasts it to us? When might this happen?)

  6. When should we remove transactions from our rebroadcast attempt tracker? How does the code ensure that the tracker doesn’t grow unbounded?

  7. How is the estimated minimum feerate for inclusion in a block, m_cached_fee_rate, calculated? Why not just calculate the feerate of the lowest-feerate transaction in the most recently mined block?

Meeting Log

  119:00 <glozow> #startmeeting
  219:00 <jnewbery> hi!
  319:00 <_0x0ff> hi
  419:00 <b10c> hi
  519:00 <schmidty> hi!
  619:00 <ccdle12> hi
  719:00 <emzy> hi
  819:00 <glozow> Hey everybody! Welcome to PR Review Club :D
  919:00 <stickrobot> first time here
 1019:00 <svav> hi
 1119:00 <amiti> hi
 1219:00 <glozow> Today we're looking at #21061, introduce rebroadcast module
 1319:00 <ecola> hi
 1419:00 <lightlike> hi
 1519:00 <glozow> Notes and questions: https://bitcoincore.reviews/21061
 1619:00 <glozow> Welcome stickrobot!
 1719:00 <glozow> any other first timers?
 1819:00 <jnewbery> hi stickrobot. Welcome!
 1919:00 <_0x0ff> welcome stickrobot
 2019:01 <stickrobot> thanks all
 2119:01 <glozow> Have y'all had a chance to review the PR? y/n
 2219:01 <b10c> y
 2319:01 <ccdle12> n :(
 2419:01 <_0x0ff> y
 2519:01 <emzy> n
 2619:01 <stickrobot> n
 2719:01 <ivanacostarubio> n :(
 2819:01 <svav> y
 2919:02 <jnewbery> y
 3019:02 <lightlike> y
 3119:02 <glozow> Would someone like to summarize what the PR is doing for those who haven't had a chance to review it? :)
 3219:03 <sishir> All nodes (instead of wallet) will rebroadcast tx that should have bee confirmed by now
 3319:03 <svav> It is increasing security by changing rebroadcast message contents, so IP address cannot be linked with wallet address
 3419:03 <sishir> *been
 3519:03 <glozow> sishir: svav: yes! updating rebroadcast for the sake of improving privacy
 3619:03 <glozow> okay let's start with some conceptual questions. In what scenarios might a user want to rebroadcast their transaction? Why
 3719:03 <glozow> shouldn't each wallet just be solely responsible for rebroadcasting its own
 3819:03 <glozow> transactions?
 3919:04 <b10c> When his tx didn't propagate properly or he assumes that the network has forgotten about it
 4019:04 <sishir> nah cause there is privacy leak when they try to rebroadcast
 4119:04 <svav> Will rebroadcast if a node thinks transaction should have been processed but wasn't
 4219:05 <cls> rebroadcast opens one up to privacy leakage
 4319:05 <b10c> and the user's wallet might not always be online to rebroadcast
 4419:05 <glozow> b10c: correct, sometimes tx propagation just doesn't work properly. why would the network forget about a tx that was once in their mempools?
 4519:05 <sishir> Q. Why do nodes only ever rebroadcast acast their own tx tho?
 4619:06 <sishir> *rebroadcast
 4719:06 <glozow> sishir: svav: cls: b10c: yes! what can a spy node do to deanon transactions?
 4819:06 <glozow> sishir: that's just what the legacy behavior is
 4919:06 <sishir> spy node can infer that the node is the source wallet and execute dust attack
 5019:06 <svav> A spy node can compare rebroadcasts from all nodes and identify differences, thus associate an IP address with a wallet address
 5119:07 <_0x0ff> sishir: new implementation rebroadcasts any transaction not just their own
 5219:07 <glozow> sorry just to clarify, i mean what can a spy node do right now, assuming nodes don't have the changes from this PR
 5319:07 <b10c> transactions expire after 14 days, can get size-limited if more higher fee trasnactions are there - or can be removed in a block which is later reorged (the reorged chain does not contain the transaction)
 5419:07 <_0x0ff> associate an address with an ip
 5519:07 <cls> I believe an adversary will be able to link transaction to a users wallet/publick address
 5619:07 <glozow> b10c: yes exactly
 5719:08 <sishir> glozow _0x0ff I see Thank you
 5819:08 <glozow> yep, with current rebroadcast behavior, any node that announces a tx more than once -> the tx is from their wallet
 5919:09 <amiti> sishir: you mentioned dust attack, can you describe the attack?
 6019:09 <jnewbery> sishir: A dust attack is something different. It doesn't require knowing the target's network address
 6119:09 <glozow> ok cool! let's dive into the PR. How does the rebroadcast module decide which transactions to rebroadcast
 6219:09 <glozow> (`TxRebroadcastHandler::GetRebroadcastTransactions()`)?
 6319:10 <_0x0ff> It rebrodcasts tx that are: older than 30min, txfee > m_cached_fee_rate (calcualted via BlockAssembler), hasnt been rebroadcasted >= MAX_REBROADCAST_COUNT (6) and wasn't rebroadcasted in the last MIN_REATTEMPT_INTERVAL (4h).
 6419:10 <glozow> Code here: https://github.com/bitcoin/bitcoin/pull/21061/files#diff-7dff50848db96bdb8edffc4d21daeca6d9050ec0e67d96072780ea5751e7df06R33
 6519:10 <svav> A dusting attack is an attack in which a trace amount of cryptocurrency, called dust, is sent to a large number of wallet addresses with the purpose of "un-masking" or de-anonymizing the addresses. Dusting attacks are tactics utilized by both criminals and law enforcement agencies.
 6619:10 <glozow> _0x0ff: yes! very prepared :D
 6719:10 <sishir> Yessir! Dust attack is when attacker sends some btc (dusts) to various addresses and observes the wallet rebroadcasting behavior
 6819:11 <glozow> is it possible for `GetRebroadcastTransactions` to return 0 transactions?
 6919:11 <_0x0ff> glozow: hehe, i try ;P
 7019:11 <glozow> also, is it possible for `GetRebroadcastTransactions` to return more than a block's worth of transactions?
 7119:11 <_0x0ff> it is i possible to return 0 txs from what I gather, eg when mempool is empty
 7219:12 <glozow> not just when the mempool is empty! :)
 7319:12 <_0x0ff> but it's not possible to return then that a more txs that would fit the block, i think it only returns 3/4th of txs that fit the block
 7419:12 <b10c> more than a block's worth is not possible
 7519:12 <glozow> _0x0ff: b10c: correct, it would never return more than 3/4 of the maximum block weight
 7619:13 <_0x0ff> what is the other case that would return 0 txs?
 7719:13 <glozow> it could return fewer though, if the mempool just doesn't have many transactions that fit the criteria _0x0ff mentioned
 7819:13 <glozow> the filters are applied within the assembler
 7919:14 <glozow> does that make sense?
 8019:14 <_0x0ff> yup
 8119:14 <glozow> coolio
 8219:14 <cls> yes, nice explaination
 8319:14 <glozow> Moving on: In what scenarios would a miner include different transactions from our
 8419:14 <glozow> `BlockAssembler`? More specifically, when might the miner exclude a
 8519:14 <glozow> transaction, and when might it include a transaction yours doesn't?
 8619:14 <glozow> I can think of 3 in each category :)
 8719:15 <b10c> 1. a transaction didn't propagte to us or the miner yet
 8819:15 <_0x0ff> If the scenario when miner prioritizes different transactions from ours.
 8919:15 <marqusat> When a given tx does not reach a miner before they start mining the block or when they censor some transactions.
 9019:15 <b10c> 2. the miner manually prioritized the transaction
 9119:15 <b10c> 3. we and the miner have a conflicting transaction in our mempools
 9219:15 <_0x0ff> miner could also censor a tx
 9319:15 <b10c> 4. one party has a RBF replacement transaction which the other party doesn't have yet (similar to 1. and 3.)
 9419:15 <b10c> 5. the miner mines an emtpy block
 9519:15 <b10c> 6. censorship
 9619:15 <glozow> yaaaas all good answers
 9719:16 <cls> minor does not prioritize due to low transaction fee
 9819:16 <_0x0ff> good answers b10c :)
 9919:17 <amiti> here's a scenario where the filters wouldn't return any txns to rebroadcast: at time 0, the fee rate cache runs, identifies min fee rate. at time 1 a block comes in and picks up all our mempool txns above this fee rate. when we go to connect the tip, we don't have any remaining txns above the calculated min fee rate.
10019:17 <b10c> amiti: good point
10119:17 <_0x0ff> ha, good one
10219:18 <glozow> amiti: right. and all the high-fee transactions that might have arrived in the meantime would not meet the 30 minute recency filter
10319:18 <amiti> glozow: yeah, good point :)
10419:19 <glozow> Ok! so what does the rebroadcast attempt tracker do?
10519:20 <sishir> keeps track of the # of rebroadcast attempt
10619:20 <_0x0ff> it tracks how many times we've rebroadcasted a tx and what was the last time we rebrodcasted it
10719:20 <svav> tracks how many times and how recently we’ve attempted to rebroadcast a transaction so that we don’t spam the network with re-announcements.
10819:20 <glozow> awesome, yes, svav: nice wording
10919:21 <glozow> And Why might we want to keep a transaction in our rebroadcast attempt tracker even after removing it from our mempool?
11019:21 <_0x0ff> and it prevents that network doesnt ddos itself with rebroadcasts
11119:21 <glozow> Hint: what happens if we expire a transaction from our mempool and then our peer rebroadcasts it to us? When might this happen?
11219:22 <_0x0ff> no clue about this one
11319:22 <glozow> This part was really confusing for me - feel free to guess and ask questions
11419:22 <sishir> I thought we remove them
11519:23 <_0x0ff> well, i dont see a reason why deal with removing the tx given it will get expired or removed (when 500 limit is reached)
11619:23 <glozow> mempool expiry is 2 weeks, while the attempt tracker expiry is ~3 months. why aren't they the same? -> there must be some reasons why we'd keep a tx in the attempt tracker after the mempool has forgotten about them
11719:23 <cls> Each node may have a different state in a decentralized network
11819:23 <_0x0ff> oh, if fees get high, and some txs get removed from mempool
11919:24 <glozow> _0x0ff: after you remove from mempool, what happens if i rebroadcast to you?
12019:24 <_0x0ff> hm but no, the BlockAssembler only gets txs from mempool
12119:24 <_0x0ff> it will be added back to mempool
12219:24 <glozow> (i still have the tx for whatever reason)
12319:24 <glozow> sure, it gets added back to mempool
12419:24 <svav> A peer might have a lower minimum fee rate that us, so they won't exclude it like us
12519:24 <glozow> is it possible this transaction will _never_ get mined?
12619:24 <glozow> can it be consensus-invalid? or policy-invalid?
12719:25 <glozow> beyond just fees
12819:25 <larryruane_> no because in that case it wouldn't have entered the mempool in the first place
12919:25 <glozow> larryruane_: is it possible that the rest of the network has policy rules that we don't know about?
13019:26 <glozow> let's say we're version 22 nodes with rebroadcast implemented
13119:26 <b10c> not consensus invalid - but policy-invalid can happen if we don't support e.g. a softfork
13219:26 <glozow> version 24 nodes have a new policy for version 2 witnesses, for example
13319:26 <b10c> wait can it be consensus invalid?
13419:26 <glozow> b10c: are there nodes right now that don't know about some consensus rules? :)
13519:27 <b10c> sure, I there are nodes that don't know about e.g SegWit
13619:27 <glozow> b10c: exactly
13719:27 <b10c> I think e.g. forkmonitor runs a 0.10.x Bitcoin Core node
13819:28 <glozow> heh. so, what happens if there are nodes rebroadcasting transactions that don't meet new consensus rules?
13919:30 <b10c> hm
14019:30 <b10c> not sure
14119:30 <_0x0ff> same, no idea
14219:30 <jnewbery> let's try to figure it out!
14319:30 <glozow> b10c: what will updated nodes do? (the ones that know about the new consensus rules)
14419:30 <glozow> and _0x0ff
14519:30 <b10c> reject the transaction
14619:31 <glozow> b10c: correct
14719:31 <glozow> and what will old nodes do?
14819:31 <_0x0ff> accept it, and keep on rebroadcasting the tx
14919:31 <glozow> _0x0ff: correct!
15019:31 <glozow> so what happens if 2 of these old nodes are connected to each other?
15119:32 <_0x0ff> they will keep sending the tx between each other so it will never expire (until expiry conditions are met)
15219:32 <glozow> _0x0ff: correct
15319:32 <b10c> need to be more than 2 nodes with the current filter, right?
15419:32 <glozow> let's say the tx is removed from the rebroadcast attempt tracker as soon as it expires from mempool
15519:32 <b10c> 4h * 4 attempts < 14 days
15619:33 <glozow> will these 2 nodes ever forget about the tx?
15719:33 <sipa_> the last time a consensus rule change was introduced that changed something that wasn't already very widespread nonstandard was BIP113 i believe
15819:33 <sipa_> (just to give some context)
15919:33 <glozow> sipa_: right. i think this applies to new policy changes as well, though
16019:33 <sipa_> glozow: it does
16119:34 <sipa_> though i'm not sure when the last time was that policy was restricted
16219:34 <b10c> fwiw: BIP 113: Median time-past as endpoint for lock-time calculations https://github.com/bitcoin/bips/blob/master/bip-0113.mediawiki
16319:36 <glozow> soooo, should we remove a tx from rebroadcast attempt tracker as soon as we remove it from mempool?
16419:37 <b10c> based on your question I don't think the 2 nodes would ever forget about the tx, but tbh my train of though got a bit lost
16519:37 <sishir> wait so the nodes will not forget about the tx?
16619:37 <sishir> im a lil confused
16719:37 <b10c> glozow: no
16819:37 <svav> glozow: no
16919:37 <_0x0ff> no, we shouldnt remove it - so the tx woudl get expired faster
17019:37 <b10c> never* ^
17119:38 <glozow> b10c: svav: _0x0ff: correcto
17219:38 <sishir> isn't the tx already confirmed, mined and in the blockchain? So, we still want to keep a copy of it in rebroadcast attempt?
17319:39 <jnewbery> sishir: imagine both node A and node B have not upgraded and both consider the tx valid. If there's no way to prevent rebroadcasting, they'll just continue to rebroadcast the tx to each other.
17419:39 <lightlike> though if you send >500 of these txes, they would still never forget even with the tracker, so I'm thinking the tracker only helps when this unintentional, not in an attack case or does it?
17519:39 <jnewbery> sishir: this is only for unconfirmed transactions.
17619:39 <_0x0ff> besides the the limit (how old tx we keep) in m_attempt_tracker there's also a size limit of 500
17719:39 <sishir> Ahhh i see
17819:40 <svav> You need to keep it in your rebroadcast attempt tracker so you don't rebroadcast it too many times
17919:40 <glozow> lightlike: yeah, but i think you could get old nodes to keep talking about newly invalid transactions regardless
18019:41 <glozow> svav: right, so let's see how the 3month expiry helps. let's say you expire a tx from mempool after 2 weeks, and you keep it in your rebroadcast attempt tracker. what happens if you see the tx again?
18119:42 <glozow> (let's assume you don't exceed the 500 limit in this situation)
18219:43 <svav> You would only rebroadcast it a maximum of 6 times within the 3 months
18319:44 <_0x0ff> it will folow the same rules as it did before it got removed, which means we might not rebroadcast it immidiatelly after receiving it (if conditions for rebrodcasting arent met)
18419:44 <glozow> svav: yup! so assuming the 2 old nodes keep it in their rebroadcast attempt trackers for 3 months, will they eventually forget about the tx?
18519:45 <_0x0ff> yes :)
18619:45 <glozow> _0x0ff: :)
18719:45 <svav> yes when they have each rebroadcasted it 6 times
18819:45 <glozow> svav: correct, after 6 times each
18919:46 <glozow> so this helps, as long as they don't reach the 500 maximum
19019:46 <glozow> amiti: have you considered increasing the limit? or keeping a separate tracker for expired-from-mempool transactions?
19119:47 <amiti> glozow: yeah the limit is slightly arbitrary right now, just to pick a starting point. I think the most relevant will be observing the mechanism out in the wild & seeing if this limit is useful
19219:47 <glozow> hopefully we feel comfortable moving to the next question? In general, when should we remove transactions from our rebroadcast attempt tracker?
19319:47 <_0x0ff> i also saw a comment about persisting m_attempt_tracker to disk - do we think that's worthy to have?
19419:47 <glozow> amiti: makes sense to me
19519:47 <amiti> also, if the network is working as expected, we shouldn't be rebroadcasting txns heavily
19619:48 <b10c> amiti: especially if many nodes rebroadcast with this patch
19719:48 <amiti> =P
19819:50 <jnewbery> another potential change could be to move the txids into a rolling filter if they reach MAX_REBROADCAST_COUNT, since at that point we only need a test for inclusion
19919:50 <sishir> I  conflicting tx in the block & tx that gets taken out of mempool cause of RBF
20019:50 <glozow> sishir: yeah, those are good ones
20119:50 <cls> might be worth looking a dynamic algorithms such as TCP retransmission which slowly degrades over time
20219:51 <glozow> jnewbery: ooooooh
20319:51 <glozow> or a cuckoo cache 🐦
20419:52 <amiti> jnewbery: are you suggesting replace attempt tracker with rolling bloom filter? or having an additional?
20519:52 <glozow> anyone else have ideas for when we should remove from rebroadcast attempt tracker?
20619:52 <amiti> cls: yeah I considered that sort of design, but it feels overkill for the use case
20719:52 <jnewbery> potentially having an additional, but I'm just throwing an idea out. It might not be good!
20819:52 <svav> when it's confirmed?
20919:53 <amiti> jnewbery: gotcha :)
21019:53 <glozow> svav: yep! that's a big one
21119:53 <svav> when it expires?
21219:53 <cls> amiti: totally makes sense
21319:53 <glozow> are there any other cases, beyond seeing a conflict in a block, where the tx is guaranteed to be invalid?
21419:54 <glozow> svav: expires from where?
21519:54 <sipa_> glozow: did you know there is an awesome way of combing cuckoo tables with (tolling) bloom filters? :)
21619:54 <sipa_> *fombining
21719:54 <glozow> glozow: whaaaa?!
21819:54 <glozow> sipa_*
21919:54 <sipa_> **combining
22019:54 <glozow> HAH i tagged myself 😂
22119:54 <sipa_> look up cuckoo filter
22219:55 <sishir> gotta head out but thank you glozow. learned a lot
22319:56 <sipa_> i did some work on creating an efficient rolling cuckoo filter, but put it aside with a bit higher priority things
22419:56 <glozow> sishir: thanks for coming!
22519:56 <glozow> sipa_ greatest crossover event
22619:57 <glozow> ok I think we have time to do part of the last question: How is the estimated minimum feerate for inclusion in a block, `m_cached_fee_rate`, calculated?
22719:57 <_0x0ff> It uses `BlockAssembler::minTxFeeRate()` which calculates a min fee that would still be included in the next mined block. This approach is better because it calculates fees based on the future mintxfee and not the past.
22819:57 <svav> Is it something to do with MAX_ENTRY_AGE???
22919:58 <glozow> _0x0ff: correct, assemble a block and get the min fee
23019:58 <glozow> when do we do this?
23119:59 <glozow> svav: er, i don't think?
23219:59 <svav> glozow: Sorry, this was in relation to the previous question, but we can move on
23319:59 <glozow> svav: okie gotcha
23420:00 <glozow> soooo we calculate `m_cached_fee_rate` every 1 minute
23520:00 <glozow> that's all we have time for heh
23620:00 <glozow> #endmeeting