Introduce node rebroadcast module (p2p)

Apr 7, 2021

https://github.com/bitcoin/bitcoin/pull/21061

Host: glozow - PR author: amitiuttarwar

The PR branch HEAD was 038f751 at the time of this review club meeting.

Notes

Hiding links between wallet addresses and IP addresses is a key part of Bitcoin privacy. Many techniques exist to help users obfuscate their IP address when submitting their own transactions, and various P2P changes have been proposed with the goal of hiding transaction origins.
Beyond initial broadcast, rebroadcast behavior can also leak information. If a node rebroadcasts its own wallet transactions differently from transactions received from its peers, adversaries can use this information to infer transaction origins even if the initial broadcast revealed nothing. We have discussed rebroadcast in previous review clubs, #16698 and #18038.
The rebroadcast project’s goal is to improve privacy by making node rebroadcast behavior for wallet transactions indistinguishable from that of other peers’ transactions.
#21061 adds a TxRebroadcast module responsible for selecting transactions to be rebroadcast and keeping track of how many times each transaction has been rebroadcast. After each block, the module uses the miner and other heuristics to select transactions from the mempool that it believes “should” have been included in the block and reannounces them (disabled by default for now).
Rebroadcasts happen once per new block. The set of transactions to be rebroadcast is calculated as follows:
- The node regularly estimates the minimum feerate for transactions to be included in the next block, m_cached_fee_rate.
- When a new block arrives, the transactions included in the block are removed from the mempool. The node then uses BlockAssembler to calculate which transactions (with a total weight up to 3/4 of the block maximum) from the mempool are more than 30 minutes old and have a minimum feerate of m_cached_fee_rate. This results in a set of transactions that our node would have included in the last block.
- The rebroadcast attempt tracker, m_attempt_tracker, tracks how many times and how recently we’ve attempted to rebroadcast a transaction so that we don’t spam the network with re-announcements.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
In what scenarios might a user want to rebroadcast their transaction? Why shouldn’t each wallet just be solely responsible for rebroadcasting its own transactions?
How does the rebroadcast module decide which transactions to rebroadcast (TxRebroadcastHandler::GetRebroadcastTransactions())?
In what scenarios would a miner include different transactions from our BlockAssembler? More specifically, when might the miner exclude a transaction, and when might it include a transaction yours doesn’t?
Why might we want to keep a transaction in our rebroadcast attempt tracker even after removing it from our mempool? (Hint: what happens if we expire a transaction from our mempool and then our peer rebroadcasts it to us? When might this happen?)
When should we remove transactions from our rebroadcast attempt tracker? How does the code ensure that the tracker doesn’t grow unbounded?
How is the estimated minimum feerate for inclusion in a block, m_cached_fee_rate, calculated? Why not just calculate the feerate of the lowest-feerate transaction in the most recently mined block?

Meeting Log

19:00

<glozow> #startmeeting

19:00

<jnewbery> hi!

19:00

<_0x0ff> hi

19:00

<b10c> hi

19:00

<schmidty> hi!

19:00

<ccdle12> hi

19:00

<emzy> hi

19:00

<glozow> Hey everybody! Welcome to PR Review Club :D

19:00

<stickrobot> first time here

19:00

<svav> hi

19:00

<amiti> hi

19:00

<glozow> Today we're looking at #21061, introduce rebroadcast module

19:00

<ecola> hi

19:00

<lightlike> hi

19:00

<glozow> Notes and questions: https://bitcoincore.reviews/21061

19:00

<glozow> Welcome stickrobot!

19:00

<glozow> any other first timers?

19:00

<jnewbery> hi stickrobot. Welcome!

19:00

<_0x0ff> welcome stickrobot

19:01

<stickrobot> thanks all

19:01

<glozow> Have y'all had a chance to review the PR? y/n

19:01

<b10c> y

19:01

<ccdle12> n :(

19:01

<_0x0ff> y

19:01

<emzy> n

19:01

<stickrobot> n

19:01

<ivanacostarubio> n :(

19:01

<svav> y

19:02

<jnewbery> y

19:02

<lightlike> y

19:02

<glozow> Would someone like to summarize what the PR is doing for those who haven't had a chance to review it? :)

19:03

<sishir> All nodes (instead of wallet) will rebroadcast tx that should have bee confirmed by now

19:03

<svav> It is increasing security by changing rebroadcast message contents, so IP address cannot be linked with wallet address

19:03

<sishir> *been

19:03

<glozow> sishir: svav: yes! updating rebroadcast for the sake of improving privacy

19:03

<glozow> okay let's start with some conceptual questions. In what scenarios might a user want to rebroadcast their transaction? Why

19:03

<glozow> shouldn't each wallet just be solely responsible for rebroadcasting its own

19:03

<glozow> transactions?

19:04

<b10c> When his tx didn't propagate properly or he assumes that the network has forgotten about it

19:04

<sishir> nah cause there is privacy leak when they try to rebroadcast

19:04

<svav> Will rebroadcast if a node thinks transaction should have been processed but wasn't

19:05

<cls> rebroadcast opens one up to privacy leakage

19:05

<b10c> and the user's wallet might not always be online to rebroadcast

19:05

<glozow> b10c: correct, sometimes tx propagation just doesn't work properly. why would the network forget about a tx that was once in their mempools?

19:05

<sishir> Q. Why do nodes only ever rebroadcast acast their own tx tho?

19:06

<sishir> *rebroadcast

19:06

<glozow> sishir: svav: cls: b10c: yes! what can a spy node do to deanon transactions?

19:06

<glozow> sishir: that's just what the legacy behavior is

19:06

<sishir> spy node can infer that the node is the source wallet and execute dust attack

19:06

<svav> A spy node can compare rebroadcasts from all nodes and identify differences, thus associate an IP address with a wallet address

19:07

<_0x0ff> sishir: new implementation rebroadcasts any transaction not just their own

19:07

<glozow> sorry just to clarify, i mean what can a spy node do right now, assuming nodes don't have the changes from this PR

19:07

<b10c> transactions expire after 14 days, can get size-limited if more higher fee trasnactions are there - or can be removed in a block which is later reorged (the reorged chain does not contain the transaction)

19:07

<_0x0ff> associate an address with an ip

19:07

<cls> I believe an adversary will be able to link transaction to a users wallet/publick address

19:07

<glozow> b10c: yes exactly

19:08

<sishir> glozow _0x0ff I see Thank you

19:08

<glozow> yep, with current rebroadcast behavior, any node that announces a tx more than once -> the tx is from their wallet

19:09

<amiti> sishir: you mentioned dust attack, can you describe the attack?

19:09

<jnewbery> sishir: A dust attack is something different. It doesn't require knowing the target's network address

19:09

<glozow> ok cool! let's dive into the PR. How does the rebroadcast module decide which transactions to rebroadcast

19:09

<glozow> (`TxRebroadcastHandler::GetRebroadcastTransactions()`)?

19:10

<_0x0ff> It rebrodcasts tx that are: older than 30min, txfee > m_cached_fee_rate (calcualted via BlockAssembler), hasnt been rebroadcasted >= MAX_REBROADCAST_COUNT (6) and wasn't rebroadcasted in the last MIN_REATTEMPT_INTERVAL (4h).

19:10

<glozow> Code here: https://github.com/bitcoin/bitcoin/pull/21061/files#diff-7dff50848db96bdb8edffc4d21daeca6d9050ec0e67d96072780ea5751e7df06R33

19:10

<svav> A dusting attack is an attack in which a trace amount of cryptocurrency, called dust, is sent to a large number of wallet addresses with the purpose of "un-masking" or de-anonymizing the addresses. Dusting attacks are tactics utilized by both criminals and law enforcement agencies.

19:10

<glozow> _0x0ff: yes! very prepared :D

19:10

<sishir> Yessir! Dust attack is when attacker sends some btc (dusts) to various addresses and observes the wallet rebroadcasting behavior

19:11

<glozow> is it possible for `GetRebroadcastTransactions` to return 0 transactions?

19:11

<_0x0ff> glozow: hehe, i try ;P

19:11

<glozow> also, is it possible for `GetRebroadcastTransactions` to return more than a block's worth of transactions?

19:11

<_0x0ff> it is i possible to return 0 txs from what I gather, eg when mempool is empty

19:12

<glozow> not just when the mempool is empty! :)

19:12

<_0x0ff> but it's not possible to return then that a more txs that would fit the block, i think it only returns 3/4th of txs that fit the block

19:12

<b10c> more than a block's worth is not possible

19:12

<glozow> _0x0ff: b10c: correct, it would never return more than 3/4 of the maximum block weight

19:13

<_0x0ff> what is the other case that would return 0 txs?

19:13

<glozow> it could return fewer though, if the mempool just doesn't have many transactions that fit the criteria _0x0ff mentioned

19:13

<glozow> the filters are applied within the assembler

19:14

<glozow> does that make sense?

19:14

<_0x0ff> yup

19:14

<glozow> coolio

19:14

<cls> yes, nice explaination

19:14

<glozow> Moving on: In what scenarios would a miner include different transactions from our

19:14

<glozow> `BlockAssembler`? More specifically, when might the miner exclude a

19:14

<glozow> transaction, and when might it include a transaction yours doesn't?

19:14

<glozow> I can think of 3 in each category :)

19:15

<b10c> 1. a transaction didn't propagte to us or the miner yet

19:15

<_0x0ff> If the scenario when miner prioritizes different transactions from ours.

19:15

<marqusat> When a given tx does not reach a miner before they start mining the block or when they censor some transactions.

19:15

<b10c> 2. the miner manually prioritized the transaction

19:15

<b10c> 3. we and the miner have a conflicting transaction in our mempools

19:15

<_0x0ff> miner could also censor a tx

19:15

<b10c> 4. one party has a RBF replacement transaction which the other party doesn't have yet (similar to 1. and 3.)

19:15

<b10c> 5. the miner mines an emtpy block

19:15

<b10c> 6. censorship

19:15

<glozow> yaaaas all good answers

19:16

<cls> minor does not prioritize due to low transaction fee

19:16

<_0x0ff> good answers b10c :)

19:17

<amiti> here's a scenario where the filters wouldn't return any txns to rebroadcast: at time 0, the fee rate cache runs, identifies min fee rate. at time 1 a block comes in and picks up all our mempool txns above this fee rate. when we go to connect the tip, we don't have any remaining txns above the calculated min fee rate.

100

19:17

<b10c> amiti: good point

101

19:17

<_0x0ff> ha, good one

102

19:18

<glozow> amiti: right. and all the high-fee transactions that might have arrived in the meantime would not meet the 30 minute recency filter

103

19:18

<amiti> glozow: yeah, good point :)

104

19:19

<glozow> Ok! so what does the rebroadcast attempt tracker do?

105

19:20

<sishir> keeps track of the # of rebroadcast attempt

106

19:20

<_0x0ff> it tracks how many times we've rebroadcasted a tx and what was the last time we rebrodcasted it

107

19:20

<svav> tracks how many times and how recently we’ve attempted to rebroadcast a transaction so that we don’t spam the network with re-announcements.

108

19:20

<glozow> awesome, yes, svav: nice wording

109

19:21

<glozow> And Why might we want to keep a transaction in our rebroadcast attempt tracker even after removing it from our mempool?

110

19:21

<_0x0ff> and it prevents that network doesnt ddos itself with rebroadcasts

111

19:21

<glozow> Hint: what happens if we expire a transaction from our mempool and then our peer rebroadcasts it to us? When might this happen?

112

19:22

<_0x0ff> no clue about this one

113

19:22

<glozow> This part was really confusing for me - feel free to guess and ask questions

114

19:22

<sishir> I thought we remove them

115

19:23

<_0x0ff> well, i dont see a reason why deal with removing the tx given it will get expired or removed (when 500 limit is reached)

116

19:23

<glozow> mempool expiry is 2 weeks, while the attempt tracker expiry is ~3 months. why aren't they the same? -> there must be some reasons why we'd keep a tx in the attempt tracker after the mempool has forgotten about them

117

19:23

<cls> Each node may have a different state in a decentralized network

118

19:23

<_0x0ff> oh, if fees get high, and some txs get removed from mempool

119

19:24

<glozow> _0x0ff: after you remove from mempool, what happens if i rebroadcast to you?

120

19:24

<_0x0ff> hm but no, the BlockAssembler only gets txs from mempool

121

19:24

<_0x0ff> it will be added back to mempool

122

19:24

<glozow> (i still have the tx for whatever reason)

123

19:24

<glozow> sure, it gets added back to mempool

124

19:24

<svav> A peer might have a lower minimum fee rate that us, so they won't exclude it like us

125

19:24

<glozow> is it possible this transaction will _never_ get mined?

126

19:24

<glozow> can it be consensus-invalid? or policy-invalid?

127

19:25

<glozow> beyond just fees

128

19:25

<larryruane_> no because in that case it wouldn't have entered the mempool in the first place

129

19:25

<glozow> larryruane_: is it possible that the rest of the network has policy rules that we don't know about?

130

19:26

<glozow> let's say we're version 22 nodes with rebroadcast implemented

131

19:26

<b10c> not consensus invalid - but policy-invalid can happen if we don't support e.g. a softfork

132

19:26

<glozow> version 24 nodes have a new policy for version 2 witnesses, for example

133

19:26

<b10c> wait can it be consensus invalid?

134

19:26

<glozow> b10c: are there nodes right now that don't know about some consensus rules? :)

135

19:27

<b10c> sure, I there are nodes that don't know about e.g SegWit

136

19:27

<glozow> b10c: exactly

137

19:27

<b10c> I think e.g. forkmonitor runs a 0.10.x Bitcoin Core node

138

19:28

<glozow> heh. so, what happens if there are nodes rebroadcasting transactions that don't meet new consensus rules?

139

19:30

<b10c> hm

140

19:30

<b10c> not sure

141

19:30

<_0x0ff> same, no idea

142

19:30

<jnewbery> let's try to figure it out!

143

19:30

<glozow> b10c: what will updated nodes do? (the ones that know about the new consensus rules)

144

19:30

<glozow> and _0x0ff

145

19:30

<b10c> reject the transaction

146

19:31

<glozow> b10c: correct

147

19:31

<glozow> and what will old nodes do?

148

19:31

<_0x0ff> accept it, and keep on rebroadcasting the tx

149

19:31

<glozow> _0x0ff: correct!

150

19:31

<glozow> so what happens if 2 of these old nodes are connected to each other?

151

19:32

<_0x0ff> they will keep sending the tx between each other so it will never expire (until expiry conditions are met)

152

19:32

<glozow> _0x0ff: correct

153

19:32

<b10c> need to be more than 2 nodes with the current filter, right?

154

19:32

<glozow> let's say the tx is removed from the rebroadcast attempt tracker as soon as it expires from mempool

155

19:32

<b10c> 4h * 4 attempts < 14 days

156

19:33

<glozow> will these 2 nodes ever forget about the tx?

157

19:33

<sipa_> the last time a consensus rule change was introduced that changed something that wasn't already very widespread nonstandard was BIP113 i believe

158

19:33

<sipa_> (just to give some context)

159

19:33

<glozow> sipa_: right. i think this applies to new policy changes as well, though

160

19:33

<sipa_> glozow: it does

161

19:34

<sipa_> though i'm not sure when the last time was that policy was restricted

162

19:34

<b10c> fwiw: BIP 113: Median time-past as endpoint for lock-time calculations https://github.com/bitcoin/bips/blob/master/bip-0113.mediawiki

163

19:36

<glozow> soooo, should we remove a tx from rebroadcast attempt tracker as soon as we remove it from mempool?

164

19:37

<b10c> based on your question I don't think the 2 nodes would ever forget about the tx, but tbh my train of though got a bit lost

165

19:37

<sishir> wait so the nodes will not forget about the tx?

166

19:37

<sishir> im a lil confused

167

19:37

<b10c> glozow: no

168

19:37

<svav> glozow: no

169

19:37

<_0x0ff> no, we shouldnt remove it - so the tx woudl get expired faster

170

19:37

<b10c> never* ^

171

19:38

<glozow> b10c: svav: _0x0ff: correcto

172

19:38

<sishir> isn't the tx already confirmed, mined and in the blockchain? So, we still want to keep a copy of it in rebroadcast attempt?

173

19:39

<jnewbery> sishir: imagine both node A and node B have not upgraded and both consider the tx valid. If there's no way to prevent rebroadcasting, they'll just continue to rebroadcast the tx to each other.

174

19:39

<lightlike> though if you send >500 of these txes, they would still never forget even with the tracker, so I'm thinking the tracker only helps when this unintentional, not in an attack case or does it?

175

19:39

<jnewbery> sishir: this is only for unconfirmed transactions.

176

19:39

<_0x0ff> besides the the limit (how old tx we keep) in m_attempt_tracker there's also a size limit of 500

177

19:39

<sishir> Ahhh i see

178

19:40

<svav> You need to keep it in your rebroadcast attempt tracker so you don't rebroadcast it too many times

179

19:40

<glozow> lightlike: yeah, but i think you could get old nodes to keep talking about newly invalid transactions regardless

180

19:41

<glozow> svav: right, so let's see how the 3month expiry helps. let's say you expire a tx from mempool after 2 weeks, and you keep it in your rebroadcast attempt tracker. what happens if you see the tx again?

181

19:42

<glozow> (let's assume you don't exceed the 500 limit in this situation)

182

19:43

<svav> You would only rebroadcast it a maximum of 6 times within the 3 months

183

19:44

<_0x0ff> it will folow the same rules as it did before it got removed, which means we might not rebroadcast it immidiatelly after receiving it (if conditions for rebrodcasting arent met)

184

19:44

<glozow> svav: yup! so assuming the 2 old nodes keep it in their rebroadcast attempt trackers for 3 months, will they eventually forget about the tx?

185

19:45

<_0x0ff> yes :)

186

19:45

<glozow> _0x0ff: :)

187

19:45

<svav> yes when they have each rebroadcasted it 6 times

188

19:45

<glozow> svav: correct, after 6 times each

189

19:46

<glozow> so this helps, as long as they don't reach the 500 maximum

190

19:46

<glozow> amiti: have you considered increasing the limit? or keeping a separate tracker for expired-from-mempool transactions?

191

19:47

<amiti> glozow: yeah the limit is slightly arbitrary right now, just to pick a starting point. I think the most relevant will be observing the mechanism out in the wild & seeing if this limit is useful

192

19:47

<glozow> hopefully we feel comfortable moving to the next question? In general, when should we remove transactions from our rebroadcast attempt tracker?

193

19:47

<_0x0ff> i also saw a comment about persisting m_attempt_tracker to disk - do we think that's worthy to have?

194

19:47

<glozow> amiti: makes sense to me

195

19:47

<amiti> also, if the network is working as expected, we shouldn't be rebroadcasting txns heavily

196

19:48

<b10c> amiti: especially if many nodes rebroadcast with this patch

197

19:48

<amiti> =P

198

19:50

<jnewbery> another potential change could be to move the txids into a rolling filter if they reach MAX_REBROADCAST_COUNT, since at that point we only need a test for inclusion

199

19:50

<sishir> I conflicting tx in the block & tx that gets taken out of mempool cause of RBF

200

19:50

<glozow> sishir: yeah, those are good ones

201

19:50

<cls> might be worth looking a dynamic algorithms such as TCP retransmission which slowly degrades over time

202

19:51

<glozow> jnewbery: ooooooh

203

19:51

<glozow> or a cuckoo cache 🐦

204

19:52

<amiti> jnewbery: are you suggesting replace attempt tracker with rolling bloom filter? or having an additional?

205

19:52

<glozow> anyone else have ideas for when we should remove from rebroadcast attempt tracker?

206

19:52

<amiti> cls: yeah I considered that sort of design, but it feels overkill for the use case

207

19:52

<jnewbery> potentially having an additional, but I'm just throwing an idea out. It might not be good!

208

19:52

<svav> when it's confirmed?

209

19:53

<amiti> jnewbery: gotcha :)

210

19:53

<glozow> svav: yep! that's a big one

211

19:53

<svav> when it expires?

212

19:53

<cls> amiti: totally makes sense

213

19:53

<glozow> are there any other cases, beyond seeing a conflict in a block, where the tx is guaranteed to be invalid?

214

19:54

<glozow> svav: expires from where?

215

19:54

<sipa_> glozow: did you know there is an awesome way of combing cuckoo tables with (tolling) bloom filters? :)

216

19:54

<sipa_> *fombining

217

19:54

<glozow> glozow: whaaaa?!

218

19:54

<glozow> sipa_*

219

19:54

<sipa_> **combining

220

19:54

<glozow> HAH i tagged myself 😂

221

19:54

<sipa_> look up cuckoo filter

222

19:55

<sishir> gotta head out but thank you glozow. learned a lot

223

19:56

<sipa_> i did some work on creating an efficient rolling cuckoo filter, but put it aside with a bit higher priority things

224

19:56

<glozow> sishir: thanks for coming!

225

19:56

<glozow> sipa_ greatest crossover event

226

19:57

<glozow> ok I think we have time to do part of the last question: How is the estimated minimum feerate for inclusion in a block, `m_cached_fee_rate`, calculated?

227

19:57

<_0x0ff> It uses `BlockAssembler::minTxFeeRate()` which calculates a min fee that would still be included in the next mined block. This approach is better because it calculates fees based on the future mintxfee and not the past.

228

19:57

<svav> Is it something to do with MAX_ENTRY_AGE???

229

19:58

<glozow> _0x0ff: correct, assemble a block and get the min fee

230

19:58

<glozow> when do we do this?

231

19:59

<glozow> svav: er, i don't think?