Only allow getdata of recently announced invs (p2p)

Jul 8, 2020

https://github.com/bitcoin/bitcoin/pull/19109

Host: amitiuttarwar - PR author: sipa

The PR branch HEAD was 2da7ee37b at the time of this review club meeting.

Notes

If a spy is able to identify which node initially broadcast a transaction, there’s a high probability that that node is the source wallet for the transaction.
To avoid that privacy leak, we try to be intentional about how we relay and request transactions. We don’t want to reveal the exact contents of our mempool or the precise timing when we received a transaction.
PR 18861 improved transaction-origin privacy. The idea is that if we haven’t yet announced a transaction to a peer, we shouldn’t fulfill any GETDATA requests for that transaction from that peer. The implementation for that PR checks the list of transactions we are about to announce to the peer (setInventoryTxToSend), and if it finds the transaction that the peer has requested, then responds with a NOTFOUND instead of with the transaction. While this helps in many cases, it is an imperfect heuristic. Think about why that is (see question 3 below).
This week’s PR further reduces the possible attack surface. It introduces a per-peer rolling bloom filter (m_recently_announced_invs) to track which transactions were recently announced to the peer. When the peer requests a transaction, we check the filter before fulfilling the request and relaying the transaction.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? You’re always encouraged to put your PR review on GitHub, even after it has been merged.
What are possible techniques for an adversary to probe a node for a particular mempool transaction? What changes with this PR?
What were drawbacks of using setInventoryTxToSend as a heuristic?
What is the UNCONDITIONAL_RELAY_DELAY? How is this delay constant being used when deciding whether or not to fulfill a request?
What are some relevant considerations when reviewing/implementing a bloom filter? What choices does this patch make?
After these changes, can you think of other ways a spy could probe for a mempool transaction?

Meeting Log

17:00

<amiti> #startmeeting

17:00

<amiti> hi

17:00

<nehan> hi

17:00

<benthecarman> hi

17:00

<troygiorshev> hi

17:00

<willcl_ark> hi

17:00

<andrewtoth> hi

17:00

<michaelfolkson> hi

17:00

<jnewbery> hi

17:00

<amiti> hi everyone! welcome to this weeks review club. today, we'll chat about transaction privacy on the network

17:01

<overall> hi

17:01

<amiti> notes and questions in the normal place: https://bitcoincore.reviews/19109.html

17:01

<amiti> who has gotten a chance to look at the proposed changes?

17:01

<amiti> (y/n)

17:01

<jnewbery> y

17:02

<michaelfolkson> y

17:02

<troygiorshev> y

17:02

<nehan> y

17:02

<willcl_ark> n (sorry!)

17:02

<benthecarman> y

17:03

<amiti> cool! let's get started with the questions

17:03

<amiti> What are possible techniques for an adversary to probe a node for a particular mempool transaction? What changes with this PR?

17:04

<andrewtoth> is sending a GETDATA the only real way?

17:04

<michaelfolkson> So this is Gleb's comment. Attacker connects to us right after we put the transaction in MapRelay

17:05

<amiti> andrewtoth: the only real way to solicit a TX response?

17:05

<andrewtoth> to try and probe a node for a particular mempool transaction?

17:05

<felixweis_k> you could give it a transaction that spends a certain input

17:06

<andrewtoth> ahh felixweis_k interesting

17:06

<felixweis_k> if it requests the input tx, it didn't have it

17:06

<amiti> michaelfolkson: yes, could you break that down further? we could also go into this more in the next question

17:07

<amiti> felixweis_k: yeah! using children to observe request patterns can help a spy deduce tx presence

17:07

<michaelfolkson> So my (limited) understanding is that the previous PR fixed a lot of the premature announcement logic. And so now we are working out what still needs to be fixed

17:08

<michaelfolkson> #18861 (previous PR, now merged)

17:08

<amiti> alright, lets jump into the next question and discuss the previous PR (18861) and how this PR (19109) addresses a leak

17:08

<amiti> from the notes: PR 18861 improved transaction-origin privacy. The idea is that if we haven’t yet announced a transaction to a peer, we shouldn’t fulfill any GETDATA requests for that transaction from that peer. The implementation for that PR checks the list of transactions we are about to announce to the peer (setInventoryTxToSend), and if it

17:08

<amiti> finds the transaction that the peer has requested, then responds with a NOTFOUND instead of with the transaction. While this helps in many cases, it is an imperfect heuristic.

17:08

<amiti> What were drawbacks of using setInventoryTxToSend as a heuristic?

17:08

<michaelfolkson> Sorry I think I misunderstood the first question :)

17:09

<overall> michaelfolkson premature or non-privacy preserving announcement logic?

17:09

<amiti> michaelfolkson: nah, the two questions overlap.

17:10

<jnewbery> sipa's original PR description answers this question...

17:11

<michaelfolkson> overall: Premature is non-privacy preserving. I don't understand distinction...

17:12

<jnewbery> so setInventoryTxToSend is a set of transactions that we haven't yet announced to that peer

17:12

<troygiorshev> The issue in sipa's original PR description is a "false negative" sort of problem, right? i.e. we'll tell a node that we don't have a transaction even when they know that we actually do?

17:12

<amiti> troygiorshev: that's one of the issues, pointed out in the convo with luke-jr & sipa

17:12

<amiti> here: https://github.com/bitcoin/bitcoin/pull/18861#discussion_r419695831

17:12

<overall> michaelfolkson just checking if there was another metric, besides privacy, that wasn't needed to be managed

17:13

<amiti> but from my understanding, it seems pretty unlikely

17:13

<jnewbery> we'll only add txids to a peer's setInventoryTxToSend if we're connected to that peer when we relay the transaction

17:13

<sipa> overall: well, memory usage - we could also just keep a list of all things we've ever announced to a peer, and only allow them to query exactly that... but that would easily run into memory problems

17:14

<jnewbery> so using the old heuristic, if a node connects to us and immediately asks for a transaction, it won't be in the setInventoryTxToSend, and we'll happily send the transaction

17:14

<michaelfolkson> overall: Yeah I think it is just privacy. Apart from privacy, announcing prematurely doesn't matter. Unless you literally don't want to be announcing it at all (not a concern)

17:15

<amiti> jnewbery: exactly. does this make sense to people?

17:15

<andrewtoth> jnewbery: makes sense thanks!

17:15

<sipa> michaelfolkson: it's not announcing; it is allowing requests

17:15

<troygiorshev> jnewbery amiti: it does now :)

17:15

<sipa> michaelfolkson: this is about not permitting requests of things that weren't announced

17:16

<sipa> announcement itself has a bunch of privacy protections already, since a few years ago, and they're pretty good

17:16

<sipa> this is making sure that it can't be bypassed by tx requests

17:17

<andrewtoth> still a little unclear on the second point though. how does using setInventoryTxToSend prevent some locally resubmitted invs from being requested?

17:19

<amiti> andrewtoth: the idea is if we are using setInventoryTxToSend to prevent fulfilling a txn request, if we put a txn on there for a _second_ time, then it would be valid for the peer to request (since we prev announced), but we would deny them the txn

17:19

<michaelfolkson> sipa: Not permitting the providing of what they request right? They are free to request whatever they want. Just don't want to give it to them

17:19

<andrewtoth> amiti: thanks!

17:19

<sipa> michaelfolkson: exactly

17:20

<sipa> andrewtoth: it's a pretty contrived scenario, see https://github.com/bitcoin/bitcoin/pull/18861#discussion_r419707909

17:20

<felixweis_k> whats a mental model to aproximate what consitutes an acceptable amount of per peer memory?

17:21

<luke-jr> amiti: it seems like it could result in efforts to relay a tx, making it harder to relay

17:21

<amiti> sipa: my understanding of your comment there was that you were identifying the precise steps / timing that would have to happen to not serve the txn, is that correct?

17:21

<amiti> (aka not as simple as resubmit via sendrawtransaction)

17:22

<sipa> amiti: yeah, because it also requires the old submission to have been expired from filterAlreadyKnown

17:22

<amiti> luke-jr: I don't follow

17:22

<andrewtoth> sipa: thanks. both these cases seem pretty unlikely, right? but probably better to not have any edge cases

17:22

<sipa> andrewtoth: probability is the wrong metric when thinking about attacks

17:23

<sipa> andrewtoth: the question is how costly it is for an attacker to make it happen - they won't let it to chance

17:23

<jnewbery> felixweiss_k: that's a really good question! I don't know if there are any hard numbers

17:23

<luke-jr> amiti: unless I'm confusing my own comment (it's been a while), it seems sendrawtransaction (ie, the user explicitly wants to relay it) could end with it not being relayed even if it would have otherwise been?

17:23

<andrewtoth> interesting

17:23

<amiti> sipa: 👍🏽, and you were calculating the most feasible / quick way to expire (using mempool requests)

17:24

<amiti> luke-jr: so, you're talking about your comment here- https://github.com/bitcoin/bitcoin/pull/18861#discussion_r419695831?

17:24

<sipa> andrewtoth: an analogy i like: a civilian engineer designs building resistant to natural causes (earthquakes, ...), and needs probability. a military engineer designs structures that are designed to protect against attacks; they don't go compute the probability that a missile will randomly hit it; they care about how hard it is for an attack that make that happen

17:25

<amiti> ok, so these are some of the identified drawbacks of using `setInventoryTxToSend` as a heuristic. what does PR 19109 implement as an alternative solution?

17:25

<andrewtoth> sipa: :)

17:25

<michaelfolkson> But probability still informs what should be prioritized to fix!

17:25

<sipa> michaelfolkson: how so?

17:26

<sipa> we care about probabilities of things _randomly_ going wrong absent attack scenarios of course

17:26

<michaelfolkson> Well if there are two possible attacks and one attack is more likely than the other, fix that one first

17:26

<sipa> michaelfolkson: no, you fix the cheapest to perform first

17:26

<sipa> attacks are not random

17:27

<michaelfolkson> I guess I am seeing the cost of the attack as informing how likely that attack is to happen

17:27

<amiti> felixweis_k: good question about acceptable amount of memory / peer, something I also wonder

17:27

<luke-jr> andrewtoth: if an attack is possible, it increases the possibility

17:28

<sipa> at a very level view, you could see it that way

17:28

<sipa> but it's misleading imo

17:29

<andrewtoth> makes sense

17:29

<troygiorshev> sipa: we would want to prioritize on the "product" of the "ease" of the attack and the "damage" of the attack, no?

17:29

<jnewbery> felixweiss_k: without having specific numbers, I think a good rule is that keeping sets of items per-peer isn't scalable, but keeping a probabilistic filter is

17:30

<sipa> i think we'd like to have some sort of upper bound of memory usage per peer

17:30

<jnewbery> and in many cases, a probabilistic filter is good enough for the specific function

17:30

<sipa> or, alternatively, actual acconting of memory, and a way to deprioritize/disconnect peer if we're using to much memory on behalf of a peer, and it gets tight

100

17:31

<jnewbery> sipa: I think deprioritizing based on memory usage would be difficult. How do you shed the memory? Deprioriritizing based on CPU/bandwidth usage seems much easier

101

17:32

<sipa> jnewbery: agree

102

17:32

<amiti> lets jump into this question: What are some relevant considerations when reviewing/implementing a bloom filter? What choices does this patch make?

103

17:32

<amiti> since we are talking about one of the considerations- memory usage

104

17:32

<jnewbery> (not saying they're exclusive - just that the latter is much more straightforward)

105

17:34

<a__jonas> amiti: Generally things you'd want to consider are the are number of items to keep track of and the acceptable false-positive rate

106

17:34

<amiti> a__jonas: yes!

107

17:35

<amiti> did you catch what values were chosen in this PR ?

108

17:35

<jnewbery> how are those parameters chosen?

109

17:35

<a__jonas> (3500 and 0.000001)

110

17:35

<troygiorshev> amiti: False positive rate is 0.000001

111

17:36

<amiti> a__jonas, troygiorshev: yup!

112

17:36

<andrewtoth> so number of items increases memory, false positive rate increases CPU time?

113

17:36

<amiti> let's dig into jnewbery's question- how were these numbers chosen? so as reviewers, we can evaluate that it makes sense

114

17:37

<sipa> andrewtoth: better false positive rate also increases memory usage

115

17:37

<sipa> a rolling bloom filter has around -log2(fprate)*3/log(2) bits usage per tracked item

116

17:39

<jnewbery> 'rolling bloom filter' is perhaps a bit of a misleading name because you might expect that you can remove items

117

17:39

<amiti> if anyone is interested, I found this site useful to wrap my head around how bloom filters work: https://llimllib.github.io/bloomfilter-tutorial/

118

17:39

<jnewbery> in fact, it's implemented as 3 separate bloom filters, each with capacity 1/2 of the desired capacity

119

17:39

<andrewtoth> amiti: cool site!

120

17:40

<amiti> (props to aj for sharing with me)

121

17:40

<felixweis_k> thanks, amiti!

122

17:40

<sipa> jnewbery: in general you can't delete things from bloom filters

123

17:40

<sipa> there are generalizations that can, but they need more memory

124

17:40

<jnewbery> and we roll through the different filters (ie put items into the first filter until it's at capacity, then the second, then the third, then delete the first filter and start filling it again)

125

17:40

<felixweis_k> yeah rolling bloom filters are cool i've been thinking recently about something like that. didn't know we had them in core

126

17:40

<sipa> jnewbery: exactly

127

17:41

<jnewbery> so that's where the *3 in sipa's memory usage comes from

128

17:41

<jnewbery> because we actually have 3 bloom filters

129

17:41

<jnewbery> (I think)

130

17:41

<sipa> correct

131

17:41

<sipa> though it's not the entire picture

132

17:42

<amiti> jnewbery: oh cool, I was wondering how it worked but didn't get the chance to dig in yet. thanks!

133

17:42

<sipa> we also get a factor 2 gain from having 2 active generations (each of the 3 generations is only half the window size)

134

17:42

<felixweis_k> when you add, you add to the latest 2, when you check you check for OR in any of the 3? then a counter as to when advance the "ring buffer"?

135

17:42

<sipa> but also a factor 2 again because we need 2 bits per entry (which can store a number from 0 to 3... 0 for unused; 1-3 for each of the 3 generations)

136

17:42

<felixweis_k> sorry im just going to read the code

137

17:43

<jnewbery> felixweiss_k: you only add to the latest generation I believe. And yes, there's a counter to determine which generation we should be adding to

138

17:43

<michaelfolkson> What is AJ doing re running the patch to find premature requests?

139

17:44

<michaelfolkson> This is to explore when premature requests still happen when someone is running Core and is not malicious?

140

17:44

<jnewbery> felixweis_k: oops, I've been inserting and extra s in your name!

141

17:45

<sipa> michaelfolkson: indeed

142

17:45

<sipa> to see if our logic isn't too strict in what it permits

143

17:46

<amiti> 15 minutes left

144

17:46

<michaelfolkson> You want to allow a certain number of premature requests?! Don't get it

145

17:46

<sipa> michaelfolkson: no?

146

17:46

<michaelfolkson> Surely better if there are zero premature requests unless doing testing

147

17:46

<sipa> michaelfolkson: yes

148

17:47

<sipa> i'm confused by your conclusion :)

149

17:47

<felixweis_k> no worries, i've written new berry probably too in the past ;-)

150

17:47

<michaelfolkson> The "logic isn't too strict" comment. Surely we want extreme strictness in this case (unless doing testing)

151

17:47

<sipa> michaelfolkson: but it's an approximation

152

17:48

<troygiorshev> michaelfolkson: we're seeing false positives, right?

153

17:48

<sipa> we don't know exactly what all legitimate (and safe) behaviors are

154

17:48

<sipa> so we want to know if this policy we have implemented isn't going to accidentally affect legitimate use

155

17:49

<sipa> because we might have forgotten some protocol interaction where someone ends up legitimately requesting something that the filter approach disallows

156

17:49

<sipa> aj's testing is to figure this by running it in the wild, and seeing if there are legitimately-looking requests that get blocked by it

157

17:49

<jnewbery> I have a question on my new favorite subject in Bitcoin Core (locking), inspired by neha's question in the previous PR: https://github.com/bitcoin/bitcoin/pull/18861#discussion_r451666992

158

17:50

<michaelfolkson> Ok makes sense, thanks. So some premature requests are legitimate. e.g. the example AJ found.

159

17:50

<sipa> michaelfolkson: well, they're not premature - we're just accidentally marking them as such

160

17:50

<jnewbery> is this new data structure threadsafe? If so, what synchronization method is used to guard it? Do you think it's in the right place?

161

17:51

<a__jonas> to answer your question amiti, I think these numbers could be adjusted (as AJ maths and Sipa's response suggests)

162

17:52

<overall> jnewbery nice question :)

163

17:52

<a__jonas> It looks like caching was a consideration (as noted in the comment on the assert on line 134)

164

17:52

<andrewtoth> jnewbery: looks like it's protected by cs_main

165

17:52

<jnewbery> overall: credit goes to neha for asking in the previous PR :)

166

17:53

<jnewbery> andrewtoth: yes, can you explain why/

167

17:53

<overall> nehan +1

168

17:53

<sipa> a__jonas: i don't think i've commented at all on those numbers or how they could be changed

169

17:54

<andrewtoth> i need to look closer at the code here :/

170

17:55

<amiti> a__jonas: are you talking about the convo early on in the PR?

171

17:55

<troygiorshev> sipa: a__jonas: this? https://github.com/bitcoin/bitcoin/pull/19109#issuecomment-636712875

172

17:55

<nehan> jnewbery: +1. Also, I don't see how cs_vRecv is taken out properly in the previous PR, but maybe that could be answered more quickly here.

173

17:55

<amiti> ah, I think that convo lead to introducing `UNCONDITIONAL_RELAY_DELAY`

174

17:55

<sipa> or maybe i'm just wrong! and clang didn't detect it

175

17:56

<amiti> lets chat about it in the last few minutes here... How is this delay constant being used when deciding whether or not to fulfill a request?

176

17:56

<jnewbery> andrewtoth: the answer is that it m_recently_announced_invs is added to CNode, which is guarded by cs_main.

177

17:56

<sipa> nehan: i believe i'm wrong!

178

17:56

<nehan> sipa: where does net lock cs_vRecv?

179

17:56

<nehan> sipa: yeah ok good i'm not going crazy :)

180

17:56

<a__jonas> troygiorshev, yes - that and AJ's previous comment

181

17:56

<jnewbery> sipa: wrong about what?

182

17:56

<sipa> jnewbery: about claiming that net grabs cs_vRecv

183

17:57

<nehan> jnewbery: vRecvGetData isn't protected by cs_vRecv. But as you both pointed out it's only accessed in one thread. That seems worth documenting to me though.

184

17:57

<overall> 3 minutes remaining

185

17:58

<sipa> same with orphan_work_set

186

17:58

<jnewbery> nehan: definitely document it at the very least

187

17:58

<sipa> agree

188

17:58

<nehan> ok

189

17:58

<sipa> or fix it

190

17:59

<jnewbery> vRecvGetData and orphan_work_set are similar - they're both work queues for an individual peer. As such they're only pushed to/popped from in ProcessMessages

191

17:59

<nehan> sipa: ok!

192

17:59

<amiti> a__jonas, troygiorshev: that conversation calculated the number of transactions in 15 minutes, because after 15 minutes is the time set by mapRelay. after that review, sipa introduced `UNCONDITIONAL_RELAY_DELAY` as 2 mins. so now, if a peer requests a txn thats >2 minutes old, we will serve

193

17:59

<jnewbery> but agree it should be fixed!

194

17:59

<sipa> net_processing.cpp:1653:43: warning: reading variable 'vRecvGetData' requires holding mutex 'pfrom.cs_vRecv' [-Wthread-safety-analysis] std::deque<CInv>::iterator it = pfrom.vRecvGetData.begin();

195

18:00

<sipa> i recompiled and just missed the warning, silly me

196

18:00

<amiti> but if the txn is < 2 minutes old, it must be in the `m_recently_announced_invs` filter

197

18:00

<amiti> okay! it looks like thats time

198

18:00

<sipa> and the filter is sized such that it can store as many transactions as we can possibly announce in 2 minutes

199

18:01

<amiti> thanks for coming all :)

200

18:01

<troygiorshev> thanks amiti!

201

18:01

<felixweis_k> thanks all, and amiti for hosting! very informative as always.

202

18:01

<sipa> thanks!

203

18:01

<andrewtoth> thanks amiti! and everyone else for the discussion

204

18:01

<overall> thanks amiti!

205

18:01

<nehan> thanks amiti!

206

18:01

<jnewbery> thanks amiti. Great discussion!

207

18:02

<a__jonas> Thanks amiti

208

18:02

<amiti> #endmeeting

209

18:02

<michaelfolkson> Thanks amiti! New IRC nick?

210

18:03

<sipa> jnewbery: so, you're right - the memory usage of the rolling bloom filter (which i'll completely unambiguously abbreviate to RBF) is indeed exactly equal to having 3 completely separate bloom filters

211

18:03

<sipa> and even operationally it's very similar to that

212

18:03

<amiti> hah, my normal irc client is down (I use irccloud) , hopefully it will be back soon

213

18:03

<felixweis_k> using irccloud too usually. lol

214

18:04

<troygiorshev> sipa: unambiguously :D

215

18:04

* sipa suggests ssh + vps + screen + irssi :)

216

18:04

<felixweis_k> maybe someone should create a decentralized IRC client

217

18:04

<overall> lol

218

18:04

<felixweis_k> (im kidding)

219

18:04

<sipa> jnewbery: the difference is that you'd need to access 3x as many memory locations

220

18:04

<overall> on a time chain

221

18:05

<jnewbery> sipa: so a 3x constant factor in checking for membership

222

18:05

<overall> sipa and connect irssi to freenet over tor? the sasl registration over clearnet bugs me :(

223

18:06

<sipa> overall: indeed

224

18:06

<a__jonas> thanks sipa

225

18:07

<overall> I understand that abuse / rate limiting demands that require sasl but still wish there was another way to handle it (surety bond)

226

18:07

<sipa> jnewbery: with a rolling cuckoo filter you only need 28ish bits per elememt (at 20 bits fprate), instead of 86ish

227

18:08

<jnewbery> if anyone wants to dig deeper on probabilistic filters, sipa left this rather tantalizing comment about _cuckoo_ filters on a different PR: https://github.com/bitcoin/bitcoin/pull/19219#issuecomment-652685715

228

18:08

<jnewbery> paper here: https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

229

18:09

<troygiorshev> jnewbery: thx

230

18:09

<sipa> happy to do a review session about it, if/when it's done :)

231

18:09

<jnewbery> sipa: that'd be great!

232

18:09

<amiti> sipa: +1

233

18:15

<jnewbery> sipa: I haven't looked at the implementation recently, but do the 3 bloom filters use the same hash functions? If so, my comment ("3x constant factor in checking for membership") isn't correct because you'd only do the hashing once

234

18:15

<sipa> jnewbery: they dk

235

18:15

<sipa> they do

236

18:16

<jnewbery> ok, so checking 3x memory locations but that doesn't equate to 3x lookup times

237

18:16

<sipa> right; number of hashes would be identical across them