Cache responses to GETADDR to prevent topology leaks (p2p)

Jul 1, 2020

https://github.com/bitcoin/bitcoin/pull/18991

Host: naumenkogs - PR author: naumenkogs

The PR branch HEAD was 52d22a3c at the time of this review club meeting.

Notes

The motivation for PR 18991 is to make the ADDR/GETADDR address gossip protocol more private for the node providing addresses.
Nodes transmit three main objects over the p2p network: transactions, blocks and network addresses (e.g. 172.1.2.3). Today’s PR is concerned with network address gossiping.
There are 2 relevant p2p messages: GETADDR (the sender asks the receiver for network addresses along with some per-address metadata) and ADDR (the receiver responds with the addresses and metadata it is willing to share with the sender).
This GETADDR/ADDR protocol is currently used every time a node makes a new connection, after receiving the version message.
There are other ways of learning about potential peers, which are not relevant to this PR, but it’s worth being aware of them:
- Network addresses of the nodes are propagated in the network in an unsolicited way: from time to time, every node makes a self-announcement which is propagated across the network.
- Nodes can query predefined DNS seeds for a list of potential peers.
Every time a node hears about another node in the network, it adds/updates a record in its AddrMan (Address Manager). This is where all the data about nodes in the network is stored, and this is where a GETADDR receiver looks to construct an ADDR response to the requester.
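As a rough illustration of the add/update behaviour described above, here is a toy Python sketch. It is not Bitcoin Core's AddrMan (which is written in C++ and is considerably more elaborate); the names `ToyAddrMan`, `add_or_update` and `get_addr_response` are hypothetical, and choosing a uniform random subset for the ADDR response is an assumption made purely for illustration.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class AddrRecord:
    address: str      # e.g. "172.1.2.3:8333"
    last_seen: float  # Unix timestamp of the most recent sighting


class ToyAddrMan:
    """Toy address manager: one record per address, newest timestamp wins."""

    def __init__(self):
        self.records = {}

    def add_or_update(self, address, seen_at=None):
        """Add a record for a newly heard-of node, or refresh an existing one."""
        seen_at = time.time() if seen_at is None else seen_at
        rec = self.records.get(address)
        if rec is None:
            self.records[address] = AddrRecord(address, seen_at)
        elif seen_at > rec.last_seen:
            # Only move the timestamp forward, never backward.
            rec.last_seen = seen_at

    def get_addr_response(self, max_count, rng=random):
        """Return a random subset of records (assumed behaviour, for illustration)."""
        pool = list(self.records.values())
        return rng.sample(pool, min(max_count, len(pool)))
```

In this sketch, hearing about the same address twice keeps a single record and only refreshes its timestamp, which is the "adds/updates a record" idea from the notes.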
It turns out that it’s possible for a spy node to easily scrape the full contents of any reachable node’s AddrMan. The spy just has to connect to a victim node multiple times and execute GETADDR. This scraped data can then be used to infer private information about the victim.
For example, a spy can monitor the victim’s AddrMan content in real time and figure out which peers the victim is connected to. A spy can also compare the AddrMan content from two different connections (e.g. one identified by Tor address and one identified by IPv4) and figure out that it’s actually the same physical node.
This PR is a first step towards fixing these privacy issues. If we limit (cache) the leaked portion of AddrMan, these inference activities will become much harder. Caching in this context means that the ADDR response (which is only a small subset of a node’s AddrMan content) remains the same for every GETADDR call during (roughly) a day.
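To make the effect of caching concrete, here is a small Python simulation. It is only a sketch under stated assumptions, not the PR's code: the class `CachedAddrResponder`, the fixed 24-hour `CACHE_LIFETIME`, the AddrMan size (5000) and the response size (1000) are all made up for illustration. It compares how many distinct addresses a spy collects from repeated GETADDR calls with and without a cached response.

```python
import random

CACHE_LIFETIME = 24 * 60 * 60  # seconds; "roughly a day" (fixed value is an assumption)


class CachedAddrResponder:
    """Hypothetical GETADDR responder with an optional cached ADDR response."""

    def __init__(self, addrman_addresses, response_size, rng):
        self.addresses = list(addrman_addresses)
        self.response_size = response_size
        self.rng = rng
        self.cached = None
        self.cache_expires_at = 0.0

    def fresh_response(self):
        # Without caching: a new random subset on every request.
        count = min(self.response_size, len(self.addresses))
        return self.rng.sample(self.addresses, count)

    def cached_response(self, now):
        # With caching: reuse the same subset until the cache expires.
        if self.cached is None or now >= self.cache_expires_at:
            self.cached = self.fresh_response()
            self.cache_expires_at = now + CACHE_LIFETIME
        return self.cached


def scrape(responder, requests, use_cache, interval=60):
    """Count distinct addresses a spy learns from repeated GETADDR calls."""
    seen = set()
    for i in range(requests):
        now = i * interval  # simulated time of the i-th request, in seconds
        if use_cache:
            seen.update(responder.cached_response(now))
        else:
            seen.update(responder.fresh_response())
    return len(seen)


victim_addrman = [f"10.0.{i // 256}.{i % 256}:8333" for i in range(5000)]
uncached = scrape(CachedAddrResponder(victim_addrman, 1000, random.Random(1)), 100, use_cache=False)
cached = scrape(CachedAddrResponder(victim_addrman, 1000, random.Random(1)), 100, use_cache=True)
print("distinct addresses without cache:", uncached)
print("distinct addresses with cache:   ", cached)
```

In this simulation all 100 requests fall within one cache lifetime, so the cached responder reveals exactly one response's worth of addresses (1000), while the uncached responder reveals far more of the simulated AddrMan.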

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? You’re always encouraged to put your PR review on GitHub, even after it has been merged.
What is the importance of ADDR relay and specifically the GETADDR/ADDR protocol? What are the goals of this protocol? Which properties of ADDR relay are important?
Do you understand the examples of the attacks described above? Can you think of any other ways to exploit scraping AddrMan? If you find something really dangerous, consider discussing it first with another developer privately.
Do you think the attacks are severe? What are the potential consequences of making the topology (local or global) public?
Do you think the suggested PR sufficiently addresses the problem?
What are the side effects of the suggested solution?

Meeting Log

13:00

<jnewbery> #startmeeting

13:00

<jnewbery> hi

13:00

<emzy> hi

13:00

<gleb1010> hi

13:00

<troygiorshev> hi

13:00

<willcl_ark> hi

13:00

<pinheadmz> hi

13:00

<figs> hi

13:00

<amiti> hi

13:00

<jnewbery> Hi folks! Today we're going to be looking at a part of the p2p network that we haven't looked at before: address discovery and gossip.

13:01

<jnewbery> Or in other words: how a node learns about potential new peers to connect to.

13:01

<michaelfolkson> hi

13:01

<lightlike> hi

13:01

<jnewbery> Notes and questions in the normal place: https://bitcoincore.reviews/18991.html

13:01

<jnewbery> We're lucky to have the PR author hosting today. So without any more delay, I'll hand over to gleb1010.

13:01

<sipa> hi

13:01

<gleb1010> Hi again all! Hope we have a productive hour

13:01

<gleb1010> I saw some useful review comments from participants, but let’s check with everyone.

13:01

<gleb1010> Have you reviewed the PR? Y/n

13:01

<willcl_ark> y

13:01

<pinheadmz> y

13:01

<troygiorshev> y

13:01

<amiti> y

13:01

<lightlike> y

13:01

<jnewbery> y

13:02

<emzy> y/n

13:02

<gleb1010> That's impressive :)

13:02

<gleb1010> Feel free to ask any questions at any time, but let’s start with the notes we posted on the website.

13:02

<gleb1010> Let's first discuss this

13:02

<gleb1010> What is the importance of ADDR relay and specifically the GETADDR/ADDR protocol?

13:03

<pinheadmz> gleb1010 its for network nodes to discover new peers

13:03

<emzy> Asking a peer for more other peers.

13:03

<andrewtoth> hi

13:03

<gleb1010> Right, so since Bitcoin operates over a peer-to-peer network, it's essential for everyone to know somebody else from the network

13:03

<pinheadmz> there are also DNS seeds, and I think a hard-coded list of peers (?) for bootstrapping

13:04

<nehan_> hi

13:04

<gleb1010> pinheadmz: right, there are several ways to learn about other nodes in the network

13:04

<emzy> even a DNS seed uses ADDR to find nodes.

13:05

<pinheadmz> emzy true but to get a list of those peers from the seeder its actually DNS protocol

13:05

<willcl_ark> hmmmm, just thinking out loud - does this PR affect DNS nodes' responses at all? (I've not looked at the dnsseeder code)

13:05

<gleb1010> DNS internals always felt a bit obfuscated to me, so if you're somewhat confused, you're not alone :)

13:06

<gleb1010> Nope, I don't think they affect those responses, but maybe sipa can confirm?

13:06

<michaelfolkson> How were those hard-coded peers chosen? Are they regularly checked for uptime etc?

13:06

<willcl_ark> seems like https://github.com/sipa/bitcoin-seeder doesn't query Core, so I guess not!

13:06

<gleb1010> Pieter runs one out of 6 DNS seeds we have, so that the very new nodes know how to learn about each other

13:06

<gleb1010> michaelfolkson: I believe Wladimir maintains the list.

13:06

<sipa> so does emzy

13:07

<gleb1010> Well, hard-coded peers are updated within regular release process.

13:07

<emzy> willcl_ark: it connects directly via p2p to a node to ask. No local bitcoind is running.

13:07

<sipa> michaelfolkson: see https://github.com/bitcoin/bitcoin/blob/master/doc/release-process.md

13:07

<gleb1010> So I think if you look hard enough you can find a PR with that list

13:07

<willcl_ark> emzy: thanks,

13:07

<gleb1010> Okay this one is really interesting.

13:07

<gleb1010> Which properties of ADDR relay are important?

13:08

<gleb1010> We know that blocks should be relayed faster, we know that transactions should be more or less unlinkable to IP

13:08

<michaelfolkson> sipa: Thanks

13:08

<willcl_ark> Ideally, you want to get a diverse list of peers

13:08

<pinheadmz> and you want peers with good uptime or at least "seen" recently

13:08

<jnewbery> emzy: DNS seeds actually use DNS

13:08

<gleb1010> Right, one way to look at it is to consider an individual node's AddrMan, not cross-network relay.

13:09

<jnewbery> here: https://github.com/bitcoin/bitcoin/blob/ffa70801dab7fa85c24fd5d19ca998e0910238d5/src/net.cpp#L1681-L1682

13:09

<gleb1010> Every node wants to have a good up-to-date list of nodes in their AddrMan.

13:09

<jnewbery> if DNS fails, we connect using P2P and send a GETADDR (here: https://github.com/bitcoin/bitcoin/blob/ffa70801dab7fa85c24fd5d19ca998e0910238d5/src/net.cpp#L1691-L1694)

13:10

<emzy> jnewbery: it answers via DSN but it has to crawl the nodes via p2p

13:10

<gleb1010> This is where we ideally should talk about unsolicited self-announcements which propagate ADDRs across the network, but let's leave that for homework

13:10

<gleb1010> Let's see what this parallel discussion is about :)

13:10

<lightlike> so this change might influence DNS seed indirectly. If the DNS seeder regularly scrapes to get new addresses (and deliver them via DNS) and it gets cached responses in the future, it might introduce some kind of inertia to changes in the network.

13:10

<emzy> s/DSN/DNS/

13:11

<gleb1010> emzy is curious how DNS servers actually learn about the nodes they gonna feed to nodes querying them

13:11

<gleb1010> or maybe not curious but he makes a statement :)

13:11

<sipa> emzy runs a DNS seed

13:11

<willcl_ark> lightlike: but the cached responses will be different from each node

13:11

<jnewbery> emzy: I'm not sure what you mean by crawl the nodes, but it gets its list of potential peers by DNS

13:12

<lightlike> willcl_ark: true, so probably not noticeably

13:12

<sipa> jnewbery: the DNS seeder itself, not the client

13:12

<sipa> jnewbery: it has to get the list of good IP addresses it serves from somewhere

13:12

<gleb1010> This is true. If everything is cached, the records everywhere become a bit older (within 1 day)

13:12

<sipa> to do so, it crawls the network using the P2P protocol

13:12

<pinheadmz> anyone can do a quick experiment to query the DNS seed nodes: `dig seed.bitcoin.sipa.be`

13:12

<pinheadmz> returns a list of A records

13:13

<sipa> you can get IPv6 ones using dig -t AAAA seed.bitcoin.sipa.be

13:13

<troygiorshev> pinheadmz: thx

13:13

<emzy> tnx sipa, that was my point.

13:13

<gleb1010> Okay, so we can talk about the impact a bit more now

13:13

<willcl_ark> so I think this doesn't affect the DNS seeds

13:13

<gleb1010> How should we look at the threat of everyone getting slightly older AddrMan records?

13:14

<gleb1010> Like, let's say a node gets 1000 records, but 500 of them were updated since the last cache

13:14

<michaelfolkson> That is a threat? Or just an inefficiency?

13:14

<troygiorshev> will this make it more difficult for a new node to get a diverse set of peers?

13:14

<willcl_ark> less than 24h should not be too much of a problem (unless node churn is very bad)

13:14

<gleb1010> Right, so I remember a research paper from Till Neudecker showing that churn in our network is really low

13:15

<jnewbery> ah, I understand. Thanks

13:15

<emzy> gleb1010: It will take longer for new nodes to be discovered by others.

13:15

<jnewbery> yes. I thought emzy was talking about 'DNS seed connections' rather than 'DNS seeders'. All clear now!

13:15

<lightlike> troygiorshev: it will probably introduce a delay until you get new inbound peers.

13:15

<sipa> fwiw, my dns seeder software will generally visit every node it knows about in the network every few hours

13:15

<willcl_ark> https://dsn.tm.kit.edu/bitcoin/ these guys measure (some) churn

13:15

<sipa> so this will impact it, but not much i think, given the low churn

13:15

<gleb1010> That's a great link above, it has so many things, save and take a look later :)

13:16

<gleb1010> sipa: So the point is, some of the records might have older timestamps, but they will probably be still alive?

13:16

<gleb1010> And they won't be garbage in our AddrMan

13:17

<gleb1010> Okay, let's move forward.

13:17

<pinheadmz> gleb1010 are the timestamps in addrman updated with every message from the peer? or just once on connection?

13:18

<gleb1010> pinheadmz: it's a bit complicated there, afaik it's different for inbound/outbounds

13:18

<gleb1010> For one of them it's every message, for another it's when they connect. We also update based on unsolicited ADDRs...

13:19

<gleb1010> But yeah, the expectations of these timestamps are a bit vague right now, probably that's an area of future improvements.

13:19

<emzy> If an attacker has like 1000 ipv4 addresses this PR would not mitigate the attack?

13:20

<gleb1010> There is no *the attack* :P

13:20

<amiti> pinheadmz: I found the coinscope paper interesting bc it used the timestamps to figure out network topology (link: https://www.cs.umd.edu/projects/coinscope/coinscope.pdf)

13:20

<gleb1010> Yeah coinscope is great!

13:20

<troygiorshev> pinheadmz: fwiw if you feel like looking into the history, it was apparently changed in version 0.10.1

13:20

<gleb1010> Let's get to attacks.

13:20

<willcl_ark> What I am interested to know is, how does the previous (current) ADDR response give away your connected vs disconnected peers? Unless disclosing is a security risk OFC

13:20

<amiti> idk how much the code has changed since then, but gave me an idea of the complexity

13:20

<gleb1010> In the PR, I was a bit vague about the attacks, because it's always a bit sensitive.

13:21

<emzy> gleb1010: or privacy leak

13:21

<willcl_ark> Also an acceptable answer :)

13:21

<gleb1010> But the coinscope paper is one example of exploiting it to infer direct links between nodes.

13:21

<gleb1010> Another leak is to map Tor identity to ipv4 identity of the same node. If we scrape everything and compare all timestamps, it's very easy to tell it's the same entity.

13:21

<pinheadmz> we've talked about eclipse attacks in review club before as well - it could be used by an atacker to check on their success maybe

13:22

<gleb1010> Anybody got some time to think of any other vectors?

13:22

<michaelfolkson> You mentioned linking Tor and IP addresses

13:22

<amiti> I was wondering how a spy could go from knowing addrman to figuring out the network topology, the coinscope paper uses the timestamps. is that the main way or are there others?

13:22

<gleb1010> pinheadmz: right, it's great to remember that most of researchers seem to feel that topology should be private/obscured for security reasons :)

13:22

<michaelfolkson> And then there's linking Bitcoin nodes to Lightning nodes ;)

13:23

<gleb1010> amiti: Great question! Timestamps is what I was always thinking too.

13:23

<willcl_ark> So from that paper it seems "recent and unique" timestamps (might) give it away

13:23

<gleb1010> willcl_ark can you elaborate?

13:23

<sipa> i would expect there are ways to introduce randomly generated IPs in the network, broadcast them on certain incoming connections, and then observing which other peers know about them

13:24

<willcl_ark> I'm not sure, but perhaps if we hear of a peer from another peer, we have the same timestamp, but if we connect ourselves, we get a unique timestamp

13:24

<willcl_ark> so if you see the same timestamp from a few peers we can't tell, but you give me a unique and recent timestamp, we can infer you are connected directly to them

13:24

<troygiorshev> sipa: ah like putting rubber ducks in a river and seeing where they end up

13:24

<gleb1010> My first idea was to just track the "self-announcements"

13:25

<gleb1010> Nodes sometimes announce themselves to their direct peers

13:25

<sipa> michaelfolkson: lightning node have an explicit public identity, so i'm not sure what there is to link

13:25

<gleb1010> And those timestamps will be very distinct

13:25

<gleb1010> sipa: link lightning node to a bitcoin ip or onion leads to issues, see our time-dilation attacks paper :)

13:25

<michaelfolkson> But you don't always know the Bitcoin full node that the Lightning node is using

13:26

<sipa> ok

13:26

<gleb1010> Alright, there are couple ideas to exploit this above

13:26

<gleb1010> Perhaps someone comes up with something even cooler than we can think of

13:26

<gleb1010> And maybe not much fixed by caching :)

13:26

<pinheadmz> gleb1010 this cache applies to all your peers?

13:26

<pinheadmz> so within 24 hours, every node that GETADDR from the same network get the same response?

13:26

<sipa> oh no, a netsplit

13:27

<pinheadmz> otherwise you could sybil a node and just get all the addrs anyway

13:27

<gleb1010> pinheadmz: yes, same cache per the network of request originator

13:27

<willcl_ark> i think the cache would be different for each peer, like the mempool

13:27

<gleb1010> Except white-listed requestor

13:27

<gleb1010> sipa: huh?

13:27

<sipa> gleb1010: on IRC, now, in this channel

13:28

<sipa> we lost jnewbery and others

13:28

<gleb1010> ah i see :)

13:28

<gleb1010> this happens from time to time

13:28

<gleb1010> My handbook doesn't have instructions for this...

13:28

<michaelfolkson> A network partition

13:28

<sipa> haha

13:28

<gleb1010> I think that subnet is in good hands of jnewbery

13:28

<willcl_ark> hard forked them off

13:28

<gleb1010> okay so the discussion above was

13:29

<gleb1010> I suggest using a separate cache for different network origin, and someone above pointed out why it is useful

13:29

<gleb1010> Welcome back!

13:29

<gleb1010> Alright, let's try to merge back to the notes

13:30

<michaelfolkson> In the time you were away we eclipsed all your peers

13:30

<gleb1010> We sort of discussed the severity of the attacks

13:30

<gleb1010> Leaking the topology can make eclipsing a node easier, also spying easier

13:30

<gleb1010> Also stealing funds from Lightning... many things

13:30

<gleb1010> Anything else?

13:31

<michaelfolkson> Privacy attacks

13:31

<willcl_ark> eclipse of lightning light clients (e.g. Neutrino) seems like a particularly severe one

13:31

<gleb1010> Well, part of the issue is that Neutrino's p2p stack is poor, so I'm not sure it's a good motivation for a Bitcoin Core change haha

13:31

<willcl_ark> but i guess it all comes back to the same base-layer eclipse

13:31

<willcl_ark> sure!

13:32

<gleb1010> Anyone looking for some little work should consider helping to mature Neutrino implementation

13:32

<gleb1010> With the experience we're getting while working on this stuff :P

13:33

<gleb1010> Yeah, so basically eclipse attacks are the worst probably (and netsplits, which is sort of an eclipse too but large scale)...

13:33

<gleb1010> You can do what you want with your victim, so we don't want that.

13:33

<gleb1010> Bringing us to the question... does this PR really help?

13:33

<pinheadmz> i guess if you got enough addrs from enough nodes you could uncover who the critical nodes are

13:33

<gleb1010> (to hide local and global topologies)

13:33

<pinheadmz> like, nodes with the most connections, potential attack points

13:34

<gleb1010> pinheadmz: Right, it would be easier to split the net if one kills the bridges, but hopefully our graph is random enough so that's not very helpful :)

13:34

<amiti> does netsplit = network partition?

13:34

<primal> not sure if it came across but I had the following comment during the netsplit

13:34

<primal> is the PR not shifting the attack strategy to gain topology information from an attacker being rate limited to the attacker spinning up a botnet?

13:34

<primal> I haven't worked through how peering dynamics and rate limiting ADDR will evolve, but ^^ is a gut-reaction check

13:34

<gleb1010> amiti: yeah

13:34

<willcl_ark> it appears that it might slow down the attack to the extent that peers get new connections etc. so that wouldn't be so effective

13:34

<nehan_> gleb1010: it slows the attacker down

13:34

<gleb1010> Right. This PR just makes an attacker spend way more time on learning what they want to learn.

13:34

<willcl_ark> at pretty minimal cost to "freshness" of ADDR responses

13:35

<troygiorshev> I'm worried that it simply delays learning the topology, but i'm not sure

13:35

<gleb1010> I never bothered to measure the exact attack delay we may introduce

13:35

<lightlike> primal: you can connect with as many bots that you want, all will get the same cached response for a while.

13:35

<gleb1010> Because I have other ideas on the way to improve the privacy :)

13:35

<gleb1010> Let's think of how we can improve this stuff even further?

13:35

<pinheadmz> is there a drawback where a new node with maxinbound 1000 wont actually get any new inbounds for up to 24 hours?

13:36

<primal> lightlike ahh ok so the cache reduces the set of info that we allow to leak out of our node

13:36

<gleb1010> Anything crazy creative works, let's discuss ideas

13:36

<michaelfolkson> Sometimes delays are all you can do troygiorshev. Make it unviable to do unless extremely targeted victim

13:36

<gleb1010> primal! Right. And also the indicators=timestamps are a little "outdated"

13:36

<willcl_ark> primal: I guess it's like "leak rate"

13:36

<gleb1010> pinheadmz: not sure I follow.

13:36

<primal> "topology disclosure rate" or something of the sort

13:37

<gleb1010> What this has to do with inbound limit?

13:37

<jnewbery> pinheadmz: it'll reduce how quickly other nodes learn about your address in the first 24 hours, but won't eliminate it entirely

13:37

<emzy> gleb1010: what about randomness in the last seen time?

13:37

<pinheadmz> jnewbery right, just a delay. and for inbounds its like "eh, your loss"

13:37

<jnewbery> because each node's cache is updated at a different time

13:37

<willcl_ark> pinheadmz: you should get 1000 different cached responses from all of those peers, so you should be fine

13:37

<primal> emzy: you don't want randomness, you want to chop the information at a certain bit

13:37

<gleb1010> emzy: this is interesting!

13:37

<jnewbery> also, be aware that there is another method of address gossipping, which is that each node will announce its own address to its peers every ~24 hours, and those peers will gossip that on to some of their peers. That method is unaffected by this PR.

13:37

<nehan_> the thing i'm trying to understand are the implications of serving stale timestamps. the timestamp logic is weird (as described in the coinscope paper, which might be out of date)

13:37

<thomasb06> is there a reverse mechanism in case of misusage: certain nodes able to overcome the privacy for example?

13:37

<gleb1010> wow this moves fast!

13:37

<emzy> primal: also valid

13:37

<pinheadmz> willcl_ark i referring more to my own IP getting out to the network, if i have a lot of open slots to offer

13:37

<jnewbery> that there is another method of address gossipping, which is that each node will announce its own address to its peers every ~24 hours, and those peers will gossip that on to some of their peers. That method is unaffected by this PR.

13:38

<jnewbery> (https://github.com/bitcoin/bitcoin/blob/ffa70801dab7fa85c24fd5d19ca998e0910238d5/src/net_processing.cpp#L3896-L3899)

13:38

<jnewbery> (or more accurately: for each peer, once every 24 hours, we reannounce our address to that peer)

13:38

<gleb1010> jnewbery: I believe it's a bit less often than that, because there's a bloom filter in that announcement, but I have to double-check...

13:38

<willcl_ark> oh no! we lost him

13:38

⚡ sipa spawns a new gleb1010

13:39

<willcl_ark> :)

13:39

<pinheadmz> gleb1011 enters :-)

13:39

<michaelfolkson> I do wonder how easy/difficult it is to knock a node off the Bitcoin network

13:39

<gleb10101> Now I got forked, sorry.

13:39

<michaelfolkson> Because that's what eclipse attacks rely on right?

13:39

<gleb10101> michaelfolkson: good question! There are many ways to knock a node

13:39

<emzy> is there a problem for nodes that change their ip address every 24h to be discovered? Because this is often the case with DSL/dial-in connections.

13:40

<gleb10101> michaelfolkson: hopefully all those many ways are hard/expensive enough :)

13:40

<pinheadmz> jnewbery dont wanna veer too far off topic, but doesnt a node learn its own IP address from other peers?

13:40

<gleb10101> emzy: I actually didn't think about them. What would be the issue? They get less connections inbound?

13:41

<jnewbery> pinheadmz: yes, I believe that's a part of it, although it's not something I've looked at too closely

13:41

<pinheadmz> emzy this is what i was getting to. but actually most home-run nodes probably arent accepting inbounds anyway (firewall, etc)

13:41

<emzy> gleb10101: exactly. They will be second class nodes.

13:41

<primal> pinheadmz I don't understand the significance of a node learning it's own ip add from other peers. what am I missing?

13:41

<amiti> pinheadmz, jnewbery: oh I've totally seen code in connection logic that says "if its yourself, disconnect" 😛

13:42

<pinheadmz> primal bitcoind used to actually make a request to whats-my-ip.com or something

13:42

<gleb10101> emzy: fair enough, maybe you should think about that more and then come to the PR with the conclusions

13:42

<pinheadmz> i guess its hard for a process on your laptop to know what its internet IP truly is

13:42

<jnewbery> amiti: that's something slightly different, and is detected by including a random nonce in the VERSION message

13:42

<emzy> pinheadmz: I think this is the case.

13:42

<gleb10101> emzy: wondering if that's the scenario we're targeting

13:42

<gleb10101> Alright, I asked about the alternative solutions, I was about to share one :)

13:42

<emzy> gleb10101: good question.

13:42

<gleb10101> Just to throw this another idea, we don't have to discuss it.

13:43

<gleb10101> I wanna implement self-announcement on feeler connection with some probability.

13:43

<amiti> jnewbery: oh, interesting. ok thanks

13:43

<primal> pinheadmz are you saying that bc a node can learn its ip addr from peers that removes the need to communicate with other services?

13:43

<gleb10101> We do connect to some node in the network every 2 minutes just to see if they're alive, might as well ask them to relay our addr. This would obfuscate it even further

13:43

<pinheadmz> primal right or more specifically centralized services that arent even bitcoin related...ill try to find the PR its an old one

13:43

<gleb10101> Maybe someone gets any ideas like this for future PR :)

13:44

<sipa> bitcoin core used to query some "find my ip" website in a long-gone past

13:44

<jnewbery> gleb10101: what do you mean by 'obfuscate' in this context?

13:44

<willcl_ark> sipa: sounds like a privacy nightmare :P

13:44

<sipa> willcl_ark: yes indeed

13:44

<gleb10101> jnewbery: Coinscope and my own idea initially was to 1. scrape AddrMan often 2. infer inbounds by new records/special timestamps

13:45

<pinheadmz> sipa right, and then that website turned into a real estate site or something, was basically getting ddosed by bitcoin network :-P

13:45

<gleb10101> jnewbery: so now these new records/special timestamps will be not only at victim's direct peers and their peers, but also at random feelers

13:45

<gleb10101> Yeah, this is an interesting story how we moved from that website...

13:45

<willcl_ark> could nodes use a random offset for the timestamp when serving (or storing) ADDR

13:46

<willcl_ark> then nobody would ever have the same

13:46

<gleb10101> willcl_ark: this is what greg maxwell told me we already do when I discovered this issue a year ago haha

13:46

<gleb10101> But then I don't think we do randomize them

13:46

<gleb10101> So that was some phantom feature

13:46

<willcl_ark> :) But we should!

13:47

<gleb10101> So the idea is to randomize a timestamp on every ADDR sending

13:47

<gleb10101> This will help with some issues...

13:47

<pinheadmz> https://github.com/bitcoin/bitcoin/pull/3088 dont use 3rd party IP services

13:47

<gleb10101> willcl_ark tracking occurrence of new records in AddrMan still would be possible

13:47

<emzy> the randomness may be the less invasive for the P2P network.

13:48

<amiti> fundamental question: what is the intended purpose of the ADDR timestamps? I saw logic that used this info to not relay old addrs. is that the main reason?

13:48

<gleb10101> amiti: yeah I believe so.

13:48

<primal> pinheadmz 845c86d128fb97d55d125e63653def38729bd2ed

13:49

<willcl_ark> gleb10101: hmmmm, interesting

13:49

<gleb10101> I believe every time we get an ADDR, we would deprioritize it if it's 1 week old

13:49

<primal> ah yeah you linked it

13:49

<gleb10101> Okay, we have 10 minutes left

13:49

<gleb10101> I was about to ask about side-effects, but we actually discussed them :)

13:49

<gleb10101> But someone can highlight a side-effect of their concern again

13:50

<gleb10101> Or just ask any other question?

13:50

<willcl_ark> I was wondering about setting ADDR as a default for whitebind

13:50

<jnewbery> There was a suggestion in https://github.com/bitcoin/bitcoin/pull/16442 to dynamically change the local service bits depending on whether the compact block filter index was built. It was argued that because it would only ever go from false to true, that would be ok.

13:50

<pinheadmz> is there any way an ADDR message cached response could be used to identify a node? if you get the same response from nodes running on two IPs for example?

13:50

<willcl_ark> it doesn't really make much difference as you specify in config, but...

13:50

<nehan_> I asked earlier about the implications of serving stale timestamps but that might have gotten lost in the fork and/or I didn't see the answer

13:51

<jnewbery> I think if nodes start randomizing timestamps that's no longer true. You could get an old address record with a newer timestamp than the (actually) newer address record

13:51

<gleb10101> nehan_: We don't want to spend days digging into outdated nodes and finding a live one...

13:51

<nehan_> or think that nodes are stale that aren't actually stale

13:51

<gleb10101> And we don't want to spend bandwidth relaying old non-live nodes

13:52

<michaelfolkson> I was listening to ariard on TFTC and he was saying it needs a similar level of resource to secure P2P for Neutrino or Lightning as it does secure P2P on Core. Had never thought of it like that before

13:52

<nehan_> with this change, where a node might have served a fresh timestamp it would now serve one that was 27 hours old

13:52

<gleb10101> pinheadmz: yeah, but I don't know how to address this issue :)

13:52

<michaelfolkson> I just assumed Core would take the brunt of addressing a lot of the P2P attacks on Neutrino/Lightning indirectly

13:52

<gleb10101> nehan_: true, but in the beginning of the meeting we sort of considered that 1-day old is probably fine.

13:53

<gleb10101> We should be talking about at least several-days lag for it to be bad. Although it's a bit arbitrary and depends on many things. It's more of an intuition

13:54

<nehan_> gleb10101: doesn't that sort of imply we don't need to update timestamps frequently?

13:54

<sipa> jnewbery: i feel that with feelers this is less of an issue, as they will always overwrite the flags data with the actual flags

13:55

<sipa> (i need to check if feelers actually override flags)

13:55

<amiti> re timestamps and addr relay: I still don't understand how its really helping. as a recipient you are able to assign likelihood-of-node-being-live if the sender is being honest in the reported timestamps. if thats the case, why not just have honest nodes proactively only send addrs of recently-tested-conns?

13:55

<gleb10101> nehan_: define frequently :) Records should be at most couple days old. We currently don't need better freshness.

13:55

<gleb10101> We don't know how to distinguish 3 days old from 1 day old. The code doesn't.

305

13:56

<willcl_ark> presumably you could also just make the timestamp up, if you were so inclined

306

13:56

<gleb10101> I mean, we can distinguish, but we don't do anything with it, sorry.

307

13:57

<gleb10101> amiti: Right, maliciously updating timestamps attack is one of my todos :)

308

13:57

<gleb10101> That's also why any fine-grained optimization of being alive is dangerous.

309

13:57

<gleb10101> It's free for an attacker to bump their timestamp

310

13:58

<pinheadmz> is there any banscore type thing if a node sends us 1000 ADDRS and none of them work ?!

311

13:58

<gleb10101> Meaning we don't want to rely on timestamps too much...

312

13:58

<amiti> gleb: huh? like explore the feasibility of attack?

313

13:58

<jnewbery> sipa: I think we do. We call SetServices() when we receive the version, and then disconnect

314

13:58

<sipa> pinheadmz: we won't know they don't work until days, maybe weeks later

315

13:58

<gleb10101> pinheadmz: nope. I mean, we don't want to be checking 1000 nodes at once :)

316

13:59

<gleb10101> amiti: we're out of time it seems, hit me up later :)

317

13:59

<pinheadmz> sipa gleb10101 right, and no real pain in trying to connect to bad nodes

318

13:59

<pinheadmz> well great work on this gleb10101 very simple to understand and makes a lot of sense

319

13:59

<troygiorshev> yeah thanks gleb10101!

320

14:00

<gleb10101> Thank you! For those who haven't looked at the code, it's actually just a few lines, so please review :P

321

14:00

<willcl_ark> thanks gleb10101

322

14:00

<emzy> thanks gleb10101!

323

14:00

<andrewtoth> thanks gleb10101!

324

14:00

<lightlike> thanks!

325

14:00

<jnewbery> pinheadmz: and that's a more general problem. How do we 'punish' a node for giving us bad data? Looking at orphan processing and mapBlockSource is left as an exercise for the reader :)

326

14:01

<gleb10101> #endmeeting

327

14:01

<primal> thanks gleb10101

Meeting Log – Asia time zone

Host: jonatack

328

05:04

<jonatack> If anyone is around, we'll get started in just under an hour.

329

06:00

<jonatack> #startmeeting

330

06:00

<jonatack> hi

331

06:01

⚡ jonatack things seem quiet

332

06:01

<sipa> vaguely here

333

06:02

<brikk> hi

334

06:02

<jonatack> hi!

335

06:03

<jonatack> So I spent some time going through the meeting log and the PR.

336

06:03

<jonatack> brikk: did you get a chance to review the PR?

337

06:04

<brikk> jonatack: unfortunately not, I was not aware of this meeting; I only saw activity now, at a convenient time right at the start of my day :)

338

06:04

<jonatack> that's all good

339

06:04

<jonatack> this is about https://bitcoincore.reviews/18991

340

06:05

<jonatack> "Cache responses to GETADDR to prevent topology leaks (p2p)"

341

06:05

<brikk> thanks, I'm looking at it now

342

06:05

<jonatack> to make the ADDR/GETADDR address gossip protocol more private for the node providing addresses

343

06:06

<jonatack> ADDR relay and specifically the GETADDR/ADDR protocol exist for peer discovery in bitcoin's p2p network.

344

06:07

<jonatack> in addition to two other ways: DNS seeders which crawl the network using the p2p protocol, and hardcoded fixed seeds

345

06:07

<jonatack> for instance, one can run dig seed.bitcoin.sipa.be (or dig -t AAAA seed.bitcoin.sipa.be for IPv6) on the command line to see a list of A/AAAA records that seeders might provide

346

06:09

<jonatack> Essentially, the goal of this PR is to slow attackers down, to make them spend more time to learn local or global peer topology.

347

06:11

<jonatack> ADDR relay has various properties of importance. This PR seems to shift the priority among them a bit in the hope of a better tradeoff.

348

06:11

<jonatack> What are these properties?

349

06:11

<jonatack> - Privacy: hiding local and global topology, difficulty of identifying peers, or of linking transactions to IP addresses

350

06:11

<jonatack> - Peer diversity

351

06:12

<jonatack> - Decentralisation: Trust reduction with respect to the DNS seeds and the fixed seeds

352

06:12

<jonatack> - Speed of relay

353

06:12

<jonatack> - Freshness: peers having an up to date list of peers

354

06:13

<jonatack> - Quality: peers who are well-behaved, seen recently, with good uptime

355

06:14

<brikk> What does peer diversity mean?

356

06:14

<jonatack> For example, a tradeoff this PR would seem to be proposing is less freshness (if that is the best word; it may not be) in favor of privacy

357

06:14

<jonatack> or less diversity as well, possibly

358

06:15

<brikk> Speed of relay seems to be a tradeoff as well, right?

359

06:16

<jonatack> i'm not sure

360

06:16

<jonatack> how do you see it?

361

06:17

<brikk> just by looking at the comments in the review

362

06:17

<brikk> there are a lot of comments though, and I have yet to make it to the end, so perhaps my perception will change :)

363

06:19

<jonatack> diversity: good question. for example, by ASN, which was a motivation for the -asmap p2p addition in the latest release of bitcoin core.

364

06:21

<jonatack> https://bitcoincore.reviews/16702 covered asmap and contains good resources on the various attacks: erebus, eclipse, bgp hijacking

365

06:22

<jonatack> sipa: do you think today's PR could adversely affect discovery of newly online peers, or ones who change IP address frequently?

366

06:23

<jonatack> ISTM if everything is cached, the records everywhere become a bit older (by 1 day)
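
A minimal sketch of the caching idea being discussed, not the PR's actual code: a node builds one ADDR response, serves that same response to every GETADDR until a lifetime expires, and only then rebuilds it. The names `AddrRecord` and `CachedAddrResponse` are hypothetical, and the caller supplies the lifetime; the 24-hour value used in the note below comes from the "1 day" figure in this conversation, not from the PR's source.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an address record; not Bitcoin Core's CAddress.
struct AddrRecord {
    std::string addr;   // textual network address
    int64_t last_seen;  // timestamp attached to the record, in unix seconds
};

// Serves one stored response to every requester until the lifetime runs out.
class CachedAddrResponse
{
public:
    explicit CachedAddrResponse(std::chrono::seconds lifetime) : m_lifetime(lifetime)
    {
        assert(lifetime.count() > 0);
    }

    // `now` is the current time; `build_fresh` is any callable returning a
    // std::vector<AddrRecord> (for instance, a fresh selection of records).
    template <typename Builder>
    const std::vector<AddrRecord>& Get(std::chrono::seconds now, Builder build_fresh)
    {
        if (!m_valid || now >= m_expires) {
            m_cache = build_fresh();      // rebuild only when empty or expired
            m_expires = now + m_lifetime;
            m_valid = true;
        }
        return m_cache;                   // otherwise every caller sees the same data
    }

private:
    std::chrono::seconds m_lifetime;
    std::chrono::seconds m_expires{0};
    bool m_valid{false};
    std::vector<AddrRecord> m_cache;
};
```

With a lifetime of `std::chrono::hours{24}`, a request at hour 23 still receives the records (and timestamps) built at hour 0. That is the extra staleness being weighed here against the privacy gain of not exposing fresh address-manager contents on every request.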

367

06:23

<sipa> i'd need to think more about that

368

06:23

<brikk> right, sounds like the things amiti uttarwar was talking about in the reckless vr meetup

369

06:23

<jonatack> and new node discovery might be slower... but i would need to look at it more.

370

06:26

<brikk> jonatack: when you say new node discovery, does that mean that I bring a new node to the discovery and it would mean issues for me, or that someone else brings a new node online and the rest of the network has trouble discovering it?

371

06:27

<jonatack> Both? This is an aspect I'm not sure on.

372

06:27

<brikk> ok

373

06:30

<jonatack> The hard thing with p2p changes like this, to my mind, is how to simulate the effects before actually deploying on the network.

374

06:31

<luke-jr> it's in Knots 0.20.0 fwiw

375

06:31

<brikk> I agree

376

06:31

<luke-jr> not quite the same thing, but it's _part_ of the network that might be observable

377

06:32

<jonatack> luke-jr: nice. released june 14? any stats on number of nodes running that version?

378

06:32

<luke-jr> [04:23:42] <jonatack> ISTM if everything is cached, the records everywhere become a bit older (by 1 day) <-- isn't it 1 day *per hop*?

379

06:33

<luke-jr> jonatack: the release was June 16th, but based on June 14th PRs

380

06:33

<sipa> luke-jr: there are two mechanisms though; getaddr->addr, and normal addr gossipping

381

06:33

<luke-jr> I'm seeing only 24 nodes upgraded so far

382

06:33

<sipa> i don't think the second is affected by this PR but i haven't reviewed in detail

383

06:33

<luke-jr> sipa: ah

384

06:34

<sipa> and for the getaddr->addr mechanism there isn't really any concept of hops

385

06:34

<luke-jr> do we currently *use* getaddr as a client then?

386

06:35

<sipa> luke-jr: yes, under certain conditions, in response to version

387

06:35

<sipa> so at most once per connection

388

06:35

<sipa> (we also only respond once per connection iirc)

389

06:36

<luke-jr> looks like normally once at connection

390

06:36

<luke-jr> due to pfrom.nVersion >= CADDR_TIME_VERSION

391

06:36

<luke-jr> outbound connection*
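
The conditions described in the last few messages can be sketched roughly as follows. This is an illustration under stated assumptions, not Bitcoin Core's code: `PeerState`, `ShouldSendGetAddr`, and `ShouldAnswerGetAddr` are hypothetical names, and `min_version` stands in for the protocol-version threshold luke-jr mentions. The sketch only captures the three points made above: GETADDR is sent on outbound connections to sufficiently recent peers, at most once per connection, and a peer's GETADDR is answered at most once per connection.

```cpp
#include <cassert>

// Hypothetical per-connection bookkeeping.
struct PeerState {
    bool inbound{false};           // true if the peer connected to us
    int version{0};                // protocol version announced in VERSION
    bool sent_getaddr{false};      // we already asked this peer for addresses
    bool answered_getaddr{false};  // we already answered this peer's GETADDR
};

// Decide, after receiving VERSION, whether to send GETADDR to this peer.
// Returns true at most once per connection.
inline bool ShouldSendGetAddr(PeerState& peer, int min_version)
{
    assert(min_version >= 0);
    if (peer.inbound) return false;                // only ask on outbound connections
    if (peer.version < min_version) return false;  // peer too old for this exchange
    if (peer.sent_getaddr) return false;           // already asked on this connection
    peer.sent_getaddr = true;
    return true;
}

// Decide whether to answer a GETADDR from this peer.
// Returns true at most once per connection; repeats are ignored.
inline bool ShouldAnswerGetAddr(PeerState& peer)
{
    if (peer.answered_getaddr) return false;
    peer.answered_getaddr = true;
    return true;
}
```

Answering only once per connection is why a scraper has to reconnect repeatedly to collect many responses.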

392

06:37

<jonatack> CADDR always puts a smile on my face

393

06:38

<luke-jr> ?

394

06:38

<jonatack> seeing lisp in the codebase :)

395

06:38

<sipa> )))))))))))

396

06:42

⚡ luke-jr watches a tumbleweed roll by

397

06:42

<jonatack> luke-jr: "like normally once at connection" you're referring to ProcessMessage or RelayAddress?

398

06:42

<luke-jr> just glancing at the conditions around connman->PushMessage(&pfrom, CNetMsgMaker(nSendVersion).Make(NetMsgType::GETADDR));

399

06:43

<jonatack> ah VERSION

400

06:43

<luke-jr> it seemed like it happens under fairly normal circumstances

401

06:46

<jonatack> luke-jr: do you see any adverse effects from this PR?

402

06:46

<luke-jr> so far no

403

06:48

<jonatack> how did you decide to add it to knots?

404

06:48

<luke-jr> less than 50% of the network is running 0.19.x+, so it's not the end of the world if we discover something post-release either

405

06:48

<luke-jr> jonatack: Knots merge policy is very relaxed - if it won't clearly disrupt anything, it usually goes in

406

06:49

⚡ jonatack looks at https://dsn.tm.kit.edu/bitcoin to see

407

06:49

<jonatack> luke-jr: ok

408

06:50

<jonatack> https://dsn.tm.kit.edu/bitcoin/#useragents

409

06:50

<jonatack> luke-jr: you have your own proprietary stats iirc?

410

06:51

<luke-jr> yes http://luke.dashjr.org/programs/bitcoin/files/charts/branches.html

411

06:51

<jonatack> thanks

412

06:52

<luke-jr> I suppose http://luke.dashjr.org/programs/bitcoin/files/charts/branches.html?onlylistening=1 shows 55% running recent versions

413

06:53

<brikk> I'm trying to wrap my head around the peer diversity: can I think of this as three kinds of nodes: nodes with an ipv4/ipv6 address that never changes, nodes behind tor, and nodes whose address changes every 24 hours

414

06:53

<jonatack> seems quite different from the dsn one in germany which appears to show more 0.19.x nodes

415

06:53

<luke-jr> I guess a lot of 0.16.x nodes are firewalled or something

416

06:53

<luke-jr> jonatack: I include non-listening by default

417

06:53

<jonatack> got it