Cache responses to GETADDR to prevent topology leaks (p2p)
Jul 1, 2020
https://github.com/bitcoin/bitcoin/pull/18991
Host: naumenkogs - PR author: naumenkogs
The PR branch HEAD was 52d22a3c at the time of this review club meeting.
Notes
The motivation for PR 18991 is to make the ADDR/GETADDR address gossip
protocol more private for the node providing addresses.
Nodes transmit three main objects over the p2p network: transactions, blocks
and network addresses (e.g. 172.1.2.3). Today’s PR is concerned with network
address gossiping.
There are 2 relevant p2p messages: GETADDR (the sender asks the receiver for
network addresses along with some per-address metadata) and ADDR (the receiver
responds with the data it is willing to share with the sender).
This GETADDR/ADDR protocol is currently used every time a node makes a new
connection, after receiving the version message.
There are other ways of learning about potential peers, which are not
relevant to this PR, but it’s worth being aware of them:
Network addresses of the nodes are propagated in the
network in an unsolicited way: from time to time, every node makes a
self-announcement which is propagated across the network.
Nodes can query predefined DNS seeds for a list of potential peers.
Every time a node hears about another node in the network, it adds/updates
a record in its AddrMan (Address Manager). This is where all the data
about nodes in the network is stored, and this is where a GETADDR receiver
looks to construct an ADDR response to the requester.
It turns out that it’s possible for a spy node to easily scrape the full
contents of any reachable node’s AddrMan. The spy just has to connect to a
victim node multiple times and execute GETADDR. This scraped data can then be
used to infer private information about the victim.
For example, a spy can monitor the victim’s AddrMan content in real time and
figure out which peers a node is connected to. A spy can also compare the
AddrMan content from two different connections (e.g. one identified by Tor
address and one identified by IPv4) and figure out that it’s actually the same
physical node.
This PR is a first step towards fixing these privacy issues. If we limit
(cache) the leaked portion of AddrMan, these inference activities will become
much harder. Caching in this context means that the ADDR response (which is
only a small subset of a node’s AddrMan content) remains the same for every
GETADDR call during (roughly) a day.
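The effect of caching on scraping can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the PR's implementation: the class names, the 1000-address cap, the exact 24-hour lifetime and the idea of keying the cache by the requester's network are all illustrative choices made here, and the address table is synthetic.

```python
import random

# Illustrative constants (assumptions for this sketch, not taken from the PR).
MAX_ADDR_PER_RESPONSE = 1000      # cap on addresses per ADDR response
CACHE_LIFETIME = 24 * 60 * 60     # "roughly a day", in seconds


def sample_addrman(addrman, rng):
    """Pick a random subset of the address table for one ADDR response."""
    pool = sorted(addrman)
    return rng.sample(pool, min(MAX_ADDR_PER_RESPONSE, len(pool)))


class UncachedNode:
    """Hypothetical node that draws a fresh sample for every GETADDR."""

    def __init__(self, addrman, rng):
        self.addrman = addrman
        self.rng = rng

    def handle_getaddr(self, network, now):
        return sample_addrman(self.addrman, self.rng)


class CachedNode(UncachedNode):
    """Hypothetical node that reuses one response per requester network."""

    def __init__(self, addrman, rng):
        super().__init__(addrman, rng)
        self._cache = {}  # network -> (expiry time, response)

    def handle_getaddr(self, network, now):
        entry = self._cache.get(network)
        if entry is None or now >= entry[0]:
            # Cache miss or expired entry: build and store a new response.
            entry = (now + CACHE_LIFETIME, sample_addrman(self.addrman, self.rng))
            self._cache[network] = entry
        return list(entry[1])


def scrape(node, requests, network="ipv4", interval=60):
    """Spy: send GETADDR repeatedly and collect every address seen."""
    seen = set()
    for i in range(requests):
        seen.update(node.handle_getaddr(network, now=i * interval))
    return seen


# Synthetic address table with 5000 entries.
addrman = {f"10.0.{i // 256}.{i % 256}" for i in range(5000)}
uncached = scrape(UncachedNode(addrman, random.Random(1)), requests=50)
cached = scrape(CachedNode(addrman, random.Random(1)), requests=50)
print(f"uncached: {len(uncached) / len(addrman):.0%} of AddrMan scraped")
print(f"cached:   {len(cached) / len(addrman):.0%} of AddrMan scraped")
```

In this toy setup the spy sends 50 requests within one cache lifetime. Against the uncached node it recovers nearly the whole table, while against the cached node it learns only the single stored response (1000 of 5000 addresses), no matter how many connections it opens.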
Questions
Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
You’re always encouraged to put your PR review on GitHub, even after it has
been merged.
What is the importance of ADDR relay and specifically the GETADDR/ADDR
protocol? What are the goals of this protocol? Which properties of ADDR
relay are important?
Do you understand the examples of the attacks described above? Can you think
of any other ways to exploit scraping AddrMan? If you find something really
dangerous, consider discussing it first with another developer privately.
Do you think the attacks are severe? What are the potential consequences of
making the topology (local or global) public?
Do you think the suggested PR sufficiently addresses the problem?
What are the side effects of the suggested solution?
Meeting Log
1 13:00 <jnewbery> #startmeeting
10 13:00 <jnewbery> Hi folks! Today we're going to be looking at a part of the p2p network that we haven't looked at before: address discovery and gossip.
11 13:01 <jnewbery> Or in other words: how a node learns about potential new peers to connect to.
12 13:01 <michaelfolkson> hi
15 13:01 <jnewbery> We're lucky to have the PR author hosting today. So without any more delay, I'll hand over to gleb1010.
17 13:01 <gleb1010> Hi again all! Hope we have a productive hour
18 13:01 <gleb1010> I saw some useful review comments from participants, but let’s check with everyone.
19 13:01 <gleb1010> Have you reviewed the PR? Y/n
27 13:02 <gleb1010> That's impressive :)
28 13:02 <gleb1010> You are feel free to ask any questions at any time, but let’s start with the notes we posted on the website.
29 13:02 <gleb1010> Let's first discuss this
30 13:02 <gleb1010> What is the importance of ADDR relay and specifically the GETADDR/ADDR protocol?
31 13:03 <pinheadmz> gleb1010 its for network nodes to discover new peers
32 13:03 <emzy> Asking a peer for more other peers.
34 13:03 <gleb1010> Right, so since Bitcoin operates over a peer-to-peer network, it's essential for everyone to know somebody else from the network
35 13:03 <pinheadmz> there are also DNS seeds, and I think a hard-coded list of peers (?) for bootstrapping
37 13:04 <gleb1010> pinheadmz: right, there are several ways to learn about other nodes in the network
38 13:04 <emzy> even a DNS seed uses ADDR to find nodes.
39 13:05 <pinheadmz> emzy true but to get a list of those peers from the seeder its actually DNS protocol
40 13:05 <willcl_ark> hmmmm, just thinking out loud - does this PR affect DNS nodes' responses at all? (I've not looked at the dnsseeder code)
41 13:05 <gleb1010> DNS internals always felt a bit obfuscated to me, so if you're somewhat confused, you're not alone :)
42 13:06 <gleb1010> Nope, I don't think they affect those responses, but maybe sipa can confirm?
43 13:06 <michaelfolkson> How were those hard-coded peers chosen? Are they regularly checked for uptime etc?
45 13:06 <gleb1010> Pieter runs one out of 6 DNS seeds we have, so that the very new nodes know how to learn about each other
46 13:06 <gleb1010> michaelfolkson: I believe Wladimir maintains the list.
47 13:06 <sipa> do does emzy
48 13:07 <gleb1010> Well, hard-coded peers are updated within regular release process.
49 13:07 <emzy> willcl_ark: it connects direcly via p2p to a node to ask. No local bitcoind is running.
51 13:07 <gleb1010> So I think if you look hard enough you can find a PR with that list
52 13:07 <willcl_ark> emzy: thanks,
53 13:07 <gleb1010> Okay this one is really interesting.
54 13:07 <gleb1010> Which properties of ADDR relay are important?
55 13:08 <gleb1010> We know that blocks should be relayed faster, we know that transactions should be more or less unlinkable to IP
56 13:08 <michaelfolkson> sipa: Thanks
57 13:08 <willcl_ark> Ideally, you want to get a diverse list of peers
58 13:08 <pinheadmz> and you want peers with good uptime or at least "seen" recently
59 13:08 <jnewbery> emzy: DNS seeds actually use DNS
60 13:08 <gleb1010> Right, one way to look at it is to consider an individual node's AddrMan, not cross-network relay.
62 13:09 <gleb1010> Every node wants to have a good up-to-date list of nodes in their AddrMan.
64 13:10 <emzy> jnewbery: it answers via DSN but it has to crawl the nodes via p2p
65 13:10 <gleb1010> This is where we ideally should talk about unsolicited self-announcements which propagate ADDRs across the network, but let's leave that for homework
66 13:10 <gleb1010> Let's see what's this parallel discussion is about :)
67 13:10 <lightlike> so this change might influence DNS seed indirectly. If the DNS seeder regularly scrapes to get new addresses (and deliver them via DNS) and it gets cached responses in the future, it might introduce some kind of inertia to changes in the network.
68 13:10 <emzy> s/DSN/DNS/
69 13:11 <gleb1010> emzy is curious how DNS servers actually learn about the nodes they gonna feed to nodes querying them
70 13:11 <gleb1010> or maybe not curious but he makes a statement :)
71 13:11 <sipa> emzy runs a DNS seed
72 13:11 <willcl_ark> lightlike: but the cached responses will be different from each node
73 13:11 <jnewbery> emzy: I'm not sure what you mean by crawl the nodes, but it gets its list of potential peers by DNS
74 13:12 <lightlike> willcl_ark: true, so probably not noticeably
75 13:12 <sipa> jnewbery: the DNS seeder itself, not the client
76 13:12 <sipa> jnewbery: it has to get the list of good IP addresses it serves from somewhere
77 13:12 <gleb1010> This is true. If everything is cached, the records everywhere become a bit older (within 1 day)
78 13:12 <sipa> to do so, it crawls the network using the P2P protocol
79 13:12 <pinheadmz> anyone can do a quick experiement to query the DNS seed nodes: `dig seed.bitcoin.sipa.be`
80 13:12 <pinheadmz> returns a list of A records
81 13:13 <sipa> you can get IPv6 ones using dig -t AAAA seed.bitcoin.sipa.be
82 13:13 <troygiorshev> pinheadmz: thx
83 13:13 <emzy> tnx sipa, that was my point.
84 13:13 <gleb1010> Okay, so we can talk about the impact a bit more now
85 13:13 <willcl_ark> so I think this doesn't affect the DNS seeds
86 13:13 <gleb1010> How we should look at the threat of everyone gets a bit older AddrMan records?
87 13:14 <gleb1010> Like, let's say a node gets 1000 records, but 500 of them were updated since the last cache
88 13:14 <michaelfolkson> That is a threat? Or just an inefficiency?
89 13:14 <troygiorshev> will this make it more difficult for a new node to get a diverse set of peers?
90 13:14 <willcl_ark> less than 24h should not be too much of a problem (unless node churn is very bad)
91 13:14 <gleb1010> Right, so I remember a research paper from Till Neudecker showing that churn in our network is really low
92 13:15 <jnewbery> ah, I understand. Thanks
93 13:15 <emzy> gleb1010: It will take longer for new nodes to be discoverd by others.
94 13:15 <jnewbery> yes. I thought emzy was talking about 'DNS seed connections' rather than 'DNS seeders'. All clear now!
95 13:15 <lightlike> troygiorshev: it will probably introduce a delay until you get new inbound peers.
96 13:15 <sipa> fwiw, my dns seeder software will generally visit every node it knows about in the network every few hours
98 13:15 <sipa> so this will impact it, but not much i think, given the low churn
99 13:15 <gleb1010> That's a great link above, it has so many things, save and take a look later :)
100 13:16 <gleb1010> sipa: So the point is, some of the records might have an older timestamps, but they will probably be still alive?
101 13:16 <gleb1010> And they won't be garbage in our AddrMan
102 13:17 <gleb1010> Okay, let's move forward.
103 13:17 <pinheadmz> gleb1010 are the timestamps in addrman updated with every message from the peer? or just once on connection?
104 13:18 <gleb1010> pinheadmz: it's a bit complicated there, afaik it's different for inbound/outbounds
105 13:18 <gleb1010> For one of them it's every message, for another it's when they connect. We also update based on unsolicited ADDRs...
106 13:19 <gleb1010> But yeah, the expectations of these timestamps are a bit vague right now, probably that's an area of future improvements.
107 13:19 <emzy> If an attacker has like 1000 ipv4 addresses this PR would not mitigate the attack?
108 13:20 <gleb1010> There is no *the attack* :P
110 13:20 <gleb1010> Yeah coinscope is great!
111 13:20 <troygiorshev> pinheadmz: fwiw if you feel like looking into the history, it was apparently changed in version 0.10.1
112 13:20 <gleb1010> Let's get to attacks.
113 13:20 <willcl_ark> What I am interested to know is, how does the previous (current) ADDR response give away your connected vs disconnected peers? Unless disclosing is a security risk OFC
114 13:20 <amiti> idk how much the code has changed since then, but gave me an idea of the complexity
115 13:20 <gleb1010> In the PR, I was a bit vague about the attacks, because it's always a bit sensitive.
116 13:21 <emzy> gleb1010: or privacy leak
117 13:21 <willcl_ark> Also an acceptable answer :)
118 13:21 <gleb1010> But the coinscope paper is one example of exploiting it to infer direct links between nodes.
119 13:21 <gleb1010> Another leak is to map Tor identity to ipv4 identity of the same node. If we scrape everything and compare all timestamps, it's very easy to tell it's the same entity.
120 13:21 <pinheadmz> we've talked about eclipse attacks in review club before as well - it could be used by an atacker to check on their success maybe
121 13:22 <gleb1010> Anybody got some time to think of any other vectors?
122 13:22 <michaelfolkson> You mentioned linking Tor and IP addresses
123 13:22 <amiti> I was wondering how a spy could go from knowing addrman to figuring out the network topology, the coinscope paper uses the timestamps. is that the main way or are there others?
124 13:22 <gleb1010> pinheadmz: right, it's great to remember that most of researchers seem to feel that topology should be private/obscured for security reasons :)
125 13:22 <michaelfolkson> And then there's linking Bitcoin nodes to Lightning nodes ;)
126 13:23 <gleb1010> amiti: Great question! Timestamps is what I was always thinking too.
127 13:23 <willcl_ark> So from that paper it seems "recent and unique" timestamps (might) give it away
128 13:23 <gleb1010> willcl_ark can you elaborate?
129 13:23 <sipa> i would expect there are ways to introduce randomly generated IPs in the network, broadcast them on certain incoming connections, and then observing which other peers know about them
130 13:24 <willcl_ark> I'm not sure, but perhaps if we hear of a peer from another peer, we have the same timestamp, but if we connect ourselves, we get a unique timestamp
131 13:24 <willcl_ark> so if you see the same timestamp from a few peers we can't tell, but you give me a unique and recent timestamp, we can infer you are connected directly to them
132 13:24 <troygiorshev> sipa: ah like putting rubber ducks in a river and seeing where they end up
133 13:24 <gleb1010> My first idea was to just track the "self-announcements"
134 13:25 <gleb1010> Nodes sometimes announce themselves to their direct peers
135 13:25 <sipa> michaelfolkson: lightning node have an explicit public identity, so i'm not sure what there is to link
136 13:25 <gleb1010> And those timestamps will be very distinct
137 13:25 <gleb1010> sipa: link lightning node to a bitcoin ip or onion leads to issues, see our time-dilation attacks paper :)
138 13:25 <michaelfolkson> But you don't always know the Bitcoin full node that the Lightning node is using
140 13:26 <gleb1010> Alright, there are couple ideas to exploit this above
141 13:26 <gleb1010> Perhaps someone comes up with something even cooler than we can think of
142 13:26 <gleb1010> And maybe not much fixed by caching :)
143 13:26 <pinheadmz> gleb1010 this cache applies to all your peers?
144 13:26 <pinheadmz> so within 24 hours, every node that GETADDR from the same network get the same response?
145 13:26 <sipa> oh no, a netsplit
146 13:27 <pinheadmz> otherwise you could sybil a node and just get all the addrs anyway
147 13:27 <gleb1010> pinheadmz: yes, same cache per the network of request originator
148 13:27 <willcl_ark> i think the cache would be different for each peer, like the mempool
149 13:27 <gleb1010> Except white-listed requestor
150 13:27 <gleb1010> sipa: huh?
151 13:27 <sipa> gleb1010: on IRC, now, in this channel
152 13:28 <sipa> we lost jnewbery and others
153 13:28 <gleb1010> ah i see :)
154 13:28 <gleb1010> this happens from time to time
155 13:28 <gleb1010> My handbook doesn't have instructions for this...
156 13:28 <michaelfolkson> A network partition
158 13:28 <gleb1010> I think that subnet is in good hands of jnewbery
159 13:28 <willcl_ark> hard forked them off
160 13:28 <gleb1010> okay so the discussion above was
161 13:29 <gleb1010> I suggest using a separate cache for different network origin, and someone above pointed out why it is useful
162 13:29 <gleb1010> Welcome back!
163 13:29 <gleb1010> Alright, let's try to merge back to the notes
164 13:30 <michaelfolkson> In the time you were away we eclipsed all your peers
165 13:30 <gleb1010> We sort of discussed the severity of the attacks
166 13:30 <gleb1010> Leaking the topology can eclipsing a node easier, also spying easier
167 13:30 <gleb1010> Also stealing funds from Lightning... many things
168 13:30 <gleb1010> Anything else?
169 13:31 <michaelfolkson> Privacy attacks
170 13:31 <willcl_ark> elcipse of lightning light clients (e.g. Neutrino) seems like a particularly severe one
171 13:31 <gleb1010> Well, part of the issue is that Neutrino's p2p stack is poor, so I'm not sure it's a good motivation for a Bitcoin Core change haha
172 13:31 <willcl_ark> but i guess it all comes back to the same base-layer eclipse
173 13:31 <willcl_ark> sure!
174 13:32 <gleb1010> Anyone looking for some little work should consider helping to mature Neutrino implementation
175 13:32 <gleb1010> With the experience we're getting while working on this stuff :P
176 13:33 <gleb1010> Yeah, so basically eclipse attacks are the worst probably (and netsplits, which is sort of an eclipse too but large scale)...
177 13:33 <gleb1010> You can do what you want with your victim, so we don't want that.
178 13:33 <gleb1010> Bringing us to the question... does this PR really help?
179 13:33 <pinheadmz> i guess if you got enough addrs from enough nodes you could uncover who the critical nodes are
180 13:33 <gleb1010> (to hide local and global topologies)
181 13:33 <pinheadmz> like, nodes with the most connections, potential attack points
182 13:34 <gleb1010> pinheadmz: Right, it would be easier to split the net if one kills the bridges, but hopefully our graph is random enough so that's not very helpful :)
183 13:34 <amiti> does netsplit = network partition?
184 13:34 <primal> not sure if it came across but I had the following comment during the netsplit
185 13:34 <primal> is the PR not shifting the attack strategy to gain topology information from an attacker being rate limited to the attacker spinning up a botnet?
186 13:34 <primal> I haven't worked through how peering dynamics and rate limiting ADDR will evolve, but ^^ is a gut-reaction check
187 13:34 <gleb1010> amiti: yeah
188 13:34 <willcl_ark> it appears that it might slow down the attack to the extent that peers get new connections etc. so that wouldn't be so effective
189 13:34 <nehan_> gleb1010: it slows the attacker down
190 13:34 <gleb1010> Right. This PR just makes an attacker spend way more time on learning what they want to learn.
191 13:34 <willcl_ark> at pretty minimal cost to "freshness" of ADDR responses
192 13:35 <troygiorshev> I'm worried that it simply delays learning the topology, but i'm not sure
193 13:35 <gleb1010> I never bothered to measure the exact attack delay we may introduce
194 13:35 <lightlike> primal: you can connect with as many bots that you want, all will get the same cached response for a while.
195 13:35 <gleb1010> Because I have other ideas on the way to improve the privacy :)
196 13:35 <gleb1010> Let's think of how we can improve this stuff even further?
197 13:35 <pinheadmz> is there a drawback where a new node with maxinbound 1000 wont actually get any new inbounds for up to 24 hours?
198 13:36 <primal> lightlike ahh ok so the cache reduces the set of info that we allow to leak out of our node
199 13:36 <gleb1010> Anything crazy creative works, let's discuss ideas
200 13:36 <michaelfolkson> Sometimes delays are all you can do troygiorshev. Make it unviable to do unless extremely targeted victim
201 13:36 <gleb1010> primal! Right. And also the indicators=timestamps are a little "outdated"
202 13:36 <willcl_ark> primal: I guess it's like "leak rate"
203 13:36 <gleb1010> pinheadmz: not sure I follow.
204 13:36 <primal> "topology disclosure rate" or something of the sort
205 13:37 <gleb1010> What this has to do with inbound limit?
206 13:37 <jnewbery> pinheadmz: it'll reduce how quickly other nodes learn about your address in the first 24 hours, but won't eliminate it entirely
207 13:37 <emzy> gleb1010: what about randomness in the last seen time?
208 13:37 <pinheadmz> jnewbery right, just a delay. and for inbounds its like "eh, your loss"
209 13:37 <jnewbery> because each node's cache is updated at a different time
210 13:37 <willcl_ark> pinheadmz: you should get 1000 different cached responses from all of those peers, so you should be fine
211 13:37 <primal> emzy: you don't want randomness, you want to chop the information at a certain bit
212 13:37 <gleb1010> emzy: this is interesting!
213 13:37 <jnewbery> also, be aware that there is another method of address gossipping, which is that each node will announce its own address to its peers every ~24 hours, and those peers will gossip that on to some of their peers. That method is unaffected by this PR.
214 13:37 <nehan_> the thing i'm trying to understand are the implications of serving stale timestamps. the timestamp logic is weird (as described in the coinscope paper, which might be out of date)
215 13:37 <thomasb06> is there a reverse mechanism in case of misusage: certain nodes able to overcome the privacy for example?
216 13:37 <gleb1010> wow this moves fast!
217 13:37 <emzy> primal: also vaid
218 13:37 <pinheadmz> willcl_ark i referring more to my own IP getting out to the network, if i have a lot of open slots to offer
219 13:37 <jnewbery> that there is another method of address gossipping, which is that each node will announce its own address to its peers every ~24 hours, and those peers will gossip that on to some of their peers. That method is unaffected by this PR.
221 13:38 <jnewbery> (or more accurately: for each peer, once every 24 hours, we reannounce our address to that peer)
222 13:38 <gleb1010> jnewbery: I believe it's a bit less often than that, because there's a bloom filter in that announcement, but I have to double-check...
223 13:38 <willcl_ark> oh no! we lost him
224 13:38 ⚡ sipa spawns a new gleb1010
226 13:39 <pinheadmz> gleb1011 enters :-)
227 13:39 <michaelfolkson> I do wonder how easy/difficult it is to knock a node off the Bitcoin network
228 13:39 <gleb10101> Now I got forked, sorry.
229 13:39 <michaelfolkson> Because that's what eclipse attacks rely on right?
230 13:39 <gleb10101> michaelfolkson: good question! There are many ways to knock a node
231 13:39 <emzy> is there a problem for nodes that change evey 24h there ip address to be discoverd? Beause this is often the case by DSL/dial in connections.
232 13:40 <gleb10101> michaelfolkson: hopefully all those many ways are hard/expensive enough :)
233 13:40 <pinheadmz> jnewbery dont wanna veer too far off topic, but doesnt a node learn its own IP address from other peers?
234 13:40 <gleb10101> emzy: I actually didn't think about them. What would be the issue? They get less connections inbound?
235 13:41 <jnewbery> pinheadmz: yes, I believe that's a part of it, although it's not something I've looked at too closely
236 13:41 <pinheadmz> emzy this is whayt i was getting to. but actually most home-run nodes probably arent accepting inbounds anyway (firewall, etc)
237 13:41 <emzy> gleb10101: exactly. They will be second class nodes.
238 13:41 <primal> pinheadmz I don't understand the significance of a node learning it's own ip add from other peers. what am I missing?
239 13:41 <amiti> pinheadmz, jnewbery: oh I've totally seen code in connection logic that says "if its yourself, disconnect" 😛
240 13:42 <pinheadmz> primal bitcoind used to actually make a request to whats-my-ip.com or something
241 13:42 <gleb10101> emzy: fair enough, maybe you should think about that more and then come to the PR with the conclusions
242 13:42 <pinheadmz> i guess its hard for a process on your laptop to know what its internet IP truly is
243 13:42 <jnewbery> amiti: that's something slightly different, and is detected by including a random nonce in the VERSION message
244 13:42 <emzy> pinheadmz: I think this is the case.
245 13:42 <gleb10101> emzy: wondering if that's the scenario we're targeting
246 13:42 <gleb10101> Alright, I asked about the alternative solutions, I was about to share one :)
247 13:42 <emzy> gleb10101: good question.
248 13:42 <gleb10101> Just to throw this another idea, we don't have to discuss it.
249 13:43 <gleb10101> I wanna implement self-announcement on feeler connection with some probability.
250 13:43 <amiti> jnewbery: oh, interesting. ok thanks
251 13:43 <primal> pinheadmz are you saying that bc a node can learn its ip addr from peers that removes the need to communicate with other services?
252 13:43 <gleb10101> We do connect to some node in the network every 2 minutes just to see if they're alive, might as well ask them to relay our addr. This would obfuscate it even further
253 13:43 <pinheadmz> primal right or more sepciifcally centralized services that arent even bitcoin related...ill try to find the PR its an old one
254 13:43 <gleb10101> Maybe someone gets any ideas like this for future PR :)
255 13:44 <sipa> bitcoin core used to query some "find my ip" website in a long-gone past
256 13:44 <jnewbery> gleb10101: what do you mean by 'obfuscate' in this context?
257 13:44 <willcl_ark> sipa: sounds like a privacy nightmare :P
258 13:44 <sipa> willcl_ark: yes indeed
259 13:44 <gleb10101> jnewbery: Coinsope and my own idea initially was to 1. scrape AddrMan often 2. infer inbounds by new records/special timestamps
260 13:45 <pinheadmz> sipa right, and then that website turned into a real estate site or something, was bascially getting ddosed by bitcoin network :-P
261 13:45 <gleb10101> jnewbery: so now these new records/special timestamps will be not only at victim's direct peers and their peers, but also at random feelers
262 13:45 <gleb10101> Yeah, this is an interesting story how we moved from that website...
263 13:45 <willcl_ark> could nodes use a random offset for the timestamp when serving (or storing) ADDR
264 13:46 <willcl_ark> then nobody would ever have the same
265 13:46 <gleb10101> willcl_ark: this is what greg maxwell told me we already do when I discovered this issue a year ago haha
266 13:46 <gleb10101> But then I don't think we do randomize them
267 13:46 <gleb10101> So that was some phantom feature
268 13:46 <willcl_ark> :) But we should!
269 13:47 <gleb10101> So the idea is to randomize a timestamp on every ADDR sending
270 13:47 <gleb10101> This will help with some issues...
272 13:47 <gleb10101> willcl_ark tracking occurence of new records in AddrMan still would be possible
273 13:47 <emzy> the randomness may be the less invasive for the P2P network.
274 13:48 <amiti> fundamental question: what is the intended purpose of the ADDR timestamps? I saw logic that used this info to not relay old addrs. is that the main reason?
275 13:48 <gleb10101> amiti: yeah I believe so.
276 13:48 <primal> pinheadmz 845c86d128fb97d55d125e63653def38729bd2ed
277 13:49 <willcl_ark> gleb10101: hmmmm, interesting
278 13:49 <gleb10101> I believe every time we get an ADDR, we would deprioritize it if it's 1 week old
279 13:49 <primal> ah yeah you linked it
280 13:49 <gleb10101> Okay, we have 10 minutes left
281 13:49 <gleb10101> I was about to ask about side-effects, but we actually discussed them :)
282 13:49 <gleb10101> But someone can highlight a side-effect of their concern again
283 13:50 <gleb10101> Or just ask any other question?
284 13:50 <willcl_ark> I was wondering about setting ADDR as a default for whitebind
285 13:50 <jnewbery> There was a suggestion in https://github.com/bitcoin/bitcoin/pull/16442 to dynamically change the local service bits depending on whether the compact block filter index was built. It was argued that because it would only ever go from false to true, that would be ok.
286 13:50 <pinheadmz> is there any way an ADDR message cahced response could be used to identify a node? if you get the same response from nodes running on two IPs for example?
287 13:50 <willcl_ark> it doesn't really make much difference as you specify in config, but...
288 13:50 <nehan_> I asked earlier about the implications of serving stale timestamps but that might have gotten lost in the fork and/or I didn't see the answer
289 13:51 <jnewbery> I think if nodes start randomizing timestamps that's no longer true. You could get an old address record with a newer timestamp than the (actually) newer address record
290 13:51 <gleb10101> nehan_: We don't want to spend days digging into outdated nodes and finding a live one...
291 13:51 <nehan_> or think that nodes are stale that aren't actually stale
292 13:51 <gleb10101> And we don't want to spend bandwidth relaying old non-live nodes
293 13:52 <michaelfolkson> I was listening to ariard on TFTC and he was saying it needs a similar level of resource to secure P2P for Neutrino or Lightning as it does secure P2P on Core. Had never thought of it like that before
294 13:52 <nehan_> with this change, where a node might have served a fresh timestamp it would now serve one that was 27 hours old
295 13:52 <gleb10101> pinheadmz: yeah, but I don't know how to address this issue :)
296 13:52 <michaelfolkson> I just assumed Core would take the brunt of addressing a lot of the P2P attacks on Neutrino/Lightning indirectly
297 13:52 <gleb10101> nehan_: true, but in the beginning of the meeting we sort of considered that 1-day old is probably fine.
298 13:53 <gleb10101> We should be talking about at least several-days lag for it to be bad. Although it's a bit arbitrary and depends on many things. It's more of an intuition
299 13:54 <nehan_> gleb10101: doesn't that sort of imply we don't need to update timestamps frequently?
300 13:54 <sipa> jnewbery: i feel that with feelers this is less of an issue, as they will always overwrite the flags data with the actual flags
301 13:55 <sipa> (i need to check if feelers actually override flags)
302 13:55 <amiti> re timestamps and addr relay: I still don't understand how its really helping. as a recipient you are able to assign likelihood-of-node-being-live if the sender is being honest in the reported timestamps. if thats the case, why not just have honest nodes proactively only send addrs of recently-tested-conns?
303 13:55 <gleb10101> nehan_: define frequently :) Records should be at most couple days old. We currently don't need better freshness.
304 13:55 <gleb10101> We don't know how to distinguish 3 days old from 1 day old. The code doesn't.
305 13:56 <willcl_ark> presumeably you could just also make the timstamp up, if you were so inclined
306 13:56 <gleb10101> I mean, we can distinguish, but we don't do anything with it, sorry.
307 13:57 <gleb10101> amiti: Right, maliciously updating timestamps attack is one of my todos :)
308 13:57 <gleb10101> That's also why any fine-grained optimization of being alive is dangerous.
309 13:57 <gleb10101> It's free to bump for an attacker to bump their timestamp
310 13:58 <pinheadmz> is there any banscore type thing if a node sends us 1000 ADDRS and none of them work ?!
311 13:58 <gleb10101> Meaning we don't want to rely on timestamps too much...
312 13:58 <amiti> gleb: huh? like explore the feasibility of attack?
313 13:58 <jnewbery> sipa: I think we do. We call SetServices() when we receive the version, and then disconnect
314 13:58 <sipa> pinheadmz: we won't know they don't work until days, maybe weeks later
315 13:58 <gleb10101> pinheadmz: nope. I mean, we don;t want checking 1000 nodes at once :)
316 13:59 <gleb10101> amiti: we're out of time it seems, hit me up later :)
317 13:59 <pinheadmz> sipa gleb10101 right, and no real pain y trying to connect to bad nodes
318 13:59 <pinheadmz> well great work on this gleb10101 very simple to understand and makes a lot of sense
319 13:59 <troygiorshev> yeah thanks gleb10101!
320 14:00 <gleb10101> Thank you! For those haven't look at the code it's actually few lines so please review :P
321 14:00 <willcl_ark> thanks gleb10101
322 14:00 <emzy> thanks gleb10101!
323 14:00 <andrewtoth> thanks gleb10101!
324 14:00 <lightlike> thanks!
325 14:00 <jnewbery> pinheadmz: and that's a more general problem. How do we 'punish' a node for giving us bad data? Looking at orphan processing and mapBlockSource is left as an exercise for the reader :)
326 14:01 <gleb10101> #endmeeting
327 14:01 <primal> thanks gleb10101
Meeting Log – Asia time zone
Host: jonatack
328 05:04 <jonatack> If anyone is around, we'll get started in just under an hour.
329 06:00 <jonatack> #startmeeting
331 06:01 ⚡ jonatack things seem quiet
332 06:01 <sipa> vaguely here
335 06:03 <jonatack> So I spent some time going through the meeting log and the PR.
336 06:03 <jonatack> brikk: did you get a chance to review the PR?
337 06:04 <brikk> jonatack: unfortunately not, I was not aware of this meeting; I only saw the activity now, at a convenient time right at the start of my day :)
338 06:04 <jonatack> that's all good
340 06:05 <jonatack> "Cache responses to GETADDR to prevent topology leaks (p2p)"
341 06:05 <brikk> thanks, I'm looking at it now
342 06:05 <jonatack> to make the ADDR/GETADDR address gossip protocol more private for the node providing addresses
343 06:06 <jonatack> ADDR relay and specifically the GETADDR/ADDR protocol exist for peer discovery in bitcoin's p2p network.
344 06:07 <jonatack> in addition to two other ways: DNS seeders which crawl the network using the p2p protocol, and hardcoded fixed seeds
345 06:07 <jonatack> for instance, one can run dig seed.bitcoin.sipa.be (or dig -t AAAA seed.bitcoin.sipa.be for IPv6) on the command line to see a list of A/AAAA records that seeders might provide
346 06:09 <jonatack> Essentially, the goal of this PR is to slow attackers down, to make them spend more time to learn local or global peer topology.
347 06:11 <jonatack> ADDR relay has various properties of importance. This PR seems to shift the priority among them a bit in the hope of a better tradeoff.
348 06:11 <jonatack> What are these properties?
349 06:11 <jonatack> - Privacy: hiding local and global topology, difficulty of identifying peers, or of linking transactions to IP addresses
350 06:11 <jonatack> - Peer diversity
351 06:12 <jonatack> - Decentralisation: Trust reduction with respect to the DNS seeds and the fixed seeds
352 06:12 <jonatack> - Speed of relay
353 06:12 <jonatack> - Freshness: peers having an up to date list of peers
354 06:13 <jonatack> - Quality: peers who are well-behaved, seen recently, with good uptime
355 06:14 <brikk> What does peer diversity mean?
356 06:14 <jonatack> For example, a tradeoff this PR would seem to be proposing is less freshness (if that is the best word; it may not be) in favor of privacy
357 06:14 <jonatack> or less diversity as well, possibly
358 06:15 <brikk> Speed of relay seems to be a tradeoff as well, right?
359 06:16 <jonatack> i'm not sure
360 06:16 <jonatack> how do you see it?
361 06:17 <brikk> just by looking at the comments in the review
362 06:17 <brikk> there's a lot of comments though and I am yet to make it til the end, so perhaps my perception will change :)
363 06:19 <jonatack> diversity: good question. for example, by ASN, which was a motivation for the -asmap p2p addition in the latest release of bitcoin core.
365 06:22 <jonatack> sipa: do you think today's PR could adversely affect discovery of newly online peers, or ones who change IP address frequently?
366 06:23 <jonatack> ISTM if everything is cached, the records everywhere become a bit older (by 1 day)
367 06:23 <sipa> i'd need to think more about that
368 06:23 <brikk> right, sounds like the things amiti uttarwar was talking about in the reckless vr meetup
369 06:23 <jonatack> and new node discovery might be slower... but i would need to look at it more.
370 06:26 <brikk> jonatack: when you say new node discovery, does that mean that I bring a new node to the discovery and it would mean issues for me, or that someone else brings a new node online and the rest of the network has trouble discovering it?
371 06:27 <jonatack> Both? This is an aspect I'm not sure on.
373 06:30 <jonatack> The hard thing with p2p changes like this, to my mind, is how to simulate the effects before actually deploying on the network.
374 06:31 <luke-jr> it's in Knots 0.20.0 fwiw
376 06:31 <luke-jr> not quite the same thing, but it's _part_ of the network that might be observable
377 06:32 <jonatack> luke-jr: nice. released june 14? any stats on number of nodes running that version?
378 06:32 <luke-jr> [04:23:42] <jonatack> ISTM if everything is cached, the records everywhere become a bit older (by 1 day) <-- isn't it 1 day *per hop*?
379 06:33 <luke-jr> jonatack: the release was June 16th, but based on June 14th PRs
380 06:33 <sipa> luke-jr: there are two mechanisms though; getaddr->addr, and normal addr gossipping
381 06:33 <luke-jr> I'm seeing only 24 nodes upgraded so far
382 06:33 <sipa> i don't think the second is affected by this PR but i haven't reviewed in detail
383 06:33 <luke-jr> sipa: ah
384 06:34 <sipa> and for the getaddr->addr mechanism there isn't really any concept of hops
385 06:34 <luke-jr> do we currently *use* getaddr as a client then?
386 06:35 <sipa> luke-jr: yes, under certain conditions, in response to version
387 06:35 <sipa> so at most once per connection
388 06:35 <sipa> (we also only respond once per connection iirc)
389 06:36 <luke-jr> looks like normally once at connection
390 06:36 <luke-jr> due to pfrom.nVersion >= CADDR_TIME_VERSION
391 06:36 <luke-jr> outbound connection*
392 06:37 <jonatack> CADDR always puts a smile on my face
394 06:38 <jonatack> seeing lisp in the codebase :)
395 06:38 <sipa> )))))))))))
396 06:42 ⚡ luke-jr watches a tumbleweed roll by
397 06:42 <jonatack> luke-jr: "like normally once at connection" you're referring to ProcessMessage or RelayAddress?
398 06:42 <luke-jr> just glancing at the conditions around connman->PushMessage(&pfrom, CNetMsgMaker(nSendVersion).Make(NetMsgType::GETADDR));
399 06:43 <jonatack> ah VERSION
400 06:43 <luke-jr> it seemed like it happens under fairly normal circumstances
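The conditions luke-jr is glancing at can be sketched as a standalone predicate. This is a simplified reconstruction from memory of the VERSION handling in the codebase of that period, not a copy of it: `PeerInfo` and `ShouldSendGetAddr` are hypothetical names, and the one-shot flag, the 1000-address threshold, and the value of `CADDR_TIME_VERSION` should be treated as assumptions to check against the actual source.

```cpp
#include <cstddef>

// Assumed value: the protocol version from which address entries carry a timestamp.
constexpr int CADDR_TIME_VERSION = 31402;

// Hypothetical stand-in for the per-peer state consulted when VERSION is processed.
struct PeerInfo {
    bool inbound;   // true if the peer connected to us
    bool one_shot;  // true for a short-lived connection made only to fetch addresses
    int version;    // protocol version announced by the peer
};

// GETADDR is only considered for outbound connections; any one of the
// remaining conditions is then enough to send it.
bool ShouldSendGetAddr(const PeerInfo& peer, std::size_t known_addresses)
{
    if (peer.inbound) return false;
    return peer.one_shot
        || peer.version >= CADDR_TIME_VERSION
        || known_addresses < 1000;
}
```

Since virtually every peer on the network announces a version above that threshold, the predicate is true for essentially all outbound connections, which matches the observation that GETADDR is normally sent once at connection time.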
401 06:46 <jonatack> luke-jr: do you see any adverse effects from this PR?
402 06:46 <luke-jr> so far no
403 06:48 <jonatack> how did you decide to add it to knots?
404 06:48 <luke-jr> less than 50% of the network is running 0.19.x+, so it's not the end of the world if we discover something post-release either
405 06:48 <luke-jr> jonatack: Knots merge policy is very relaxed - if it won't clearly disrupt anything, it usually goes in
407 06:49 <jonatack> luke-jr: ok
409 06:50 <jonatack> luke-jr: you have your own proprietary stats iirc?
411 06:51 <jonatack> thanks
413 06:53 <brikk> I'm trying to wrap my head around the peer diversity: can I think of this as three kinds of nodes: nodes with an ipv4/ipv6 address that never changes, nodes behind tor, nodes whose address changes every 24 hours
414 06:53 <jonatack> seems quite different from the dsn one in germany which appears to show more 0.19.x nodes
415 06:53 <luke-jr> I guess a lot of 0.16.x nodes are firewalled or something
416 06:53 <luke-jr> jonatack: I include non-listening by default
417 06:53 <jonatack> got it
418 06:55 <jonatack> listening nodes seem to update more to the recent versions, which makes sense as they are presumably more sophisticated users
419 06:57 <jonatack> brikk: diversity by AS number, by IP, by geography
420 06:58 <jonatack> brikk: by uptime, recently seen
421 06:59 <jonatack> brikk: minimum ping time (i'm thinking about the criteria used in bitcoin core for potentially evicting peers)
422 07:01 <jonatack> brikk: recently sent us transactions or blocks
423 07:01 <jonatack> see CConnman::AttemptToEvictConnection in net.cpp
424 07:02 <brikk> jonatack: thanks!
426 07:03 <jonatack> #endmeeting
427 07:03 <jonatack> that's time! thanks brikk sipa luke-jr
428 07:05 <brikk> jonatack: thanks for hosting! I like this time slot better, although looking at the number of participants perhaps not so much?
429 07:05 <sipa> maybe it takes some time before people know it exists
430 07:06 <sipa> though i suspect it'll inevitably be less popular than the other slot
431 07:06 <jonatack> yes, my guess is it might be network/social effect, too... might just take time
432 07:07 <jonatack> kallewoof hosted a review club at this time slot a few months back which was well-attended
437 07:12 <luke-jr> kallewoof is in Japan, maybe he spread word in person
438 07:18 <jonatack> yes, there were the australians (aj, meshcollider, fanquake) and also murch
439 07:25 <meshcollider> An email chain was started for it last time
440 07:25 <meshcollider> That's how I found out about it
441 07:26 <jonatack> ah! -- good idea
442 07:28 ⚡ jonatack realises may have unwittingly described a new zealander as an australian
443 07:30 <sipa> you may have started world war 3
444 07:34 <brikk> it's 2020 all over again