Cache responses to GETADDR to prevent topology leaks (p2p)

https://github.com/bitcoin/bitcoin/pull/18991

Host: naumenkogs  -  PR author: naumenkogs

The PR branch HEAD was 52d22a3c at the time of this review club meeting.

Notes

  • The motivation for PR 18991 is to make the ADDR/GETADDR address gossip protocol more private for the node providing addresses.

  • Nodes transmit three main objects over the p2p network: transactions, blocks and network addresses (e.g. 172.1.2.3). Today’s PR is concerned with network address gossiping.

  • There are two relevant p2p messages: GETADDR (the sender asks the receiver for network addresses, along with some per-address metadata) and ADDR (the receiver responds with the addresses and metadata it is willing to share with the sender).

  • This GETADDR/ADDR protocol is currently used every time a node makes a new connection, after receiving the version message.

  • There are other ways of learning about potential peers, which are not relevant to this PR, but it’s worth being aware of them:

    • Network addresses of the nodes are propagated in the network in an unsolicited way: from time to time, every node makes a self-announcement which is propagated across the network.

    • Nodes can query predefined DNS seeds for a list of potential peers.

  • Every time a node hears about another node in the network, it adds/updates a record in its AddrMan (Address Manager). This is where all the data about nodes in the network is stored, and this is where a GETADDR receiver looks to construct an ADDR response to the requester.

  • It turns out that it’s possible for a spy node to easily scrape the full contents of any reachable node’s AddrMan. The spy just has to connect to a victim node multiple times and execute GETADDR. This scraped data can then be used to infer private information about the victim.

  • For example, a spy can monitor the victim’s AddrMan content in real time and figure out which peers the victim is connected to. A spy can also compare the AddrMan content obtained over two different connections (e.g. one identified by a Tor address and one identified by an IPv4 address) and figure out that both belong to the same physical node.

  • This PR is a first step towards fixing these privacy issues. If we limit (cache) the leaked portion of AddrMan, these inference activities will become much harder. Caching in this context means that the ADDR response (which is only a small subset of a node’s AddrMan content) remains the same for every GETADDR call during (roughly) a day.
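
The effect of caching on a scraping spy can be illustrated with a toy model. This is only a sketch and not the PR's implementation: the class AddrResponder, the scrape helper, the 23% / 1000-address response limits and the fixed 24-hour lifetime are assumptions chosen for illustration. The one property taken from the discussion is that the cache is kept per network of the requester.

```python
import random

CACHE_LIFETIME_SECS = 24 * 60 * 60  # assumption: "roughly a day"
MAX_ADDR_PCT = 23                   # assumption: share of AddrMan per response
MAX_ADDR_COUNT = 1000               # assumption: hard cap per response


class AddrResponder:
    """Hypothetical model of a node answering GETADDR from its AddrMan."""

    def __init__(self, addrman, use_cache, rng=None):
        self.addrman = list(addrman)
        self.use_cache = use_cache
        self.rng = rng or random.Random()
        self._cache = {}  # requester network -> (expiry time, response)

    def _sample(self):
        # A response is a random subset of AddrMan, limited in size.
        count = min(MAX_ADDR_COUNT, len(self.addrman) * MAX_ADDR_PCT // 100)
        return self.rng.sample(self.addrman, count)

    def get_addr(self, requester_network, now):
        if not self.use_cache:
            return self._sample()  # fresh random subset on every request
        entry = self._cache.get(requester_network)
        if entry is None or now >= entry[0]:
            # Cache miss or expired entry: draw a new subset and keep it.
            entry = (now + CACHE_LIFETIME_SECS, self._sample())
            self._cache[requester_network] = entry
        return entry[1]


def scrape(responder, requests, now=0.0):
    """Count the distinct addresses a spy learns from repeated GETADDRs."""
    seen = set()
    for _ in range(requests):
        seen.update(responder.get_addr("ipv4", now))
    return len(seen)


addrman = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
uncached = scrape(AddrResponder(addrman, False, random.Random(1)), 50)
cached = scrape(AddrResponder(addrman, True, random.Random(1)), 50)
print("distinct addresses learned without cache:", uncached)
print("distinct addresses learned with cache:", cached)
```

In this model, 50 requests within one cache lifetime reveal only the 460 addresses of a single cached response, whereas without the cache the spy's view approaches the full 2000-entry table.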

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? You’re always encouraged to put your PR review on GitHub, even after it has been merged.

  2. What is the importance of ADDR relay and specifically the GETADDR/ADDR protocol? What are the goals of this protocol? Which properties of ADDR relay are important?

  3. Do you understand the examples of the attacks described above? Can you think of any other ways to exploit scraping AddrMan? If you find something really dangerous, consider discussing it first with another developer privately.

  4. Do you think the attacks are severe? What are the potential consequences of making the topology (local or global) public?

  5. Do you think the suggested PR sufficiently addresses the problem?

  6. What are the side effects of the suggested solution?

Meeting Log

  13:00 <jnewbery> #startmeeting
  13:00 <jnewbery> hi
  13:00 <emzy> hi
  13:00 <gleb1010> hi
  13:00 <troygiorshev> hi
  13:00 <willcl_ark> hi
  13:00 <pinheadmz> hi
  13:00 <figs> hi
  13:00 <amiti> hi
  13:00 <jnewbery> Hi folks! Today we're going to be looking at a part of the p2p network that we haven't looked at before: address discovery and gossip.
  13:01 <jnewbery> Or in other words: how a node learns about potential new peers to connect to.
  13:01 <michaelfolkson> hi
  13:01 <lightlike> hi
  13:01 <jnewbery> Notes and questions in the normal place: https://bitcoincore.reviews/18991.html
  13:01 <jnewbery> We're lucky to have the PR author hosting today. So without any more delay, I'll hand over to gleb1010.
  13:01 <sipa> hi
  13:01 <gleb1010> Hi again all! Hope we have a productive hour
  13:01 <gleb1010> I saw some useful review comments from participants, but let’s check with everyone.
  13:01 <gleb1010> Have you reviewed the PR? Y/n
  13:01 <willcl_ark> y
  13:01 <pinheadmz> y
  13:01 <troygiorshev> y
  13:01 <amiti> y
  13:01 <lightlike> y
  13:01 <jnewbery> y
  13:02 <emzy> y/n
  13:02 <gleb1010> That's impressive :)
  13:02 <gleb1010> You are feel free to ask any questions at any time, but let’s start with the notes we posted on the website.
  13:02 <gleb1010> Let's first discuss this
  13:02 <gleb1010> What is the importance of ADDR relay and specifically the GETADDR/ADDR protocol?
  13:03 <pinheadmz> gleb1010 its for network nodes to discover new peers
  13:03 <emzy> Asking a peer for more other peers.
  13:03 <andrewtoth> hi
  13:03 <gleb1010> Right, so since Bitcoin operates over a peer-to-peer network, it's essential for everyone to know somebody else from the network
  13:03 <pinheadmz> there are also DNS seeds, and I think a hard-coded list of peers (?) for bootstrapping
  13:04 <nehan_> hi
  13:04 <gleb1010> pinheadmz: right, there are several ways to learn about other nodes in the network
  13:04 <emzy> even a DNS seed uses ADDR to find nodes.
  13:05 <pinheadmz> emzy true but to get a list of those peers from the seeder its actually DNS protocol
  13:05 <willcl_ark> hmmmm, just thinking out loud - does this PR affect DNS nodes' responses at all? (I've not looked at the dnsseeder code)
  13:05 <gleb1010> DNS internals always felt a bit obfuscated to me, so if you're somewhat confused, you're not alone :)
  13:06 <gleb1010> Nope, I don't think they affect those responses, but maybe sipa can confirm?
  13:06 <michaelfolkson> How were those hard-coded peers chosen? Are they regularly checked for uptime etc?
  13:06 <willcl_ark> seems like https://github.com/sipa/bitcoin-seeder doesn't query Core, so I guess not!
  13:06 <gleb1010> Pieter runs one out of 6 DNS seeds we have, so that the very new nodes know how to learn about each other
  13:06 <gleb1010> michaelfolkson: I believe Wladimir maintains the list.
  13:06 <sipa> do does emzy
  13:07 <gleb1010> Well, hard-coded peers are updated within regular release process.
  13:07 <emzy> willcl_ark: it connects direcly via p2p to a node to ask. No local bitcoind is running.
  13:07 <sipa> michaelfolkson: see https://github.com/bitcoin/bitcoin/blob/master/doc/release-process.md
  13:07 <gleb1010> So I think if you look hard enough you can find a PR with that list
  13:07 <willcl_ark> emzy: thanks,
  13:07 <gleb1010> Okay this one is really interesting.
  13:07 <gleb1010> Which properties of ADDR relay are important?
  13:08 <gleb1010> We know that blocks should be relayed faster, we know that transactions should be more or less unlinkable to IP
  13:08 <michaelfolkson> sipa: Thanks
  13:08 <willcl_ark> Ideally, you want to get a diverse list of peers
  13:08 <pinheadmz> and you want peers with good uptime or at least "seen" recently
  13:08 <jnewbery> emzy: DNS seeds actually use DNS
  13:08 <gleb1010> Right, one way to look at it is to consider an individual node's AddrMan, not cross-network relay.
  13:09 <jnewbery> here: https://github.com/bitcoin/bitcoin/blob/ffa70801dab7fa85c24fd5d19ca998e0910238d5/src/net.cpp#L1681-L1682
  13:09 <gleb1010> Every node wants to have a good up-to-date list of nodes in their AddrMan.
  13:09 <jnewbery> if DNS fails, we connect using P2P and send a GETADDR (here: https://github.com/bitcoin/bitcoin/blob/ffa70801dab7fa85c24fd5d19ca998e0910238d5/src/net.cpp#L1691-L1694)
  13:10 <emzy> jnewbery: it answers via DSN but it has to crawl the nodes via p2p
  13:10 <gleb1010> This is where we ideally should talk about unsolicited self-announcements which propagate ADDRs across the network, but let's leave that for homework
  13:10 <gleb1010> Let's see what's this parallel discussion is about :)
  13:10 <lightlike> so this change might influence DNS seed indirectly. If the DNS seeder regularly scrapes to get new addresses (and deliver them via DNS) and it gets cached responses in the future, it might introduce some kind of inertia to changes in the network.
  13:10 <emzy> s/DSN/DNS/
  13:11 <gleb1010> emzy is curious how DNS servers actually learn about the nodes they gonna feed to nodes querying them
  13:11 <gleb1010> or maybe not curious but he makes a statement :)
  13:11 <sipa> emzy runs a DNS seed
  13:11 <willcl_ark> lightlike: but the cached responses will be different from each node
  13:11 <jnewbery> emzy: I'm not sure what you mean by crawl the nodes, but it gets its list of potential peers by DNS
  13:12 <lightlike> willcl_ark: true, so probably not noticeably
  13:12 <sipa> jnewbery: the DNS seeder itself, not the client
  13:12 <sipa> jnewbery: it has to get the list of good IP addresses it serves from somewhere
  13:12 <gleb1010> This is true. If everything is cached, the records everywhere become a bit older (within 1 day)
  13:12 <sipa> to do so, it crawls the network using the P2P protocol
  13:12 <pinheadmz> anyone can do a quick experiement to query the DNS seed nodes: `dig seed.bitcoin.sipa.be`
  13:12 <pinheadmz> returns a list of A records
  13:13 <sipa> you can get IPv6 ones using dig -t AAAA seed.bitcoin.sipa.be
  13:13 <troygiorshev> pinheadmz: thx
  13:13 <emzy> tnx sipa, that was my point.
  13:13 <gleb1010> Okay, so we can talk about the impact a bit more now
  13:13 <willcl_ark> so I think this doesn't affect the DNS seeds
  13:13 <gleb1010> How we should look at the threat of everyone gets a bit older AddrMan records?
  13:14 <gleb1010> Like, let's say a node gets 1000 records, but 500 of them were updated since the last cache
  13:14 <michaelfolkson> That is a threat? Or just an inefficiency?
  13:14 <troygiorshev> will this make it more difficult for a new node to get a diverse set of peers?
  13:14 <willcl_ark> less than 24h should not be too much of a problem (unless node churn is very bad)
  13:14 <gleb1010> Right, so I remember a research paper from Till Neudecker showing that churn in our network is really low
  13:15 <jnewbery> ah, I understand. Thanks
  13:15 <emzy> gleb1010: It will take longer for new nodes to be discoverd by others.
  13:15 <jnewbery> yes. I thought emzy was talking about 'DNS seed connections' rather than 'DNS seeders'. All clear now!
  13:15 <lightlike> troygiorshev: it will probably introduce a delay until you get new inbound peers.
  13:15 <sipa> fwiw, my dns seeder software will generally visit every node it knows about in the network every few hours
  13:15 <willcl_ark> https://dsn.tm.kit.edu/bitcoin/ these guys measure (some) churn
  13:15 <sipa> so this will impact it, but not much i think, given the low churn
  13:15 <gleb1010> That's a great link above, it has so many things, save and take a look later :)
  13:16 <gleb1010> sipa: So the point is, some of the records might have an older timestamps, but they will probably be still alive?
  13:16 <gleb1010> And they won't be garbage in our AddrMan
  13:17 <gleb1010> Okay, let's move forward.
  13:17 <pinheadmz> gleb1010 are the timestamps in addrman updated with every message from the peer? or just once on connection?
  13:18 <gleb1010> pinheadmz: it's a bit complicated there, afaik it's different for inbound/outbounds
  13:18 <gleb1010> For one of them it's every message, for another it's when they connect. We also update based on unsolicited ADDRs...
  13:19 <gleb1010> But yeah, the expectations of these timestamps are a bit vague right now, probably that's an area of future improvements.
  13:19 <emzy> If an attacker has like 1000 ipv4 addresses this PR would not mitigate the attack?
  13:20 <gleb1010> There is no *the attack* :P
  13:20 <amiti> pinheadmz: I found the coinscope paper interesting bc it used the timestamps to figure out network topology (link: https://www.cs.umd.edu/projects/coinscope/coinscope.pdf)
  13:20 <gleb1010> Yeah coinscope is great!
  13:20 <troygiorshev> pinheadmz: fwiw if you feel like looking into the history, it was apparently changed in version 0.10.1
  13:20 <gleb1010> Let's get to attacks.
  13:20 <willcl_ark> What I am interested to know is, how does the previous (current) ADDR response give away your connected vs disconnected peers? Unless disclosing is a security risk OFC
  13:20 <amiti> idk how much the code has changed since then, but gave me an idea of the complexity
  13:20 <gleb1010> In the PR, I was a bit vague about the attacks, because it's always a bit sensitive.
  13:21 <emzy> gleb1010: or privacy leak
  13:21 <willcl_ark> Also an acceptable answer :)
  13:21 <gleb1010> But the coinscope paper is one example of exploiting it to infer direct links between nodes.
  13:21 <gleb1010> Another leak is to map Tor identity to ipv4 identity of the same node. If we scrape everything and compare all timestamps, it's very easy to tell it's the same entity.
  13:21 <pinheadmz> we've talked about eclipse attacks in review club before as well - it could be used by an atacker to check on their success maybe
  13:22 <gleb1010> Anybody got some time to think of any other vectors?
  13:22 <michaelfolkson> You mentioned linking Tor and IP addresses
  13:22 <amiti> I was wondering how a spy could go from knowing addrman to figuring out the network topology, the coinscope paper uses the timestamps. is that the main way or are there others?
  13:22 <gleb1010> pinheadmz: right, it's great to remember that most of researchers seem to feel that topology should be private/obscured for security reasons :)
  13:22 <michaelfolkson> And then there's linking Bitcoin nodes to Lightning nodes ;)
  13:23 <gleb1010> amiti: Great question! Timestamps is what I was always thinking too.
  13:23 <willcl_ark> So from that paper it seems "recent and unique" timestamps (might) give it away
  13:23 <gleb1010> willcl_ark can you elaborate?
  13:23 <sipa> i would expect there are ways to introduce randomly generated IPs in the network, broadcast them on certain incoming connections, and then observing which other peers know about them
  13:24 <willcl_ark> I'm not sure, but perhaps if we hear of a peer from another peer, we have the same timestamp, but if we connect ourselves, we get a unique timestamp
  13:24 <willcl_ark> so if you see the same timestamp from a few peers we can't tell, but you give me a unique and recent timestamp, we can infer you are connected directly to them
  13:24 <troygiorshev> sipa: ah like putting rubber ducks in a river and seeing where they end up
  13:24 <gleb1010> My first idea was to just track the "self-announcements"
  13:25 <gleb1010> Nodes sometimes announce themselves to their direct peers
  13:25 <sipa> michaelfolkson: lightning node have an explicit public identity, so i'm not sure what there is to link
  13:25 <gleb1010> And those timestamps will be very distinct
  13:25 <gleb1010> sipa: link lightning node to a bitcoin ip or onion leads to issues, see our time-dilation attacks paper :)
  13:25 <michaelfolkson> But you don't always know the Bitcoin full node that the Lightning node is using
  13:26 <sipa> ok
  13:26 <gleb1010> Alright, there are couple ideas to exploit this above
  13:26 <gleb1010> Perhaps someone comes up with something even cooler than we can think of
  13:26 <gleb1010> And maybe not much fixed by caching :)
  13:26 <pinheadmz> gleb1010 this cache applies to all your peers?
  13:26 <pinheadmz> so within 24 hours, every node that GETADDR from the same network get the same response?
  13:26 <sipa> oh no, a netsplit
  13:27 <pinheadmz> otherwise you could sybil a node and just get all the addrs anyway
  13:27 <gleb1010> pinheadmz: yes, same cache per the network of request originator
  13:27 <willcl_ark> i think the cache would be different for each peer, like the mempool
  13:27 <gleb1010> Except white-listed requestor
  13:27 <gleb1010> sipa: huh?
  13:27 <sipa> gleb1010: on IRC, now, in this channel
  13:28 <sipa> we lost jnewbery and others
  13:28 <gleb1010> ah i see :)
  13:28 <gleb1010> this happens from time to time
  13:28 <gleb1010> My handbook doesn't have instructions for this...
  13:28 <michaelfolkson> A network partition
  13:28 <sipa> haha
  13:28 <gleb1010> I think that subnet is in good hands of jnewbery
  13:28 <willcl_ark> hard forked them off
  13:28 <gleb1010> okay so the discussion above was
  13:29 <gleb1010> I suggest using a separate cache for different network origin, and someone above pointed out why it is useful
  13:29 <gleb1010> Welcome back!
  13:29 <gleb1010> Alright, let's try to merge back to the notes
  13:30 <michaelfolkson> In the time you were away we eclipsed all your peers
  13:30 <gleb1010> We sort of discussed the severity of the attacks
  13:30 <gleb1010> Leaking the topology can eclipsing a node easier, also spying easier
  13:30 <gleb1010> Also stealing funds from Lightning... many things
  13:30 <gleb1010> Anything else?
  13:31 <michaelfolkson> Privacy attacks
  13:31 <willcl_ark> elcipse of lightning light clients (e.g. Neutrino) seems like a particularly severe one
  13:31 <gleb1010> Well, part of the issue is that Neutrino's p2p stack is poor, so I'm not sure it's a good motivation for a Bitcoin Core change haha
  13:31 <willcl_ark> but i guess it all comes back to the same base-layer eclipse
  13:31 <willcl_ark> sure!
  13:32 <gleb1010> Anyone looking for some little work should consider helping to mature Neutrino implementation
  13:32 <gleb1010> With the experience we're getting while working on this stuff :P
  13:33 <gleb1010> Yeah, so basically eclipse attacks are the worst probably (and netsplits, which is sort of an eclipse too but large scale)...
  13:33 <gleb1010> You can do what you want with your victim, so we don't want that.
  13:33 <gleb1010> Bringing us to the question... does this PR really help?
  13:33 <pinheadmz> i guess if you got enough addrs from enough nodes you could uncover who the critical nodes are
  13:33 <gleb1010> (to hide local and global topologies)
  13:33 <pinheadmz> like, nodes with the most connections, potential attack points
  13:34 <gleb1010> pinheadmz: Right, it would be easier to split the net if one kills the bridges, but hopefully our graph is random enough so that's not very helpful :)
  13:34 <amiti> does netsplit = network partition?
  13:34 <primal> not sure if it came across but I had the following comment during the netsplit
  13:34 <primal> is the PR not shifting the attack strategy to gain topology information from an attacker being rate limited to the attacker spinning up a botnet?
  13:34 <primal> I haven't worked through how peering dynamics and rate limiting ADDR will evolve, but ^^ is a gut-reaction check
  13:34 <gleb1010> amiti: yeah
  13:34 <willcl_ark> it appears that it might slow down the attack to the extent that peers get new connections etc. so that wouldn't be so effective
  13:34 <nehan_> gleb1010: it slows the attacker down
  13:34 <gleb1010> Right. This PR just makes an attacker spend way more time on learning what they want to learn.
  13:34 <willcl_ark> at pretty minimal cost to "freshness" of ADDR responses
  13:35 <troygiorshev> I'm worried that it simply delays learning the topology, but i'm not sure
  13:35 <gleb1010> I never bothered to measure the exact attack delay we may introduce
  13:35 <lightlike> primal: you can connect with as many bots that you want, all will get the same cached response for a while.
  13:35 <gleb1010> Because I have other ideas on the way to improve the privacy :)
  13:35 <gleb1010> Let's think of how we can improve this stuff even further?
  13:35 <pinheadmz> is there a drawback where a new node with maxinbound 1000 wont actually get any new inbounds for up to 24 hours?
  13:36 <primal> lightlike ahh ok so the cache reduces the set of info that we allow to leak out of our node
  13:36 <gleb1010> Anything crazy creative works, let's discuss ideas
  13:36 <michaelfolkson> Sometimes delays are all you can do troygiorshev. Make it unviable to do unless extremely targeted victim
  13:36 <gleb1010> primal! Right. And also the indicators=timestamps are a little "outdated"
  13:36 <willcl_ark> primal: I guess it's like "leak rate"
  13:36 <gleb1010> pinheadmz: not sure I follow.
  13:36 <primal> "topology disclosure rate" or something of the sort
  13:37 <gleb1010> What this has to do with inbound limit?
  13:37 <jnewbery> pinheadmz: it'll reduce how quickly other nodes learn about your address in the first 24 hours, but won't eliminate it entirely
  13:37 <emzy> gleb1010: what about randomness in the last seen time?
  13:37 <pinheadmz> jnewbery right, just a delay. and for inbounds its like "eh, your loss"
  13:37 <jnewbery> because each node's cache is updated at a different time
  13:37 <willcl_ark> pinheadmz: you should get 1000 different cached responses from all of those peers, so you should be fine
  13:37 <primal> emzy: you don't want randomness, you want to chop the information at a certain bit
  13:37 <gleb1010> emzy: this is interesting!
  13:37 <jnewbery> also, be aware that there is another method of address gossipping, which is that each node will announce its own address to its peers every ~24 hours, and those peers will gossip that on to some of their peers. That method is unaffected by this PR.
  13:37 <nehan_> the thing i'm trying to understand are the implications of serving stale timestamps. the timestamp logic is weird (as described in the coinscope paper, which might be out of date)
  13:37 <thomasb06> is there a reverse mechanism in case of misusage: certain nodes able to overcome the privacy for example?
  13:37 <gleb1010> wow this moves fast!
  13:37 <emzy> primal: also vaid
  13:37 <pinheadmz> willcl_ark i referring more to my own IP getting out to the network, if i have a lot of open slots to offer
  13:37 <jnewbery> that there is another method of address gossipping, which is that each node will announce its own address to its peers every ~24 hours, and those peers will gossip that on to some of their peers. That method is unaffected by this PR.
  13:38 <jnewbery> (https://github.com/bitcoin/bitcoin/blob/ffa70801dab7fa85c24fd5d19ca998e0910238d5/src/net_processing.cpp#L3896-L3899)
  13:38 <jnewbery> (or more accurately: for each peer, once every 24 hours, we reannounce our address to that peer)
  13:38 <gleb1010> jnewbery: I believe it's a bit less often than that, because there's a bloom filter in that announcement, but I have to double-check...
  13:38 <willcl_ark> oh no! we lost him
  13:38 ⚡ sipa spawns a new gleb1010
  13:39 <willcl_ark> :)
  13:39 <pinheadmz> gleb1011 enters :-)
  13:39 <michaelfolkson> I do wonder how easy/difficult it is to knock a node off the Bitcoin network
  13:39 <gleb10101> Now I got forked, sorry.
  13:39 <michaelfolkson> Because that's what eclipse attacks rely on right?
  13:39 <gleb10101> michaelfolkson: good question! There are many ways to knock a node
  13:39 <emzy> is there a problem for nodes that change evey 24h there ip address to be discoverd? Beause this is often the case by DSL/dial in connections.
  13:40 <gleb10101> michaelfolkson: hopefully all those many ways are hard/expensive enough :)
  13:40 <pinheadmz> jnewbery dont wanna veer too far off topic, but doesnt a node learn its own IP address from other peers?
  13:40 <gleb10101> emzy: I actually didn't think about them. What would be the issue? They get less connections inbound?
  13:41 <jnewbery> pinheadmz: yes, I believe that's a part of it, although it's not something I've looked at too closely
  13:41 <pinheadmz> emzy this is whayt i was getting to. but actually most home-run nodes probably arent accepting inbounds anyway (firewall, etc)
  13:41 <emzy> gleb10101: exactly. They will be second class nodes.
  13:41 <primal> pinheadmz I don't understand the significance of a node learning it's own ip add from other peers. what am I missing?
  13:41 <amiti> pinheadmz, jnewbery: oh I've totally seen code in connection logic that says "if its yourself, disconnect" 😛
  13:42 <pinheadmz> primal bitcoind used to actually make a request to whats-my-ip.com or something
  13:42 <gleb10101> emzy: fair enough, maybe you should think about that more and then come to the PR with the conclusions
  13:42 <pinheadmz> i guess its hard for a process on your laptop to know what its internet IP truly is
  13:42 <jnewbery> amiti: that's something slightly different, and is detected by including a random nonce in the VERSION message
  13:42 <emzy> pinheadmz: I think this is the case.
  13:42 <gleb10101> emzy: wondering if that's the scenario we're targeting
  13:42 <gleb10101> Alright, I asked about the alternative solutions, I was about to share one :)
  13:42 <emzy> gleb10101: good question.
  13:42 <gleb10101> Just to throw this another idea, we don't have to discuss it.
  13:43 <gleb10101> I wanna implement self-announcement on feeler connection with some probability.
  13:43 <amiti> jnewbery: oh, interesting. ok thanks
  13:43 <primal> pinheadmz are you saying that bc a node can learn its ip addr from peers that removes the need to communicate with other services?
  13:43 <gleb10101> We do connect to some node in the network every 2 minutes just to see if they're alive, might as well ask them to relay our addr. This would obfuscate it even further
  13:43 <pinheadmz> primal right or more sepciifcally centralized services that arent even bitcoin related...ill try to find the PR its an old one
  13:43 <gleb10101> Maybe someone gets any ideas like this for future PR :)
  13:44 <sipa> bitcoin core used to query some "find my ip" website in a long-gone past
  13:44 <jnewbery> gleb10101: what do you mean by 'obfuscate' in this context?
  13:44 <willcl_ark> sipa: sounds like a privacy nightmare :P
  13:44 <sipa> willcl_ark: yes indeed
  13:44 <gleb10101> jnewbery: Coinsope and my own idea initially was to 1. scrape AddrMan often 2. infer inbounds by new records/special timestamps
  13:45 <pinheadmz> sipa right, and then that website turned into a real estate site or something, was bascially getting ddosed by bitcoin network :-P
  13:45 <gleb10101> jnewbery: so now these new records/special timestamps will be not only at victim's direct peers and their peers, but also at random feelers
  13:45 <gleb10101> Yeah, this is an interesting story how we moved from that website...
  13:45 <willcl_ark> could nodes use a random offset for the timestamp when serving (or storing) ADDR
  13:46 <willcl_ark> then nobody would ever have the same
  13:46 <gleb10101> willcl_ark: this is what greg maxwell told me we already do when I discovered this issue a year ago haha
  13:46 <gleb10101> But then I don't think we do randomize them
  13:46 <gleb10101> So that was some phantom feature
  13:46 <willcl_ark> :) But we should!
  13:47 <gleb10101> So the idea is to randomize a timestamp on every ADDR sending
  13:47 <gleb10101> This will help with some issues...
  13:47 <pinheadmz> https://github.com/bitcoin/bitcoin/pull/3088 dont use 3rd party IP services
  13:47 <gleb10101> willcl_ark tracking occurence of new records in AddrMan still would be possible
  13:47 <emzy> the randomness may be the less invasive for the P2P network.
  13:48 <amiti> fundamental question: what is the intended purpose of the ADDR timestamps? I saw logic that used this info to not relay old addrs. is that the main reason?
  13:48 <gleb10101> amiti: yeah I believe so.
  13:48 <primal> pinheadmz 845c86d128fb97d55d125e63653def38729bd2ed
  13:49 <willcl_ark> gleb10101: hmmmm, interesting
  13:49 <gleb10101> I believe every time we get an ADDR, we would deprioritize it if it's 1 week old
  13:49 <primal> ah yeah you linked it
  13:49 <gleb10101> Okay, we have 10 minutes left
  13:49 <gleb10101> I was about to ask about side-effects, but we actually discussed them :)
  13:49 <gleb10101> But someone can highlight a side-effect of their concern again
  13:50 <gleb10101> Or just ask any other question?
  13:50 <willcl_ark> I was wondering about setting ADDR as a default for whitebind
  13:50 <jnewbery> There was a suggestion in https://github.com/bitcoin/bitcoin/pull/16442 to dynamically change the local service bits depending on whether the compact block filter index was built. It was argued that because it would only ever go from false to true, that would be ok.
  13:50 <pinheadmz> is there any way an ADDR message cahced response could be used to identify a node? if you get the same response from nodes running on two IPs for example?
  13:50 <willcl_ark> it doesn't really make much difference as you specify in config, but...
  13:50 <nehan_> I asked earlier about the implications of serving stale timestamps but that might have gotten lost in the fork and/or I didn't see the answer
  13:51 <jnewbery> I think if nodes start randomizing timestamps that's no longer true. You could get an old address record with a newer timestamp than the (actually) newer address record
  13:51 <gleb10101> nehan_: We don't want to spend days digging into outdated nodes and finding a live one...
  13:51 <nehan_> or think that nodes are stale that aren't actually stale
  13:51 <gleb10101> And we don't want to spend bandwidth relaying old non-live nodes
  13:52 <michaelfolkson> I was listening to ariard on TFTC and he was saying it needs a similar level of resource to secure P2P for Neutrino or Lightning as it does secure P2P on Core. Had never thought of it like that before
  13:52 <nehan_> with this change, where a node might have served a fresh timestamp it would now serve one that was 27 hours old
  13:52 <gleb10101> pinheadmz: yeah, but I don't know how to address this issue :)
  13:52 <michaelfolkson> I just assumed Core would take the brunt of addressing a lot of the P2P attacks on Neutrino/Lightning indirectly
  13:52 <gleb10101> nehan_: true, but in the beginning of the meeting we sort of considered that 1-day old is probably fine.
29813:53 <gleb10101> We should be talking about at least several-days lag for it to be bad. Although it's a bit arbitrary and depends on many things. It's more of an intuition
29913:54 <nehan_> gleb10101: doesn't that sort of imply we don't need to update timestamps frequently?
30013:54 <sipa> jnewbery: i feel that with feelers this is less of an issue, as they will always overwrite the flags data with the actual flags
30113:55 <sipa> (i need to check if feelers actually override flags)
30213:55 <amiti> re timestamps and addr relay: I still don't understand how its really helping. as a recipient you are able to assign likelihood-of-node-being-live if the sender is being honest in the reported timestamps. if thats the case, why not just have honest nodes proactively only send addrs of recently-tested-conns?
30313:55 <gleb10101> nehan_: define frequently :) Records should be at most couple days old. We currently don't need better freshness.
30413:55 <gleb10101> We don't know how to distinguish 3 days old from 1 day old. The code doesn't.
30513:56 <willcl_ark> presumeably you could just also make the timstamp up, if you were so inclined
30613:56 <gleb10101> I mean, we can distinguish, but we don't do anything with it, sorry.
30713:57 <gleb10101> amiti: Right, maliciously updating timestamps attack is one of my todos :)
30813:57 <gleb10101> That's also why any fine-grained optimization of being alive is dangerous.
30913:57 <gleb10101> It's free to bump for an attacker to bump their timestamp
31013:58 <pinheadmz> is there any banscore type thing if a node sends us 1000 ADDRS and none of them work ?!
31113:58 <gleb10101> Meaning we don't want to rely on timestamps too much...
31213:58 <amiti> gleb: huh? like explore the feasibility of attack?
31313:58 <jnewbery> sipa: I think we do. We call SetServices() when we receive the version, and then disconnect
31413:58 <sipa> pinheadmz: we won't know they don't work until days, maybe weeks later
31513:58 <gleb10101> pinheadmz: nope. I mean, we don't want to be checking 1000 nodes at once :)
31613:59 <gleb10101> amiti: we're out of time it seems, hit me up later :)
31713:59 <pinheadmz> sipa gleb10101 right, and no real pain in trying to connect to bad nodes
31813:59 <pinheadmz> well great work on this gleb10101 very simple to understand and makes a lot of sense
31913:59 <troygiorshev> yeah thanks gleb10101!
32014:00 <gleb10101> Thank you! For those who haven't looked at the code, it's actually just a few lines, so please review :P
32114:00 <willcl_ark> thanks gleb10101
32214:00 <emzy> thanks gleb10101!
32314:00 <andrewtoth> thanks gleb10101!
32414:00 <lightlike> thanks!
32514:00 <jnewbery> pinheadmz: and that's a more general problem. How do we 'punish' a node for giving us bad data? Looking at orphan processing and mapBlockSource is left as an exercise for the reader :)
32614:01 <gleb10101> #endmeeting
32714:01 <primal> thanks gleb10101

Meeting Log – Asia time zone

Host: jonatack

32805:04 <jonatack> If anyone is around, we'll get started in just under an hour.
32906:00 <jonatack> #startmeeting
33006:00 <jonatack> hi
33106:01 ⚡ jonatack things seem quiet
33206:01 <sipa> vaguely here
33306:02 <brikk> hi
33406:02 <jonatack> hi!
33506:03 <jonatack> So I spent some time going through the meeting log and the PR.
33606:03 <jonatack> brikk: did you get a chance to review the PR?
33706:04 <brikk> jonatack: unfortunately not, I was not aware of this meeting; I only saw activity now, at a convenient time right at the start of my day :)
33806:04 <jonatack> that's all good
33906:04 <jonatack> this is about https://bitcoincore.reviews/18991
34006:05 <jonatack> "Cache responses to GETADDR to prevent topology leaks (p2p)"
34106:05 <brikk> thanks, I'm looking at it now
34206:05 <jonatack> to make the ADDR/GETADDR address gossip protocol more private for the node providing addresses
34306:06 <jonatack> ADDR relay and specifically the GETADDR/ADDR protocol exist for peer discovery in bitcoin's p2p network.
34406:07 <jonatack> in addition to two other ways: DNS seeders which crawl the network using the p2p protocol, and hardcoded fixed seeds
34506:07 <jonatack> for instance, one can run dig seed.bitcoin.sipa.be (or dig -t AAAA seed.bitcoin.sipa.be for IPv6) on the command line to see a list of A/AAAA records that seeders might provide
34606:09 <jonatack> Essentially, the goal of this PR is to slow attackers down, to make them spend more time to learn local or global peer topology.
34706:11 <jonatack> ADDR relay has various properties of importance. This PR seems to shift the priority among them a bit in the hope of a better tradeoff.
34806:11 <jonatack> What are these properties?
34906:11 <jonatack> - Privacy: hiding local and global topology, difficulty of identifying peers, or of linking transactions to IP addresses
35006:11 <jonatack> - Peer diversity
35106:12 <jonatack> - Decentralisation: Trust reduction with respect to the DNS seeds and the fixed seeds
35206:12 <jonatack> - Speed of relay
35306:12 <jonatack> - Freshness: peers having an up to date list of peers
35406:13 <jonatack> - Quality: peers who are well-behaved, seen recently, with good uptime
35506:14 <brikk> What does peer diversity mean?
35606:14 <jonatack> For example, a tradeoff this PR would seem to be proposing is less freshness (if that is the best word; it may not be) in favor of privacy
35706:14 <jonatack> or less diversity as well, possibly
35806:15 <brikk> Speed of relay seems to be a tradeoff as well, right?
35906:16 <jonatack> i'm not sure
36006:16 <jonatack> how do you see it?
36106:17 <brikk> just by looking at the comments in the review
36206:17 <brikk> there's a lot of comments though and I am yet to make it til the end, so perhaps my perception will change :)
36306:19 <jonatack> diversity: good question. for example, by ASN, which was a motivation for the -asmap p2p addition in the latest release of bitcoin core.
36406:21 <jonatack> https://bitcoincore.reviews/16702 covered asmap and contains good resources on the various attacks: erebus, eclipse, bgp hijacking
36506:22 <jonatack> sipa: do you think today's PR could adversely affect discovery of newly online peers, or ones who change IP address frequently?
36606:23 <jonatack> ISTM if everything is cached, the records everywhere become a bit older (by 1 day)
36706:23 <sipa> i'd need to think more about that
36806:23 <brikk> right, sounds like the things amiti uttarwar was talking about in the reckless vr meetup
36906:23 <jonatack> and new node discovery might be slower... but i would need to look at it more.
37006:26 <brikk> jonatack: when you say new node discovery, does that mean that I bring a new node to the discovery and it would mean issues for me, or that someone else brings a new node online and the rest of the network has trouble discovering it?
37106:27 <jonatack> Both? This is an aspect I'm not sure on.
37206:27 <brikk> ok
37306:30 <jonatack> The hard thing with p2p changes like this, to my mind, is how to simulate the effects before actually deploying on the network.
37406:31 <luke-jr> it's in Knots 0.20.0 fwiw
37506:31 <brikk> I agree
37606:31 <luke-jr> not quite the same thing, but it's _part_ of the network that might be observable
37706:32 <jonatack> luke-jr: nice. released june 14? any stats on number of nodes running that version?
37806:32 <luke-jr> [04:23:42] <jonatack> ISTM if everything is cached, the records everywhere become a bit older (by 1 day) <-- isn't it 1 day *per hop*?
37906:33 <luke-jr> jonatack: the release was June 16th, but based on June 14th PRs
38006:33 <sipa> luke-jr: there are two mechanisms though; getaddr->addr, and normal addr gossipping
38106:33 <luke-jr> I'm seeing only 24 nodes upgraded so far
38206:33 <sipa> i don't think the second is affected by this PR but i haven't reviewed in detail
38306:33 <luke-jr> sipa: ah
38406:34 <sipa> and for the getaddr->addr mechanism there isn't really any concept of hops
38506:34 <luke-jr> do we currently *use* getaddr as a client then?
38606:35 <sipa> luke-jr: yes, under certain conditions, in response to version
38706:35 <sipa> so at most once per connection
38806:35 <sipa> (we also only respond once per connection iirc)
38906:36 <luke-jr> looks like normally once at connection
39006:36 <luke-jr> due to pfrom.nVersion >= CADDR_TIME_VERSION
39106:36 <luke-jr> outbound connection*
39206:37 <jonatack> CADDR always puts a smile on my face
39306:38 <luke-jr> ?
39406:38 <jonatack> seeing lisp in the codebase :)
39506:38 <sipa> )))))))))))
39606:42 ⚡ luke-jr watches a tumbleweed roll by
39706:42 <jonatack> luke-jr: "like normally once at connection" you're referring to ProcessMessage or RelayAddress?
39806:42 <luke-jr> just glancing at the conditions around connman->PushMessage(&pfrom, CNetMsgMaker(nSendVersion).Make(NetMsgType::GETADDR));
39906:43 <jonatack> ah VERSION
40006:43 <luke-jr> it seemed like it happens under fairly normal circumstances
40106:46 <jonatack> luke-jr: do you see any adverse effects from this PR?
40206:46 <luke-jr> so far no
40306:48 <jonatack> how did you decide to add it to knots?
40406:48 <luke-jr> less than 50% of the network is running 0.19.x+, so it's not the end of the world if we discover something post-release either
40506:48 <luke-jr> jonatack: Knots merge policy is very relaxed - if it won't clearly disrupt anything, it usually goes in
40606:49 ⚡ jonatack looks at https://dsn.tm.kit.edu/bitcoin to see
40706:49 <jonatack> luke-jr: ok
40806:50 <jonatack> https://dsn.tm.kit.edu/bitcoin/#useragents
40906:50 <jonatack> luke-jr: you have your own proprietary stats iirc?
41006:51 <luke-jr> yes http://luke.dashjr.org/programs/bitcoin/files/charts/branches.html
41106:51 <jonatack> thanks
41206:52 <luke-jr> I suppose http://luke.dashjr.org/programs/bitcoin/files/charts/branches.html?onlylistening=1 shows 55% running recent versions
41306:53 <brikk> I'm trying to wrap my head around the peer diversity: can I think of this as three kinds of nodes: nodes with an ipv4/ipv6 address that never changes, nodes behind tor, and nodes whose address changes every 24 hours?
41406:53 <jonatack> seems quite different from the dsn one in germany which appears to show more 0.19.x nodes
41506:53 <luke-jr> I guess a lot of 0.16.x nodes are firewalled or something
41606:53 <luke-jr> jonatack: I include non-listening by default
41706:53 <jonatack> got it
41806:55 <jonatack> listening nodes seem to update more to the recent versions, which makes sense as they are presumably more sophisticated users
41906:57 <jonatack> brikk: diversity by AS number, by IP, by geography
42006:58 <jonatack> brikk: by uptime, recently seen
42106:59 <jonatack> brikk: minimum ping time (i'm thinking about the criteria used in bitcoin core for potentially evicting peers)
42207:01 <jonatack> brikk: recently sent us transactions or blocks
42307:01 <jonatack> see CConnman::AttemptToEvictConnection in net.cpp
42407:02 <brikk> jonatack: thanks!
42507:03 <jonatack> brikk: we recently did a review club on that: https://bitcoincore.reviews/16756
42607:03 <jonatack> #endmeeting
42707:03 <jonatack> that's time! thanks brikk sipa luke-jr
42807:05 <brikk> jonatack: thanks for hosting! I like this time slot better, although looking at the number of participants perhaps not so much?
42907:05 <sipa> maybe it takes some time before people know it exists
43007:06 <sipa> though i suspect it'll inevitably be less popular than the other slot
43107:06 <jonatack> yes, my guess is it might be network/social effect, too... might just take time
43207:07 <jonatack> kallewoof hosted a review club at this time slot a few months back which was well-attended
43307:09 <jonatack> this one https://bitcoincore.reviews/17824#meeting-log--asia-time-zone
43407:09 <jonatack> and even more, this one: https://bitcoincore.reviews/17824#meeting-log--asia-time-zone
43507:10 <jonatack> sec
43607:11 <jonatack> *this one* https://bitcoincore.reviews/16981#meeting-log--asia-time-zone
43707:12 <luke-jr> kallewoof is in Japan, maybe he spread word in person
43807:18 <jonatack> yes, there were the australians (aj, meshcollider, fanquake) and also murch
43907:25 <meshcollider> An email chain was started for it last time
44007:25 <meshcollider> That's how I found out about it
44107:26 <jonatack> ah! -- good idea
44207:28 ⚡ jonatack realises may have unwittingly described a new zealander as an australian
44307:30 <sipa> you may have started world war 3
44407:34 <brikk> it's 2020 all over again
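
Editor's sketch (not part of the meeting log): the caching idea discussed above can be illustrated with a toy model. This is a minimal Python sketch, not the PR's C++ code. The class name `AddrResponseCache`, the helper `scrape`, the fixed 24-hour lifetime, the 1000-address cap and the 5000-entry address list are all assumptions made for illustration. It compares how many distinct addresses a spy collects from repeated GETADDR-style requests when each response is a fresh random sample versus when one sample is cached and reused until it expires.

```python
import random


class AddrResponseCache:
    """Toy model: serve one cached address sample until it expires."""

    def __init__(self, addrman, lifetime=24 * 60 * 60, max_addrs=1000, rng=None):
        self.addrman = list(addrman)   # stand-in for the node's address table
        self.lifetime = lifetime       # cache lifetime in seconds (assumed)
        self.max_addrs = max_addrs     # cap on addresses per response (assumed)
        self.rng = rng or random.Random()
        self.cached = None
        self.expires_at = 0.0

    def get_addresses(self, now):
        """Return the cached sample, drawing a new one only after expiry."""
        if self.cached is None or now >= self.expires_at:
            k = min(self.max_addrs, len(self.addrman))
            self.cached = self.rng.sample(self.addrman, k)
            self.expires_at = now + self.lifetime
        return list(self.cached)


def scrape(respond, requests):
    """Hypothetical spy: union of everything learned over repeated requests."""
    seen = set()
    for i in range(requests):
        seen.update(respond(i))
    return seen


# Hypothetical address table with 5000 entries.
addrman = [f"10.0.{i // 256}.{i % 256}" for i in range(5000)]

# Uncached: every request gets a fresh random sample.
rng = random.Random(2)
uncached_view = scrape(lambda i: rng.sample(addrman, 1000), 100)

# Cached: 100 requests, one per minute, all inside the cache lifetime.
cache = AddrResponseCache(addrman, rng=random.Random(1))
cached_view = scrape(lambda i: cache.get_addresses(now=i * 60), 100)

print(len(uncached_view), len(cached_view))
```

In this model the cached responder reveals no more than one sample per cache lifetime, however often the spy reconnects, which is the "slow attackers down" effect jonatack describes. The cost is the one raised in the log: every requester in that window receives the same, progressively older records.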