The PR branch HEAD was 25e674e at the time of this review club meeting.
Notes
The Bitcoin network uses addr messages to communicate addresses (the
locations of nodes). To read more about the P2P message, you can see these
docs.
Sometimes, we send our own address to our peers so it can be
propagated to new nodes on the network. It can be hard to get new peers if
nobody knows about you :)
Keep in mind that connections are asymmetric. We distinguish between inbound
and outbound peers based on who initiated the connection. Think about what
that entails for address discovery. If Node A initiates a connection to Node
B, do they both necessarily know each others addresses?
There are two code paths for checking & potentially relaying our address:
When we are
processing
the version message from a new connection,
When we are starting the
SendMessages
loop for a peer.
Prior to this PR, these two code paths had similar but slightly different
logic for advertising their addresses. This PR unifies the logical flows,
which also entails some slight behavioral changes. Reviewing this PR requires
understanding those differences to ensure they are safe. In some cases, we
might have multiple addresses we can use to refer to ourselves. We’ll dig
into why that is during review club!
Questions
Why do we advertise our own address? Is this necessary? When?
How often do we advertise our address? Why is this on a Poisson timer? Does
this frequency make sense?
Why might we have multiple addresses for ourselves? Where do they come from?
What are the behavioral changes in this PR?
When you PushAddress, will this necessarily send an addr P2P message?
When you relay an addr out to a peer, what does the receiving peer do with
the message? What would you anticipate occurring on the network as a whole?
<amiti> propagating node addresses is crucial for nodes find one another to open connections. I find this a really interesting area of the p2p network, and one that I’m just starting to learn about. So just want to be clear that I’m not an expert & I might not have all the answers, but I’ve been digging in and I’m excited to have this sesh to learn together!
<amiti> yeah, I believe `fListen` indicates if the node should accept connections from the outside, and is checked when we are deciding whether to attempt self advertisement
<dhruvm> Murch: yes. with -connect, without -listen, there's no point to growing addrman for ourselves but we still make getaddr calls to the -connect peer
<lightlike> I don't understand completely why this way of advertising was chosen, considering that our peer knows our address already - wouldn't it work as well if a given node regularly sends the address of some of its peers to other peers?
<robot-dreams> Related to lightlike's question: do we include our own address in the `version` message when initiating an outbound connection? Alternatively, is that address available to the receiver due to some lower-level network details (e.g. in the TCP header, which I don't know the details of)?
<@jnewbery> dhruvm: that 23% is specifically for getaddr responses. The other way of sharing addresses is through gossiping them periodically, which isn't affected by the 23% constant
<amiti> ok, so there's two concepts being discussed here: 1. do outbound connections know our address? 2. what is the difference between GETADDR and other ADDR relay?
<amiti> lets start with #2, because its essentially been answered. can someone summarize the difference between these two methods for sharing addresses?
<dhruvm> @jnewbery: sipa: Does that mean addr relay is unaffected by the 23% constraint? But getaddr is. If so, out node must initiate the addr, else there's a small chance it's never talked about.
<amiti> sipa: wait, what? so you're saying the version message has fields for "my address" and "your address", but the first is a dummy, so really we're just sending over "your address"?
<lightlike> but no matter what we put in the version message - shouldn't our outbound peer have some idea of what our address is, considering it communicates with us via p2p?
<robot-dreams> amiti: if we receive a `getaddr`, we respond immediately with an `addr`; there's also a separate loop that sends an `addr` on average every 30 seconds
<willcl_ark> Each bitcoind (presumably) just sees "new connection from (local) port xxxx". It's only "sure" of addresses for outbound connections it makes?
<amiti> but lets keep moving forward. the aspect I wanted to highlight was whether or not advertising our own address is actually necessary. since the peer must have some info about our address whether they are outbound or inbound, seems like there's a possibility of an alternate announcement method. I've thought about tradeoffs of each, maybe we can loop back to this question and dig in more if we have time
<robot-dreams> sipa: If the `addr_from` field of version isn't used, how do we discover "peer's view of our address, which might be more useful than our own view"?
<sipa> robot-dreams: eh, terminology is confusing... the sender of the version message does not report their address, but does include the address they are using for the peer... so the receiver of the version message learns how they were reached
<ares_> if we use the information of a received version message as our own IP, doesn't that make us susceptible to attacks if a peer deliberately announces a wrong IP?
<lightlike> Murch: but we advertise on the beginning of each connection (not only after 24 hours). That's why m_next_local_addr_send needs to be initialized to 0.
<amiti> oh right, I want to clarify there are two methods that trigger us to self advertise: 1. when we are processing a VERSION message a peer sent (so right after opening a new connection) 2. on an ongoing basis, ~1 / day
<amiti> willcl_ark, murch: yeah, I think generally poisson is used for privacy benefits, but I was wondering what is trying to be kept private here. we're literally announcing our address which is public information
<ares_> robot-dreams: just looking at the function for the first time. not sure which of those checks corresponds to what sipa was saying (i.e., comparing to a list of known IPs)
<@jnewbery> I think that the default distribution for any randomized timer that results in network messages should be poisson, and we should only use a non-poisson distribution if there's a good reason to.
<robot-dreams> amiti: Would it make sense to apply Poisson timing to "send out `addr`" but NOT apply Poisson timing "add our own address to the next `addr` to be sent"?
<Murch> spiral_: I mean, if we're still online after 24h, they can just use the IP to collect the two events. If we have a new IP, we do a new announcement anyway.
<sipa> willcl_ark: no no, that is literally how poisson processes work; i'm just commenting on the fact that the name may imply it generates poisson-distributed values, which is not the case
<stacie> robot-dreams - earlier you mentioned the separate loop that sends an `addr` every 30 seconds (via AVG_ADDRESS_BROADCAST_INTERVAL), how is that different from the advertising that is done once every ~24 hrs (AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL)?
<lightlike> Does anyone know why is method 1 (advertising during VERSION only for outbounds) is necessary - if we just skipped that, wouldn't we self-advertise via SendMessages a few milliseconds later anyway, considering that m_next_local_addr_send is initialized to 0?
<robot-dreams> stacie: yeah, great point. My understanding was that `AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL` is for "add our address to the next outgoing batch" but not for actually doing any sending
<@jnewbery> addr_fetch are short-lived connections that we make to peers where we send a getaddr, wait for the addr response (this is the one that may contain up to 23% of addresses from the peer's addrman) and then disconnect
<amiti> I didn't do the math, but I imagine the time is takes the filter to roll would be very variable based on eg. if you are accepting inbounds (even though its a per-peer filter, you would have more address traffic you send out)
<@jnewbery> I ask whether it's important to send our own address to them because if so, then having the PushAddress() call inside the VERSION message processing might affect whether we send them our address or not
<spiral_> sipa I was thinking that if the address was unintentionally stale it might cause some errors -- more so from ignorant speculation on my part ;)
<amiti> I found it interesting to compare what my computer thinks my ip address is (using `ifconfig | grep inet` vs what googling "what is my ip address" returns. very different!
<spiral_> amiti: wouldn't it be the case that `ifconfig | grep inet` is always going to list a different IP than a IP-query service unless you're connected right to the ISP modem, aka if there's a gateway in front of you `ifconfig | grep inet` is just your ip from a router?
<sipa> Murch: yes, but we determine what our local addresses are through a different mechanism (-externalip, upnp, local interface if it's public, ...)
<amiti> spiral_, sipa: right, makes sense. was just cool to see a specific example :) my impression is that the majority of the time its going to be different?
<spiral_> Are there any defensive benefits of our peers telling us *what they believe is our network location*? E.g. if all peers are honest with our reported location, if an accurate message makes it to us, and one peer has an aberrant address, our node might learn of a MITM interference
<@jnewbery> sipa: oh, so if a peer tells us our address and we don't already know the address through one of those other mechanisms, we'll never use that as our self-advertised address to other peers?
<sipa> amiti: if you run a bitcoind as an internet service, on a server in datacenter somewhere your local interface will have a publicly routable IP usually
<sipa> if you use bitcoind behind a NAT you need to use -externalip to configure your public IP, or use UPnP (which makes bitcoind talk to your router and ask it for its public IP)
<@jnewbery> felixweis: since gleb was basically touching every line of the function I suggested he change the signature. I think the only reasons to use a raw pointer instead of a reference are (1) if you want to be able to pass null; or (2) if you want to reseat the pointer (point it at something else). Neither of those are the case here, so pass-by-reference communicates intent better
<robot-dreams> amiti: PR overview: there used to be two different code paths for adding our own address to the "addresses to be sent on next `addr`" list, and this PR consolidates them.
<robot-dreams> As for behavior change, now we might randomly override our local address with the peer's view of us when responding to the peer's initial `version`.
<elle> amiti: 5) no. it might be replaced by another one later on if the vAddrToSend is too full. It will also not be added to vAddrToSend (and hence not sent) if m_addr_known filter has a matching address.
<@jnewbery> elle: great answer! In reality, I think it's uncommon for vAddrToSend to fill up except when we process a GETADDR (which can only happen once for each connection)
<amiti> but also, when a node receives an addr message, if there's < 10 of them & a couple other conditions are met, it will relay it out to 1-2 other peers