Make AddrFetch connections to fixed seeds (p2p)

https://github.com/bitcoin/bitcoin/pull/26114

Host: glozow  -  PR author: mzumsande

The PR branch HEAD was 7e95f8f6bb at the time of this review club meeting.

Notes

  • Nodes connect to peers based on a set of addresses they know about, stored in their AddrMan, but a brand new node starts out with an empty AddrMan. In order to participate in the network, the node needs to find addresses of peers to connect to.

  • Care is taken to not create a bias in the source of addresses. For example, it would be inappropriate to have all new nodes connect to a list of 10 hard-coded addresses. That list of nodes would become outdated very quickly, have an enormous burden of serving blocks to new nodes, and be an eclipse vulnerability.

    • If the node’s addrman is empty, it queries DNS seeds run by community members.

    • If addrman is still empty, the node attempts connections to a set of fixed seeds, hard-coded and updated once per release. You can read more about how the fixed seeds are generated here.

    • The node doesn’t store addresses for unreachable networks. For example, if the node is only running on Tor, it won’t add a clearnet address to its AddrMan.

  • Prior to PR #26114, fixed seeds are added to AddrMan. This means the node’s first outbound connections are likely all taken from the fixed seeds.

  • An AddrFetch connection is a temporary outbound connection, used specifically for soliciting addresses.

  • The networking logic is multi-threaded (see list of threads and descriptions here).

    • The ThreadOpenConnections() thread processes and sends messages to this node’s peers.

    • The ThreadDNSAddressSeed thread makes connections to DNS seeds.

    • This PR also moves handling of fixed seeds from ThreadOpenConnections to ThreadDNSAddressSeed, and renames the latter to ThreadAddressSeed. Note this is more involved than refactoring some code from one function to another. For example, moving logic from one thread to another could mean that operations are no longer guaranteed to execute in the same order, or that data structures previously accessed by one thread are now shared between multiple.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Under what circumstances do we connect to the fixed seeds?

  3. What observable behavior change does this PR introduce? What kinds of addresses do we add to AddrMan, and under what circumstances?

  4. What is an AddrFetch connection and what is a full outbound connection? Why might we want to make an AddrFetch connection instead of full outbound connection to fixed seeds? Why might the node operator behind a fixed seed prefer this as well?

  5. The DNS seeds are expected to be responsive and serve up-to-date addresses of Bitcoin nodes. Why doesn’t this help a -onlynet=tor node?

  6. What does the ThreadOpenConnections do? What does ThreadDNSAddressSeed do? Which thread should handle connecting to fixed seeds, and why?

  7. Why wait 2 minutes before adding the fixed seeds to AddrMan?

  117:00 <glozow> #startmeeting
  217:00 <glozow> hi
  317:00 <brunoerg> hi
  417:00 <glozow> this is bitcoin core PR review club
  517:00 <glozow> today we're looking at "Make AddrFetch connections to fixed seeds" https://github.com/bitcoin/bitcoin/pull/26114
  617:01 <stickies-v> hi!
  717:01 <glozow> anyone else here? feel free to say hi
  817:02 <b_101> hi
  917:02 <yashraj> hi
 1017:02 <lightlike> Hi
 1117:02 <araujo88> hi
 1217:02 <LarryRuane> hi! ... background question (sorry) -- when we make an outbound connection, it's one of those types listed here https://github.com/bitcoin/bitcoin/blob/f59e91511a3aa8b2770eeec7034ddc1a9dec918b/src/node/connection_types.h#L17 (except for `INBOUND` of course) -- does the peer we're connecting to know which connection type *we're* making? Or does our peer only know that it's an inbound connection?
 1317:03 <LarryRuane> (apologies, I had that all typed out before the meeting started!)
 1417:03 <glozow> Notes are in the usual place: https://bitcoincore.reviews/26114
 1517:03 <glozow> Did you all get a chance to review the PR or look at the notes? How about a y/n
 1617:03 <LarryRuane> y 0.5
 1717:04 <dergoegge> Hi
 1817:04 <stickies-v> 0.1y, I started looking at the PR but then dove into a deeper rabbit hole than expected. will mostly be lurking!
 1917:04 <amovfx_> hi
 2017:05 <amovfx_> y
 2117:06 <b_101> n/y
 2217:06 <glozow> LarryRuane: the peer only knows that you initiated the connection. it can make a guess, but no you don't tell them anything about the type of connection
 2317:06 <brunoerg> looked at the notes only
 2417:06 <yashraj> y, look(ing) at notes, read the PR.
 2517:06 <LarryRuane> glozow: +1 thank you
 2617:07 <glozow> We can go slowly then. Just fyi, review club is more fun when you prepare ahead of time ;)
 2717:07 <glozow> First question: Under what circumstances do we connect to the fixed seeds?
 2817:08 <brunoerg> glozow: when node’s addrman is empty
 2917:09 <glozow> brunoerg: is that the only condition?
 3017:09 <LarryRuane> only if the DNS seeds don't provide us with any addresses?
 3117:09 <yashraj> +1
 3217:09 <brunoerg> after querying the DNS seeds
 3317:10 <glozow> LarryRuane: correct. but the DNS seeds are usually very responsive! how likely would this happen??
 3417:10 <amovfx_> if (m_addr_fetches.empty() && m_added_nodes.empty()
 3517:10 <stickies-v> the DNS seeds only work with ipv4/6?
 3617:10 <glozow> stickies-v: bingo
 3717:10 <amovfx_> stickies i think so
 3817:10 <brunoerg> stickies-v: +1
 3917:11 <LarryRuane> would it be possible to hard-code some DNS seeds for other network types?
 4017:11 <brunoerg> is there a way to make them work with other network besides ipv4/6?
 4117:12 <brunoerg> or are there some technical limitations?
 4217:12 <yashraj> already confused lol. what's the difference between fixed seeds and dns seeds?
 4317:13 <stickies-v> I remember something about TOR addresses being too large for DNS responses but I'm very unsure about that
 4417:14 <LarryRuane> yashraj: notice that the DNS seeds are strings like `seed.bitcoin.sipa.be` ... fixed seeds are raw IP addresses (and port numbers)
 4517:14 <yashraj> oh thanks
 4617:14 <glozow> yeah it's technical limitations. I don't know many details though tbh
 4717:14 <Guest922> noob question, how do you hear?
 4817:15 <amovfx_> Guest922: You will have to be a bit more specific
 4917:15 <Guest922> listen in to hear the speaker?
 5017:16 <Guest922> or is this just a chat?
 5117:16 <glozow> the DNS seeds should be serving you live data, i.e. updated quite regularly with the addresses of nodes they've been able to connect to. the fixed seeds are hardcoded and thus only updated once per release. should be intuitive which source you'd prefer to use
 5217:16 <glozow> Guest922: there is no audio, this is a text-only chat
 5317:16 <Guest922> thanks
 5417:16 <LarryRuane> DNS seeds! that's why we try those first? the fixed seeds are kind of a backup plan?
 5517:17 <yashraj> the dns seeds?
 5617:17 <stickies-v> DNS seeds are DNS servers, not bitcoin nodes. fixed seeds are bitcoin nodes
 5717:17 <glozow> LarryRuane: yashraj: yes
 5817:17 <yashraj> stickies-v: oh right thanks makes so much more sense now :P
 5917:17 <b_101> stickies: +1
 6017:18 <LarryRuane> if we do need to resort to fixed seeds, we're not *limited* to just them, right? We can get more addresses from them?
 6117:18 <LarryRuane> (eventually)
 6217:19 <glozow> LarryRuane: yes, that's part of what this PR is doing
 6317:19 <glozow> Can anyone tell us: What observable behavior change does this PR introduce? What kinds of addresses do we add to AddrMan, and under what circumstances?
 6417:20 <stickies-v> I think we're now only doing AddrFetch connections to fixed seeds? which means we only use them to get addresses, not txs and blocks etc?
 6517:21 <LarryRuane> Our (starting from scratch) node won't be as likely to connect to only fixed-seed nodes; we'll be more likely to connect to nodes they know about (and tell us about)
 6617:21 <amovfx_> If no dns_seeds are requested, then if addr_fetaches is empty and added nodes are empty add fixed seeds
 6717:21 <stickies-v> (well we only add them as AddrFetch for the first 2 minutes, hopefully we've got more addresses in our AddrMan by then)
 6817:21 <yashraj> move fixed seeds to addrfetch from addrman so they're only used if they have to be which reduces bandwidth for them and protects against eclipse attacks?
 6917:22 <LarryRuane> yashraj: yes, definitely reduces the load on the fixed seed nodes (they won't get an many inbound connections as they do currently)
 7017:23 <LarryRuane> (well I mean, they'll get temporary `ADDR_FETCH` connections, but not full connections)
 7117:23 <glozow> right. the idea here is to fill up our AddrMan more quickly, since we make connections to the addrs we get from there
 7217:24 <LarryRuane> definitely a cool idea!
 7317:24 <glozow> so instead of just adding the fixed seeds to our addrman, we ask them for more addrs to add to addrman
 7417:24 <brunoerg> LarryRuane: +1
 7517:25 <glozow> and yes, this reduces the load on those fixed seeds. if a bunch of new nodes connect to them first thing, they're probably serving up a lot of blocks
 7617:26 <yashraj> we ask fixed seeds for addresses or blocks or both?
 7717:26 <glozow> What is an AddrFetch connection?
 7817:26 <glozow> hint: https://github.com/bitcoin/bitcoin/blob/f59e91511a3aa8b2770eeec7034ddc1a9dec918b/src/node/connection_types.h#L70-L76
 7917:26 <glozow> yashraj: just addresses
 8017:26 <LarryRuane> yashraj: addresses (not blocks, not transactions)
 8117:27 <b_101> to get addresses of peers
 8217:29 <LarryRuane> so when we make an AddrFetch connection, we only send getaddr messages? https://en.bitcoin.it/wiki/Protocol_documentation#getaddr
 8317:30 <LarryRuane> our peer replies with one of these? https://en.bitcoin.it/wiki/Protocol_documentation#addr
 8417:30 <b_101> LarryRuane: that's what I understand
 8517:33 <lightlike> It's pretty much a normal connection, with the special property that we disconnect once we receive an answer to our getaddr message. It's not like we disable block relay or tx relay, we may even get a block from them before they get disconnected.
 8617:34 <glozow> LarryRuane: correct, see disconnection logic here: https://github.com/bitcoin/bitcoin/blob/1a369f006fd0bec373b95001ed84b480e852f191/src/net_processing.cpp#L2793-L2796
 8717:35 <glozow> well, addr or addrv2.
 8817:35 <glozow> which unfortunately is not documented in that bitcoin wiki, it seems
 8917:36 <glozow> Ok next question was already answered, onlynet=tor means you won't get much help from the DNS seeds
 9017:36 <glozow> What does the `ThreadOpenConnections` do? What does `ThreadDNSAddressSeed` do? Which thread should handle connecting to fixed seeds, and why?
 9117:37 <glozow> hint: code here https://github.com/bitcoin/bitcoin/blob/5e82b9ba96b6c5614a1187382a086e5694dff544/src/net.cpp#L1578 and here https://github.com/bitcoin/bitcoin/blob/5e82b9ba96b6c5614a1187382a086e5694dff544/src/net.cpp#L1390
 9217:40 <brunoerg> `ThreadDNSAddressSeed` exists to connect to DNS Seeds
 9317:40 <glozow> and `ThreadOpenConnections` ?
 9417:41 <brunoerg> processes and sends messages to node's peers (from notes)
 9517:43 <amovfx_> +1 bruno
 9617:43 <glozow> Which thread should handle connecting to fixed seeds, and why?
 9717:44 <amovfx_> ThreadDNSAddressSeed because it is filling up addrman?
 9817:44 <amovfx_> OpenCOnenctions is more general
 9917:44 <amovfx_> ?
10017:45 <glozow> That's not necessarily true. If you receive an addr message from your peer, you are processing them on `ThreadOpenConnections`
10117:45 <brunoerg> ThreadDNSAddressSeed because we will use the fixed seeds after trying to get some addrs from DNS seeds (in case we wasn't able to get anything)?
10217:45 <glozow> unsure what "more general" means
10317:47 <amovfx_> I thought openconnections was a more generalized function for the btc software
10417:47 <glozow> brunoerg: yeah that's pretty much what I'm getting at. they're very similar in nature; you're trying to solicit addrs because your addrman is empty
10517:47 <amovfx_> everything uses it
10617:48 <glozow> that is incorrect
10717:49 <glozow> ok last question
10817:49 <yashraj> yeah that's why the renaming also makes sense. got it.
10917:50 <glozow> After we make the addrfetch connection to the fixed seed. We don't immediately add it to our addrman. We wait 2 minutes. Why?
11017:50 <glozow> code here: https://github.com/bitcoin-core-review-club/bitcoin/blob/7e95f8f6bb0e06b0676db3da05b2a55a011c9668/src/net.cpp#L1560
11117:51 <glozow> yashraj: right. `ThreadAddressSeed`as in, the thread with which we bootstrap our address manager
11217:51 <amovfx_> to give the connections time to respond so addrs fill up
11317:51 <stickies-v> sounds mostly like a fallback for unforeseen/unexpected circumstances? if the network gets bloated with unresponsive peers and all we get is junk, then at least the fixed seeds can still bootstrap?
11417:52 <stickies-v> (uneducated guess)
11517:52 <glozow> btw, there is some documentation on the threads in developer-notes: https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#threads
11617:54 <lightlike> also, the more time passes after a releases, the more of the fixed seeds are no longer online, or maybe are operating under a new IP.
11717:54 <glozow> Don't forget this is multi-threaded, so other things are still happening while this thread is seeding addresses. i.e., we are also making full outbound connections to catch up our chainstate at the same time
11817:57 <stickies-v> lightlike: good point, but is that in relation to this latest question? i'm not sure I see the connection
11917:58 <amovfx_> i thought it was for an opportunity for us to get addresses filled up through other connections before we resort to fixed seeds
12017:58 <lightlike> stickies-v: yes, if we just try 10 of them and they are all offline, it makes sense to add the other ones to the addrman after a while (so that we can hopefully make a connection to them).
12117:59 <amovfx_> because arent previous addresses stored when bitcoin shuts down
12217:59 <amovfx_> and then they are reused on relaunch to reconnect faster
12318:00 <stickies-v> lightlike: ah, so we try 10 fixed seeds for addrfetch and then after 2 minutes can use all the fixed seeds to add to addrman?
12418:01 <lightlike> yes! It's like falling back to the status quo, where we would just add them all to addrman initially.
12518:02 <stickies-v> I see
12618:02 <glozow> seems we're out of time and out of questions!
12718:02 <glozow> thanks for coming everyone, see you next week
12818:02 <glozow> #endmeeting