Track AddrMan totals by network and table, improve precision of adding fixed seeds (p2p)

https://github.com/bitcoin/bitcoin/pull/26847

Host: mzumsande  -  PR author: mzumsande

Notes

  • AddrMan, the in-memory database of peers, consists of two tables (new and tried). AddrMan keeps count of the total number of addresses in each of these two tables, but not on a per-network basis. For example, it’s currently not possible to directly query AddrMan if it has any onion addresses in its new table, or how many of them.

  • This PR adds internal tracking of the totals by network and table to AddrMan and adds this information to its public interface.

  • The added counts are then used to improve the precision of loading fixed seeds, adding seeds of specific reachable networks we don’t have any addresses for in a targeted way.

  • On a longer time frame, this is also a first preparatory step towards more active management of outbound connections with respect to networks. One of the goals is to always have at least one outbound connection to each reachable network.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. When is a network reachable in Bitcoin Core? (it’s not as self-explanatory as it may seem!)

  3. How are addresses relayed over the p2p network treated depending on whether they are reachable vs. non-reachable - do we store them and/or forward them to peers? (Hint: Look in ProcessMessage)

  4. How does this PR attempt to make sure that there are no bugs causing the added counts per network to fall out of sync with other AddrMan internals such as nNew and nTried?

  5. How can a node currently get stuck with only unreachable addresses in AddrMan, finding no outbound peers? How does this PR fix it?

  6. Why would it be beneficial to have an outbound connection to each reachable network at all times? Why is the current logic in ThreadOpenConnections insufficient to guarantee this?

  7. What would be the possible next steps towards this goal after this PR?

Meeting Log

  117:00 <lightlike> #startmeeting
  217:00 <LarryRuane> hi!
  317:00 <roze_paul> hi
  417:00 <codo> hi
  517:00 <lightlike> hi everyone, and welcome to Review Club!
  617:00 <svav> Hi
  717:00 <kouloumos> hi
  817:00 <lightlike> feel free to say hi - anyone here for the first time?
  917:02 <lightlike> Today's PR is #26847, notes can be found at https://bitcoincore.reviews/26847
 1017:02 <LarryRuane> I see the maintainers rug-pulled us again! :) (congrats on getting this merged!)
 1117:02 <lightlike> LarryRuane: thanks - yes, it got merged yesterday
 1217:03 <lightlike> Who got the chance to have a look at the PR (y/n)?
 1317:03 <codo> y
 1417:03 <LarryRuane> 0.7y
 1517:03 <svav> y had a look
 1617:04 <emzy> hi
 1717:04 <emzy> n
 1817:04 <b_101> hi
 1917:04 <roze_paul> y
 2017:04 <lightlike> great- let's move to the first question:
 2117:04 <pablomartin> hello!
 2217:05 <lightlike> When is a network reachable in Bitcoin Core?
 2317:05 <lightlike> (it’s not as self-explanatory as it may seem!)
 2417:06 <LarryRuane> all networks are initially assumed reachable (`vfLimited` array init to false) https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net.cpp#L119
 2517:06 <LarryRuane> `-onlynet=x -onlynet=y` sets all networks except x and y to be unreachable
 2617:06 <LarryRuane> networks cjdns, onion, and i2p are set to unreachable if those services aren't enabled
 2717:07 <lightlike> LarryRuane: correct!
 2817:07 <hernanmarino> Hi all , sorry for being late. I didn't have time to review, I'm here as a reader (mostly) today
 2917:07 <LarryRuane> "not as self-explanatory" -- yes, because we don't actually need to try to use them to consider them reachable!
 3017:07 <roze_paul> XD i've been looking at the inquisition stuff so much i thought that was this week..going to have to swithc y->n in response to if i've reviewed today's PR.
 3117:08 <LarryRuane> inscription?
 3217:08 <lightlike> so it's currently a mix: if we are sure we can't reach a network, we set it to unreachable (but we may be wrong, for example I can't reach IPv6 from my current computer, but it's still reachable)
 3317:09 <roze_paul> bitoin-inquisition, next week's topic* sorry, off-topic
 3417:09 <lightlike> and also if we don't _want_ to reach it (by using the -onlynet option)
 3517:09 <LarryRuane> (oh sorry)
 3617:10 <lightlike> this has been the source of some confusion in the past...
 3717:10 <LarryRuane> doesn't it seem like `onlynet` is slightly misnamed? If we say `-onlynet=tor` then we can also use IPV4 if `-onlynet=ipv4` is also specified ... but i can't think of a better name for tha toption
 3817:11 <kouloumos> Is the past confusion related to https://github.com/bitcoin/bitcoin/issues/7098?
 3917:12 <LarryRuane> if it was called `-allownet` that would fix that problem, but would also be misleading, because it doesn't imply that others are disallowed!
 4017:12 <LarryRuane> so it's probably fine, but just something to keep in mind
 4117:12 <lightlike> kouloumos: not sure, I never saw that issue before.
 4217:13 <kouloumos> I think I confused the confusion you were referring to
 4317:13 <lightlike> LarryRuane: true, that name doesn't seem perfect either, but it's probably too late to change it.
 4417:13 <LarryRuane> yes for sure, i wasn't suggesting changing it, there's no way ... just an observation
 4517:13 <codo> maybe splitting it up into two options like `restrictnets` plus `allownet` would do it
 4617:14 <LarryRuane> the help text for `-onlynet` is pretty good, it says it can be specified multiple times
 4717:15 <LarryRuane> anyway, sorry, didn't mean to sidetrack
 4817:15 <lightlike> thought -onlynet has been the subject of even more confusion anyway... not because of the name, but because of how it works
 4917:15 <lightlike> *though
 5017:15 <lightlike> next question:
 5117:15 <lightlike> How are addresses relayed over the p2p network treated depending on whether they are reachable vs. non-reachable - do we store them and/or forward them to peers?
 5217:15 <roze_paul> LarryRuane: lightlike: some commenting explaining the name and the actual behavior in the codebase might be helpful
 5317:16 <roze_paul> ah the helptext is already good, disregard.
 5417:16 <codo> we do forward (less) and not store non-reachables?
 5517:16 <LarryRuane> code comment: Relay reachable addresses to 2 peers. Unreachable addresses are relayed randomly to 1 or 2 peers.
 5617:16 <LarryRuane> https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net_processing.cpp#L2077
 5717:17 <LarryRuane> but we only store into our own addrman if reachable: https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net_processing.cpp#L3643
 5817:17 <michaelfolkson> hi
 5917:17 <lightlike> codo, LarryRuane : correct!
 6017:17 <glozow> hi
 6117:18 <lightlike> this is the source of the problem the PR fixes - if a network is not Reachable, we won't store any addresses for it AddrMan.
 6217:19 <lightlike> next question:
 6317:19 <LarryRuane> oh and we do want to do that (store addresses even if unreachable)?
 6417:20 <pablomartin> / Do not store addresses outside our network
 6517:20 <lightlike> LarryRuane: not really - we mostly want to store addresses we can use (for making outbound connection), so not storing for now them makes sense.
 6617:21 <lightlike> but we'll come to the problem later
 6717:21 <LarryRuane> node operators can change their `-onlynet` configurations, so I guess it's good if we have some (previously unreachable but now maybe reachable) addresses .. ?
 6817:21 <LarryRuane> (in addrman)
 6917:22 <LarryRuane> we can go on, don't mean to disrupt!
 7017:22 <lightlike> LarryRuane: saving some addresses just in case might seem a bit overkill, so that's not how this PR does it...
 7117:22 <roze_paul> I can't find a bitcoin-cli command which displays what networks are reachable/unreachable. does that exist, or best practice right now is to refer to the bitcoin.conf?
 7217:24 <pablomartin> roze_paul: getnetworkinfo?
 7317:24 <lightlike> roze_paul: the "getnetworkinfo" RPC has the info
 7417:24 <LarryRuane> roze_paul: i think `getnetworkinfo`
 7517:24 <lightlike> next question is about the implementation:
 7617:25 <lightlike> How does this PR attempt to make sure that there are no bugs causing the added counts per network to fall out of sync with other AddrMan internals such as nNew and nTried?
 7717:26 <LarryRuane> new test case `BOOST_AUTO_TEST_CASE(addrman_size)`? also there's `test/fuzz/addrman.cpp`? (although I don't see big changes to it)
 7817:28 <pablomartin> about the implmentation... is it about the recomputation of the totals?
 7917:28 <lightlike> LarryRuane: oh, I could've updated the fuzz test to make use of the new args to Size(), forgot about that
 8017:29 <kevkevin> I see BOOST_AUTO_TEST_CASE are we only able to add address's or is it possible for addresses to be removed?
 8117:29 <pablomartin> *recompute
 8217:30 <lightlike> kevkevin: no, we can't remove addresses from AddrMan, that functionality isn't really needed so it's not implemented
 8317:31 <lightlike> kevkevin: it's possible that an address can be removed if we add another address that collides with it (would go into the same bucket / location) - but not directly
 8417:32 <kevkevin> ahh I see thanks!
 8517:32 <lightlike> I meant two things: 1.) I added a check in CheckAddrman, so if that is enabled and the internals would mismatch it would return an error.
 8617:33 <lightlike> 2.) I tried to update the network-specific counts as closely as possible to nNew and nTried
 8717:34 <LarryRuane> really newb question, but when you say collides, more than one address can be in the same bucket, right? but is there a size limite, so if we're trying to add a new address and its bucket is already full, then do we kick one out?
 8817:34 <LarryRuane> lightlike: oh that's cool!
 8917:34 <lightlike> the unit tests help too of course, but there is always the possibility that they don't cover the specific situation
 9017:35 <lightlike> LarryRuane: Each bucket has multiple bucket positions, and no more than one address can be in the same position within a given bucket.
 9117:35 <LarryRuane> that's why i really like fuzzing, it can often test scenarios that humans wouldn't think of! but you also need checking code to detect when things go wrong
 9217:36 <LarryRuane> oh i see, so collisions are between just two addresses (but it's sort of pseudo-random i suppose)
 9317:36 <lightlike> LarryRuane: initially, it was like you said (buckets would "fill up"), therefore the bucket terminonology.
 9417:36 <LarryRuane> (i need to learn about buckets one of these days!)
 9517:37 <LarryRuane> lightlike: ah got it, thanks
 9617:37 <lightlike> But this was changed at some time, so now each new address gets assigned a specific bucket and position base on some hashing magic, and if that position is already occupied one of them has to be kicked out (no matter if the bucket is full or not)
 9717:37 <lightlike> next q:
 9817:37 <lightlike> How can a node currently get stuck with only unreachable addresses in AddrMan, finding no outbound peers? How does this PR fix it?
 9917:39 <LarryRuane> if the `-onlynet` config changes? the way this PR addresses it is, during startup, we fall back to using fixed seeds, but only for reachable networks that we have no addrman addresses for
10017:39 <roze_paul> lightlike: is the addr that is removed from the bucket _always_ the incumbent/older addr of the two?
10117:39 <LarryRuane> https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net.cpp#L1664
10217:41 <codo> in the old situation the fixed seeds where not consulted at all after a `-onlynet` change?
10317:41 <lightlike> roze_paul: Good question! There is some logic why the old address might be bad (IsTerrible, https://github.com/bitcoin/bitcoin/blob/2d5acc901db336868dee158022a115b120b5b675/src/addrman.cpp#L67). If it's not "terrible", the old address is kept, and the new one not accepted.
10417:42 <lightlike> roze_paul: otherwise, there would be the danger of an attacker spamming us with addresses, evicting all the good currents one through collisions
10517:42 <roze_paul> had to redeem myself for my getnetworkinfo question :)
10617:42 <roze_paul> cool
10717:43 <lightlike> LarryRuane, codo: correct! In the old situation we'd only query the fixed seeds when AddrMan was completely empty
10817:43 <pablomartin> the pr fixes it with the 2nd commit... ?
10917:44 <lightlike> now that we added the functionality to check whether it's empty for a specific network, we can load just the fixed seeds from networks that we need
11017:44 <lightlike> pablomartin: yes, correct
11117:45 <LarryRuane> and just to be clear, "load" means add to our addrman list? Or does that not get persisted to disk (is it memory-only)?
11217:45 <LarryRuane> *those not get persisted
11317:46 <lightlike> but this is not the only (or even the main reason) for this PR: the long-term plan is to change the automatic connection logic wrt networks, so the last 2 questions are about that.
11417:46 <lightlike> Why would it be beneficial to have an outbound connection to each reachable network at all times? Why is the current logic in ThreadOpenConnections insufficient to guarantee this?
11517:47 <LarryRuane> I'm a little unsure on this first question, but is the benefit to make eclipse attacks more difficult? your peers are more spread out
11617:47 <lightlike> LarryRuane: yes, load means to load it into AddrMan. While AddrMan is in-memory, it gets persisted to disk (peers.dat) regularly, and in particular before we shutdown
11717:47 <codo> I think it would be beneficial because we would have recent good addresses for all networks.
11817:47 <roze_paul> Intuitively, a node would be more difficult to attack (DoS, eclipse)
11917:48 <lightlike> yes, that would be the selfish reason. there is another, non-selfish one
12017:49 <LarryRuane> hmm.. so that users of only one not-very-much-used network, like I2P, get some peers? not sure
12117:49 <codo> Others would have info about networks unreachable for them?
12217:50 <lightlike> it also helps the sub-networks stay together. if everyone used -onlynet=X for their preferred network, bitcoin would split into parts. So it's important to have nodes that are on multiple networks, and I think it makes sense to help those "volunteers" to actually be connected to all of the supported networks at the same time
12317:51 <lightlike> so that's the second part of the q: Why is the current logic in ThreadOpenConnections insufficient to guarantee this?
12417:51 <lightlike> i.e. how do we currently choose outbound peers with respect to different networks?
12517:52 <LarryRuane> oh so in other words, without this, there's more of a chance of an actual chain split?? (if so, that's a great reason!)
12617:52 <codo> It doesn't exclude the situation where there there are no peers for a certain network.
12717:53 <codo> Just doesn't check for it.
12817:54 <LarryRuane> "... insufficient to guarantee this?" ... I'm unsure about this.. if the config includes any `-connect` options, then of course it's not guaranteed, but I'm sure there are other reasons
12917:54 <lightlike> codo, yes: so what would happen if we, for example had 10k addresses for clearnet and only 1000 for I2P in addrman?
13017:54 <LarryRuane> codo: that makes sense!
13117:54 <codo> lightlike: very small chance for an I2P address to get selected
13217:55 <lightlike> yes, exactly!
13317:55 <LarryRuane> oh we choose randomly among those 11k, so chances are pretty good that they'll all be clearnet!
13417:55 <roze_paul> can't check the # of peers in the subnets if it can't count the # of peers in each subnet, and now we are able to count the peers, thx to this PR
13517:55 <michaelfolkson> LarryRuane: Just a more fragmented network, not increased risk of chain split right? What's your chain split scenario?
13617:55 <lightlike> so it's mostly random, without any management currently.
13717:56 <lightlike> michaelfolkson: yes. but if it's actually 100% fragemented (which is definitely not the case currently), and some minors are on different networks, it could also lead to a chain split in theory
13817:56 <lightlike> *miners, not minors, haha
13917:57 <LarryRuane> michaelfolkson: I guess I was thinking if a bunch of nodes, including miners (pools I guess) were on tor (or whatever) only, and another group of nodes (including miners) were on ipv4/6 only, then they could extend the chain separately?
14017:58 <michaelfolkson> lightlike: I wonder if they are actually on different networks (I agree they should be)
14117:59 <lightlike> michaelfolkson: I would suspect all miners are on clearnet, the latency of other networks is too low, can't afford to wait several seconds more for a new block.
14217:59 <LarryRuane> yes good point
14317:59 <LarryRuane> so not a problem in practice
14417:59 <lightlike> so the next planned steps would be to add logic to the connection making process to have at least one connection to each reachable network - and this PR prepares that
14518:00 <lightlike> time's up - thanks everone, great discussion!
14618:00 <lightlike> #endmeeting