AddrMan, the in-memory database of peers, consists of two tables
(new and tried). AddrMan keeps count of the total number of addresses
in each of these two tables, but not on a per-network basis.
For example, it’s currently not possible to directly query AddrMan if
it has any onion addresses in its new table, or how many of them.
This PR adds internal tracking of the totals by network and table to AddrMan
and adds this information to its public interface.
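As a rough illustration of what tracking totals by network and table could look like, here is a minimal Python sketch. It is not Bitcoin Core's implementation: the class `ToyAddrMan`, its methods, and the string network names are hypothetical, and each address is assumed to live in exactly one table.

```python
from collections import Counter
from typing import Optional


class ToyAddrMan:
    """Toy stand-in for AddrMan (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # address -> (network, in_new)
        self._entries = {}
        # (network, in_new) -> number of addresses, updated on every change
        self._counts = Counter()

    def add(self, addr: str, network: str, in_new: bool = True) -> None:
        if addr in self._entries:
            return  # already known, counts stay unchanged
        self._entries[addr] = (network, in_new)
        self._counts[(network, in_new)] += 1

    def remove(self, addr: str) -> None:
        network, in_new = self._entries.pop(addr)
        self._counts[(network, in_new)] -= 1

    def make_tried(self, addr: str) -> None:
        """Move an address from the new table to the tried table."""
        network, in_new = self._entries[addr]
        if in_new:
            self._counts[(network, True)] -= 1
            self._counts[(network, False)] += 1
            self._entries[addr] = (network, False)

    def size(self, network: Optional[str] = None,
             in_new: Optional[bool] = None) -> int:
        """Count addresses, optionally filtered by network and/or table."""
        return sum(
            n for (net, new), n in self._counts.items()
            if (network is None or net == network)
            and (in_new is None or new == in_new)
        )
```

With a structure like this, a query such as `size(network="onion", in_new=True)` answers the onion question above without scanning every entry.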
The added counts are then used to load fixed seeds more precisely:
seeds are added in a targeted way, only for those reachable networks
for which we don’t have any addresses.
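The targeted selection of fixed seeds can be sketched as a simple filter. This is a sketch under assumptions, not the PR's code: the function name `seeds_to_load`, the `(address, network)` tuples, and the plain dict of per-network counts are all hypothetical.

```python
def seeds_to_load(fixed_seeds, reachable, addrman_counts):
    """Return the fixed seeds worth loading.

    fixed_seeds:    list of (address, network) tuples
    reachable:      set of networks we consider reachable
    addrman_counts: dict mapping network -> number of known addresses
    """
    # Networks that are reachable but for which we know no addresses at all.
    needed = {net for net in reachable if addrman_counts.get(net, 0) == 0}
    # Keep only seeds on those networks, preserving the original order.
    return [(addr, net) for addr, net in fixed_seeds if net in needed]
```

A node with plenty of IPv4 addresses but no onion addresses would then load only the onion seeds, instead of either all seeds or none.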
On a longer time frame, this is also a first preparatory step towards more active
management of outbound connections with respect to networks. One of the goals is
to always have at least one outbound connection to each reachable network.
When is a network reachable in Bitcoin Core?
(It’s not as self-explanatory as it may seem!)
How are addresses relayed over the p2p network treated depending
on whether they are reachable vs. non-reachable - do we store them and/or
forward them to peers? (Hint: Look in ProcessMessage)
How does this PR attempt to make sure that there are no bugs causing
the added counts per network to fall out of sync with other AddrMan internals
such as nNew and nTried?
How can a node currently get stuck with only unreachable addresses in AddrMan,
finding no outbound peers? How does this PR fix it?
Why would it be beneficial to have an outbound connection to each
reachable network at all times? Why is the current logic in
ThreadOpenConnections insufficient to guarantee this?
What would be the possible next steps towards this goal after this PR?
<roze_paul> XD i've been looking at the inquisition stuff so much i thought that was this week.. going to have to switch y->n in response to if i've reviewed today's PR.
<lightlike> so it's currently a mix: if we are sure we can't reach a network, we set it to unreachable (but we may be wrong, for example I can't reach IPv6 from my current computer, but it's still reachable)
<LarryRuane> doesn't it seem like `onlynet` is slightly misnamed? If we say `-onlynet=tor` then we can also use IPv4 if `-onlynet=ipv4` is also specified ... but i can't think of a better name for that option
<LarryRuane> if it was called `-allownet` that would fix that problem, but would also be misleading, because it doesn't imply that others are disallowed!
<lightlike> How are addresses relayed over the p2p network treated depending on whether they are reachable vs. non-reachable - do we store them and/or forward them to peers?
<lightlike> LarryRuane: not really - we mostly want to store addresses we can use (for making outbound connections), so not storing them for now makes sense.
<LarryRuane> node operators can change their `-onlynet` configurations, so I guess it's good if we have some (previously unreachable but now maybe reachable) addresses .. ?
<roze_paul> I can't find a bitcoin-cli command which displays what networks are reachable/unreachable. does that exist, or best practice right now is to refer to the bitcoin.conf?
<lightlike> How does this PR attempt to make sure that there are no bugs causing the added counts per network to fall out of sync with other AddrMan internals such as nNew and nTried?
<lightlike> kevkevin: it's possible that an address can be removed if we add another address that collides with it (would go into the same bucket / location) - but not directly
<LarryRuane> really newb question, but when you say collides, more than one address can be in the same bucket, right? but is there a size limit, so if we're trying to add a new address and its bucket is already full, then do we kick one out?
<LarryRuane> that's why i really like fuzzing, it can often test scenarios that humans wouldn't think of! but you also need checking code to detect when things go wrong
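One way such checking code could work is to recompute the per-network counts from scratch and compare them with the cached values and with the table totals. The sketch below is an assumption-laden illustration, not the PR's actual check: the function `counts_consistent` and its argument shapes are hypothetical, with `n_new` and `n_tried` standing in for `nNew` and `nTried`.

```python
from collections import Counter


def counts_consistent(entries, n_new, n_tried, network_counts) -> bool:
    """Check cached per-network counts against a full recount.

    entries:        iterable of (network, in_new) pairs, one per address
    n_new, n_tried: cached table totals
    network_counts: cached dict mapping (network, in_new) -> count
    """
    recomputed = Counter(entries)
    # Drop zero entries so a cached zero compares equal to a missing key.
    cached = Counter({k: v for k, v in network_counts.items() if v != 0})
    if recomputed != cached:
        return False
    # The per-network counts must also add up to the table totals.
    total_new = sum(n for (_, in_new), n in cached.items() if in_new)
    total_tried = sum(n for (_, in_new), n in cached.items() if not in_new)
    return total_new == n_new and total_tried == n_tried
```

A fuzz harness could call a check of this kind after every mutation, so that any code path that forgets to update a counter is caught as soon as it runs.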
<lightlike> But this was changed at some point, so now each new address gets assigned a specific bucket and position based on some hashing magic, and if that position is already occupied one of them has to be kicked out (no matter if the bucket is full or not)
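The idea of a fixed bucket and position per address can be illustrated with a toy table. Everything here is an assumption made for illustration: SHA-256 over a secret plus the address string stands in for the "hashing magic", the constants are illustrative, and the rule that an existing entry is only replaced when a caller-supplied `is_stale` predicate says so is a simplification rather than Bitcoin Core's actual policy.

```python
import hashlib

NEW_BUCKET_COUNT = 1024  # illustrative constants for this toy table
BUCKET_SIZE = 64


def slot_for(addr: str, secret: bytes):
    """Deterministically map an address to a (bucket, position) pair."""
    digest = hashlib.sha256(secret + addr.encode()).digest()
    bucket = int.from_bytes(digest[:8], "little") % NEW_BUCKET_COUNT
    position = int.from_bytes(digest[8:16], "little") % BUCKET_SIZE
    return bucket, position


class ToyNewTable:
    """Hypothetical new table where each address has exactly one slot."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.slots = {}  # (bucket, position) -> address

    def add(self, addr: str, is_stale=lambda occupant: False) -> bool:
        """Try to place addr; return True if it now occupies its slot."""
        slot = slot_for(addr, self.secret)
        occupant = self.slots.get(slot)
        if occupant is None or occupant == addr or is_stale(occupant):
            self.slots[slot] = addr  # free slot, same address, or eviction
            return True
        return False  # collision: the existing entry stays
```

Because the slot depends only on the address and the secret, a collision can occur even when the rest of the bucket is empty, which matches the point made above.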
<LarryRuane> if the `-onlynet` config changes? the way this PR addresses it is, during startup, we fall back to using fixed seeds, but only for reachable networks that we have no addrman addresses for
<lightlike> roze_paul: otherwise, there would be the danger of an attacker spamming us with addresses, evicting all the good current ones through collisions
<lightlike> now that we added the functionality to check whether it's empty for a specific network, we can load just the fixed seeds from networks that we need
<lightlike> but this is not the only (or even the main) reason for this PR: the long-term plan is to change the automatic connection logic wrt networks, so the last 2 questions are about that.
<lightlike> Why would it be beneficial to have an outbound connection to each reachable network at all times? Why is the current logic in ThreadOpenConnections insufficient to guarantee this?
<lightlike> LarryRuane: yes, load means to load it into AddrMan. While AddrMan is in-memory, it gets persisted to disk (peers.dat) regularly, and in particular before we shut down
<lightlike> it also helps the sub-networks stay together. if everyone used -onlynet=X for their preferred network, bitcoin would split into parts. So it's important to have nodes that are on multiple networks, and I think it makes sense to help those "volunteers" to actually be connected to all of the supported networks at the same time
<LarryRuane> "... insufficient to guarantee this?" ... I'm unsure about this.. if the config includes any `-connect` options, then of course it's not guaranteed, but I'm sure there are other reasons
<roze_paul> can't check the # of peers in the subnets if it can't count the # of peers in each subnet, and now we are able to count the peers, thx to this PR
<lightlike> michaelfolkson: yes. but if it's actually 100% fragmented (which is definitely not the case currently), and some miners are on different networks, it could also lead to a chain split in theory
<LarryRuane> michaelfolkson: I guess I was thinking if a bunch of nodes, including miners (pools I guess) were on tor (or whatever) only, and another group of nodes (including miners) were on ipv4/6 only, then they could extend the chain separately?
<lightlike> michaelfolkson: I would suspect all miners are on clearnet, the latency of other networks is too high, can't afford to wait several seconds more for a new block.
<lightlike> so the next planned steps would be to add logic to the connection making process to have at least one connection to each reachable network - and this PR prepares that
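To make the planned step concrete, here is a speculative sketch of how per-network counts could feed into connection logic. It does not describe existing or proposed Bitcoin Core code: the function `networks_missing_outbound` and its inputs are hypothetical, and it only identifies candidate networks without opening any connection.

```python
def networks_missing_outbound(reachable, outbound_by_network, addrman_counts):
    """List reachable networks with no outbound peer but known addresses.

    reachable:           set of reachable networks
    outbound_by_network: dict mapping network -> current outbound connections
    addrman_counts:      dict mapping network -> addresses known for it
    """
    return sorted(
        net for net in reachable
        if outbound_by_network.get(net, 0) == 0  # no connection there yet
        and addrman_counts.get(net, 0) > 0       # something to connect to
    )
```

The second condition is where the counts added by this PR would matter: without them, the node could not tell whether it has any address to try on a given network.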