Track AddrMan totals by network and table, improve precision of adding fixed seeds (p2p)

Feb 1, 2023

https://github.com/bitcoin/bitcoin/pull/26847

Host: mzumsande - PR author: mzumsande

Notes

AddrMan, the in-memory database of peers, consists of two tables (new and tried). AddrMan keeps count of the total number of addresses in each of these two tables, but not on a per-network basis. For example, it’s currently not possible to directly query AddrMan if it has any onion addresses in its new table, or how many of them.
This PR adds internal tracking of the totals by network and table to AddrMan and adds this information to its public interface.
The added counts are then used to improve the precision of loading fixed seeds, adding seeds of specific reachable networks we don’t have any addresses for in a targeted way.
On a longer time frame, this is also a first preparatory step towards more active management of outbound connections with respect to networks. One of the goals is to always have at least one outbound connection to each reachable network.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
When is a network reachable in Bitcoin Core? (it’s not as self-explanatory as it may seem!)
How are addresses relayed over the p2p network treated depending on whether they are reachable vs. non-reachable - do we store them and/or forward them to peers? (Hint: Look in ProcessMessage)
How does this PR attempt to make sure that there are no bugs causing the added counts per network to fall out of sync with other AddrMan internals such as nNew and nTried?
How can a node currently get stuck with only unreachable addresses in AddrMan, finding no outbound peers? How does this PR fix it?
Why would it be beneficial to have an outbound connection to each reachable network at all times? Why is the current logic in ThreadOpenConnections insufficient to guarantee this?
What would be the possible next steps towards this goal after this PR?

Meeting Log

17:00

<lightlike> #startmeeting

17:00

<LarryRuane> hi!

17:00

<roze_paul> hi

17:00

<codo> hi

17:00

<lightlike> hi everyone, and welcome to Review Club!

17:00

<svav> Hi

17:00

<kouloumos> hi

17:00

<lightlike> feel free to say hi - anyone here for the first time?

17:02

<lightlike> Today's PR is #26847, notes can be found at https://bitcoincore.reviews/26847

17:02

<LarryRuane> I see the maintainers rug-pulled us again! :) (congrats on getting this merged!)

17:02

<lightlike> LarryRuane: thanks - yes, it got merged yesterday

17:03

<lightlike> Who got the chance to have a look at the PR (y/n)?

17:03

<codo> y

17:03

<LarryRuane> 0.7y

17:03

<svav> y had a look

17:04

<emzy> hi

17:04

<emzy> n

17:04

<b_101> hi

17:04

<roze_paul> y

17:04

<lightlike> great- let's move to the first question:

17:04

<pablomartin> hello!

17:05

<lightlike> When is a network reachable in Bitcoin Core?

17:05

<lightlike> (it’s not as self-explanatory as it may seem!)

17:06

<LarryRuane> all networks are initially assumed reachable (`vfLimited` array init to false) https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net.cpp#L119

17:06

<LarryRuane> `-onlynet=x -onlynet=y` sets all networks except x and y to be unreachable

17:06

<LarryRuane> networks cjdns, onion, and i2p are set to unreachable if those services aren't enabled

17:07

<lightlike> LarryRuane: correct!

17:07

<hernanmarino> Hi all , sorry for being late. I didn't have time to review, I'm here as a reader (mostly) today

17:07

<LarryRuane> "not as self-explanatory" -- yes, because we don't actually need to try to use them to consider them reachable!

17:07

<roze_paul> XD i've been looking at the inquisition stuff so much i thought that was this week..going to have to swithc y->n in response to if i've reviewed today's PR.

17:08

<LarryRuane> inscription?

17:08

<lightlike> so it's currently a mix: if we are sure we can't reach a network, we set it to unreachable (but we may be wrong, for example I can't reach IPv6 from my current computer, but it's still reachable)

17:09

<roze_paul> bitoin-inquisition, next week's topic* sorry, off-topic

17:09

<lightlike> and also if we don't _want_ to reach it (by using the -onlynet option)

17:09

<LarryRuane> (oh sorry)

17:10

<lightlike> this has been the source of some confusion in the past...

17:10

<LarryRuane> doesn't it seem like `onlynet` is slightly misnamed? If we say `-onlynet=tor` then we can also use IPV4 if `-onlynet=ipv4` is also specified ... but i can't think of a better name for tha toption

17:11

<kouloumos> Is the past confusion related to https://github.com/bitcoin/bitcoin/issues/7098?

17:12

<LarryRuane> if it was called `-allownet` that would fix that problem, but would also be misleading, because it doesn't imply that others are disallowed!

17:12

<LarryRuane> so it's probably fine, but just something to keep in mind

17:12

<lightlike> kouloumos: not sure, I never saw that issue before.

17:13

<kouloumos> I think I confused the confusion you were referring to

17:13

<lightlike> LarryRuane: true, that name doesn't seem perfect either, but it's probably too late to change it.

17:13

<LarryRuane> yes for sure, i wasn't suggesting changing it, there's no way ... just an observation

17:13

<codo> maybe splitting it up into two options like `restrictnets` plus `allownet` would do it

17:14

<LarryRuane> the help text for `-onlynet` is pretty good, it says it can be specified multiple times

17:15

<LarryRuane> anyway, sorry, didn't mean to sidetrack

17:15

<lightlike> thought -onlynet has been the subject of even more confusion anyway... not because of the name, but because of how it works

17:15

<lightlike> *though

17:15

<lightlike> next question:

17:15

<lightlike> How are addresses relayed over the p2p network treated depending on whether they are reachable vs. non-reachable - do we store them and/or forward them to peers?

17:15

<roze_paul> LarryRuane: lightlike: some commenting explaining the name and the actual behavior in the codebase might be helpful

17:16

<roze_paul> ah the helptext is already good, disregard.

17:16

<codo> we do forward (less) and not store non-reachables?

17:16

<LarryRuane> code comment: Relay reachable addresses to 2 peers. Unreachable addresses are relayed randomly to 1 or 2 peers.

17:16

<LarryRuane> https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net_processing.cpp#L2077

17:17

<LarryRuane> but we only store into our own addrman if reachable: https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net_processing.cpp#L3643

17:17

<michaelfolkson> hi

17:17

<lightlike> codo, LarryRuane : correct!

17:17

<glozow> hi

17:18

<lightlike> this is the source of the problem the PR fixes - if a network is not Reachable, we won't store any addresses for it AddrMan.

17:19

<lightlike> next question:

17:19

<LarryRuane> oh and we do want to do that (store addresses even if unreachable)?

17:20

<pablomartin> / Do not store addresses outside our network

17:20

<lightlike> LarryRuane: not really - we mostly want to store addresses we can use (for making outbound connection), so not storing for now them makes sense.

17:21

<lightlike> but we'll come to the problem later

17:21

<LarryRuane> node operators can change their `-onlynet` configurations, so I guess it's good if we have some (previously unreachable but now maybe reachable) addresses .. ?

17:21

<LarryRuane> (in addrman)

17:22

<LarryRuane> we can go on, don't mean to disrupt!

17:22

<lightlike> LarryRuane: saving some addresses just in case might seem a bit overkill, so that's not how this PR does it...

17:22

<roze_paul> I can't find a bitcoin-cli command which displays what networks are reachable/unreachable. does that exist, or best practice right now is to refer to the bitcoin.conf?

17:24

<pablomartin> roze_paul: getnetworkinfo?

17:24

<lightlike> roze_paul: the "getnetworkinfo" RPC has the info

17:24

<LarryRuane> roze_paul: i think `getnetworkinfo`

17:24

<lightlike> next question is about the implementation:

17:25

<lightlike> How does this PR attempt to make sure that there are no bugs causing the added counts per network to fall out of sync with other AddrMan internals such as nNew and nTried?

17:26

<LarryRuane> new test case `BOOST_AUTO_TEST_CASE(addrman_size)`? also there's `test/fuzz/addrman.cpp`? (although I don't see big changes to it)

17:28

<pablomartin> about the implmentation... is it about the recomputation of the totals?

17:28

<lightlike> LarryRuane: oh, I could've updated the fuzz test to make use of the new args to Size(), forgot about that

17:29

<kevkevin> I see BOOST_AUTO_TEST_CASE are we only able to add address's or is it possible for addresses to be removed?

17:29

<pablomartin> *recompute

17:30

<lightlike> kevkevin: no, we can't remove addresses from AddrMan, that functionality isn't really needed so it's not implemented

17:31

<lightlike> kevkevin: it's possible that an address can be removed if we add another address that collides with it (would go into the same bucket / location) - but not directly

17:32

<kevkevin> ahh I see thanks!

17:32

<lightlike> I meant two things: 1.) I added a check in CheckAddrman, so if that is enabled and the internals would mismatch it would return an error.

17:33

<lightlike> 2.) I tried to update the network-specific counts as closely as possible to nNew and nTried

17:34

<LarryRuane> really newb question, but when you say collides, more than one address can be in the same bucket, right? but is there a size limite, so if we're trying to add a new address and its bucket is already full, then do we kick one out?

17:34

<LarryRuane> lightlike: oh that's cool!

17:34

<lightlike> the unit tests help too of course, but there is always the possibility that they don't cover the specific situation

17:35

<lightlike> LarryRuane: Each bucket has multiple bucket positions, and no more than one address can be in the same position within a given bucket.

17:35

<LarryRuane> that's why i really like fuzzing, it can often test scenarios that humans wouldn't think of! but you also need checking code to detect when things go wrong

17:36

<LarryRuane> oh i see, so collisions are between just two addresses (but it's sort of pseudo-random i suppose)

17:36

<lightlike> LarryRuane: initially, it was like you said (buckets would "fill up"), therefore the bucket terminonology.

17:36

<LarryRuane> (i need to learn about buckets one of these days!)

17:37

<LarryRuane> lightlike: ah got it, thanks

17:37

<lightlike> But this was changed at some time, so now each new address gets assigned a specific bucket and position base on some hashing magic, and if that position is already occupied one of them has to be kicked out (no matter if the bucket is full or not)

17:37

<lightlike> next q:

17:37

<lightlike> How can a node currently get stuck with only unreachable addresses in AddrMan, finding no outbound peers? How does this PR fix it?

17:39

<LarryRuane> if the `-onlynet` config changes? the way this PR addresses it is, during startup, we fall back to using fixed seeds, but only for reachable networks that we have no addrman addresses for

100

17:39

<roze_paul> lightlike: is the addr that is removed from the bucket _always_ the incumbent/older addr of the two?

101

17:39

<LarryRuane> https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/net.cpp#L1664

102

17:41

<codo> in the old situation the fixed seeds where not consulted at all after a `-onlynet` change?

103

17:41

<lightlike> roze_paul: Good question! There is some logic why the old address might be bad (IsTerrible, https://github.com/bitcoin/bitcoin/blob/2d5acc901db336868dee158022a115b120b5b675/src/addrman.cpp#L67). If it's not "terrible", the old address is kept, and the new one not accepted.

104

17:42

<lightlike> roze_paul: otherwise, there would be the danger of an attacker spamming us with addresses, evicting all the good currents one through collisions

105

17:42

<roze_paul> had to redeem myself for my getnetworkinfo question :)

106

17:42

<roze_paul> cool

107

17:43

<lightlike> LarryRuane, codo: correct! In the old situation we'd only query the fixed seeds when AddrMan was completely empty

108

17:43

<pablomartin> the pr fixes it with the 2nd commit... ?

109

17:44

<lightlike> now that we added the functionality to check whether it's empty for a specific network, we can load just the fixed seeds from networks that we need

110

17:44

<lightlike> pablomartin: yes, correct

111

17:45

<LarryRuane> and just to be clear, "load" means add to our addrman list? Or does that not get persisted to disk (is it memory-only)?

112

17:45

<LarryRuane> *those not get persisted

113

17:46

<lightlike> but this is not the only (or even the main reason) for this PR: the long-term plan is to change the automatic connection logic wrt networks, so the last 2 questions are about that.

114

17:46

<lightlike> Why would it be beneficial to have an outbound connection to each reachable network at all times? Why is the current logic in ThreadOpenConnections insufficient to guarantee this?

115

17:47

<LarryRuane> I'm a little unsure on this first question, but is the benefit to make eclipse attacks more difficult? your peers are more spread out

116

17:47

<lightlike> LarryRuane: yes, load means to load it into AddrMan. While AddrMan is in-memory, it gets persisted to disk (peers.dat) regularly, and in particular before we shutdown

117

17:47

<codo> I think it would be beneficial because we would have recent good addresses for all networks.

118

17:47

<roze_paul> Intuitively, a node would be more difficult to attack (DoS, eclipse)

119

17:48

<lightlike> yes, that would be the selfish reason. there is another, non-selfish one

120

17:49

<LarryRuane> hmm.. so that users of only one not-very-much-used network, like I2P, get some peers? not sure

121

17:49

<codo> Others would have info about networks unreachable for them?

122

17:50

<lightlike> it also helps the sub-networks stay together. if everyone used -onlynet=X for their preferred network, bitcoin would split into parts. So it's important to have nodes that are on multiple networks, and I think it makes sense to help those "volunteers" to actually be connected to all of the supported networks at the same time

123

17:51

<lightlike> so that's the second part of the q: Why is the current logic in ThreadOpenConnections insufficient to guarantee this?

124

17:51

<lightlike> i.e. how do we currently choose outbound peers with respect to different networks?

125

17:52

<LarryRuane> oh so in other words, without this, there's more of a chance of an actual chain split?? (if so, that's a great reason!)

126

17:52

<codo> It doesn't exclude the situation where there there are no peers for a certain network.

127

17:53

<codo> Just doesn't check for it.

128

17:54

<LarryRuane> "... insufficient to guarantee this?" ... I'm unsure about this.. if the config includes any `-connect` options, then of course it's not guaranteed, but I'm sure there are other reasons

129

17:54

<lightlike> codo, yes: so what would happen if we, for example had 10k addresses for clearnet and only 1000 for I2P in addrman?

130

17:54

<LarryRuane> codo: that makes sense!

131

17:54

<codo> lightlike: very small chance for an I2P address to get selected

132

17:55

<lightlike> yes, exactly!

133

17:55

<LarryRuane> oh we choose randomly among those 11k, so chances are pretty good that they'll all be clearnet!

134

17:55

<roze_paul> can't check the # of peers in the subnets if it can't count the # of peers in each subnet, and now we are able to count the peers, thx to this PR

135

17:55

<michaelfolkson> LarryRuane: Just a more fragmented network, not increased risk of chain split right? What's your chain split scenario?

136

17:55

<lightlike> so it's mostly random, without any management currently.

137

17:56

<lightlike> michaelfolkson: yes. but if it's actually 100% fragemented (which is definitely not the case currently), and some minors are on different networks, it could also lead to a chain split in theory

138

17:56

<lightlike> *miners, not minors, haha

139

17:57

<LarryRuane> michaelfolkson: I guess I was thinking if a bunch of nodes, including miners (pools I guess) were on tor (or whatever) only, and another group of nodes (including miners) were on ipv4/6 only, then they could extend the chain separately?

140

17:58

<michaelfolkson> lightlike: I wonder if they are actually on different networks (I agree they should be)

141

17:59

<lightlike> michaelfolkson: I would suspect all miners are on clearnet, the latency of other networks is too low, can't afford to wait several seconds more for a new block.

142

17:59

<LarryRuane> yes good point

143

17:59

<LarryRuane> so not a problem in practice

144

17:59

<lightlike> so the next planned steps would be to add logic to the connection making process to have at least one connection to each reachable network - and this PR prepares that

145

18:00

<lightlike> time's up - thanks everone, great discussion!

146

18:00

<lightlike> #endmeeting