We will discuss both PR 15558
(merged) and PR 16939 (open)
in this week’s PR review club. Both are small code changes, so should be quick
to review.
Notes
When bitcoind is started, it tries to connect to peers from its addrman
database. Those new connections are opened in
CConnman::ThreadOpenConnections(), which is called in CConnman::Start().
If the addrman database is empty (for example, the first time that bitcoind
is run), or if bitcoind is unable to successfully connect to two peers, then
it will query DNS seeds to get addresses of candidate peers to connect to.
Querying DNS seeds is done in a separate thread, threadDNSAddressSeed, which
is also started in CConman::Start().
For completeness, if the DNS seed queries are unsuccesful, then bitcoind
will fall back to connecting to a hard-coded list of seed nodes. This
fall back functionality is only run as a last resort if all DNS seeds are
down (eg if there is an attack on the DNS seed infrastructure).
In February, Issue 15434 was
opened, which reported that DNS seeds were being queried too soon and too
frequently. From that issue: “Despite being running most of the time, and
having a database of tens of thousands of peers, my node seems to query the DNS
seeds each time I restart it, which doesn’t seem ideal from a privacy
perspective”.
PR 15558 (which was merged
in time for V0.19) changed the ThreadOpenConnections() behaviour to only
query three DNS seeds at a time (rather than all at once). According to the
author: “This reduces the amount of information DNS seeds can observe about the
requesters by spreading the load over all of them.”.
After that PR was merged, AJ Towns left a
comment:
“I think there’s a bug here: if this loop is hit with addrman.size()==0 and
seeds_right_now==0 it will then query the first seed, and set
seeds_right_now==-1, at which point this condition will keep failing on future
loops, preventing a delay between batches of 3 seeds, and preventing early
exit.”.
PR 16939 changes the DNS
seed behaviour once again:
if there are 0 entries in addrman or -forcedns is set, it will query
the DNS seeds immediately;
if there are fewer than 1000 entries in addrman, it will query DNS seeds
after 11 seconds;
if there are more than 1000 entries in addrman, it will query DNS seeds
after 5 minutes;
if there are still no entries in addrman 6 minutes after start up, then
bitcoind will try to connect to the hard-coded seed nodes.
<lightlike> jnewbery: i started a node with the PR with and without peers.dat and also added some more precise logging to see how many IPs the different DNS seeds sent us.
<lightlike> jnewbery: in one run, it took me like 5 minutes to get 2 peers on mainnet, which I found a bit too slow considering I had ~6k entries in addrman. Might just be an outlier though.
<jnewbery> one of the challenges for this PR is that we don't have any way to do integration testing of these changes - we can't manipulate the peers.dat file or set up DNS servers in our functional test framework
<fanquake> Now that there's been more some PR review coming from the review club, this sort of "how did you actually review PRs" discussion is definitely the most interesting to me. Especially from anyone that has been ACKing PRs.
<aj> lightlike: (if you had net debug enabled, "trying connection .* lastseen=" in your logs in between when you started the test and when you got two peers should tell you)
<lightlike> aj: the 9th peer was the second one that worked. I think the issue is that the first peer I connected to sent me some new blocks (my node was still in final stages of IBD), and while they were being validates there were some ~30s breaks in which no new connections were attempted
<lightlike> re q4: hard to say if bug or intentional, but I'm not sure if it is important if a peer with an empty addrman asks all DNS seeds or just 3 of them.
<aj> if we had lots of DNS seeds (or lots of full nodes coming online) it would seem a bit spammy to always query all of them to me. but we don't have either of those things...
<zenogais> I wasn't able to review these as much as I would have liked. Seems like they need deeper review / testing since they're in a part of the codebase that isn't easily testable atm.
<jnewbery> I know that carldong was hoping that better separation of the net layer would make focused testing easier. I'm not sure how far he's got with that
<jnewbery> zenogais: one potential difficulty with using a debugger with this functionality is that there are a bunch of threads with timers interacting with each other. Stepping through one thread doesn't capture all the behaviour
<lightlike> One thought I had is that that while querying the DNS seeds in almost every startup is obviously bad for decentralisation, it is also a defense against eclipse attacks.
<lightlike> So if someone manages to manipulate our addrman (like in the Erebus or Heilman paper), the DNS seeds won't counteract that after this PR is merged.
<fjahr> concerning peers.dat: we could make it easier to manipulate files in the node's datadir with some helper methods, i think this could be helpful for other tests
<jnewbery> lightlike: I think that's a good point. If an adversary can take over your addrman, then I think the only way out of that is to query the DNS seeds or the hard-coded seed nodes.
<jnewbery> lightlike has already mentioned that not querying DNS seeds could potentially be bad for robustness/redundancy (because reducing the use of DNS seeds could *potentially* make us more vulnerable to having our addrman taken over)
<fjahr> what i took away from the discussion: higher privacy because less other nodes are pinged at startup but potentially lower usability because startup might take longer
<zenogais> Not sure if it does. But it sounds like the order was originally fixed, and 15558 switched to it to random order in addition to reducing the number of seeds queried.
<jnewbery> the problem that they're trying to solve is not leaking information to _all_ of the DNS seeds about when you're staring a node. Changing the ordering doesn't improve that
<sprawl-unsolved> What's not clear to me is if PR 15558 was modified to attempt to get a non homogenous set of DNS seeds to attempt to prevent eclipse attacks
<jnewbery> sprawl-unsolved: I think 3 was simply chosen to mitigate against a single rogue DNS seed being able to eclipse new nodes. Issue 16938 is a very interesting discussion, but it's not the only reason you'd want redundancy in the DNS seeds you query
<jnewbery> zenogais: not that I'm aware of. What specifically do you mean when you say 'testing around multi-threaded code'? Do you have something specific in mind?
<zenogais> jnewbery: The question is mostly motivated since you spoke about the difficult of testing/observing these threads in isolation because of their interactions
<jnewbery> I don't think it's that difficult to write integration tests, or at least it's probably easier than writing integration tests for some of the other p2p behaviour
<jonatack> zenogais: re folks working on this, I proposed these PRs out of personal interest and am beginning to look for funding to work on these areas.
<jnewbery> ok, let's wrap it up there. I'll try to get next week's notes and questions up sooner than last time so you all have a bit more time to review and test