Although these three purposes share the same network, they have different design goals and properties. For example, transaction relay is optimized for a combination of redundancy/robustness and bandwidth minimization, while block relay is optimized to minimize delay.
There’s no reason these different purposes couldn’t be split onto different networks, and in fact there are already specialized networks for block propagation (such as FIBRE and Blockstream Satellite) and for transaction relay (such as TxTenna).
Currently Bitcoin Core treats all peers as full network peers (tx, block and addr), unless you start your node in ‘blocksonly’ mode, in which case all peers are block/addr peers.
The default connection limits are 8 outbound connections and up to 125 total (i.e. up to 117 inbound connections if the node is a ‘listening’ node and is reachable).
This PR proposes adding 2 additional blocks-only outbound connections, which would not relay txs or gossip addrs. The defaults would therefore be 8 full outbound connections, 2 blocks-only outbound connections, and up to 115 inbound connections.
There are two main data structures that handle peer state: CNode (defined in net.h and covered by cs_vNodes) and CNodeState (defined in net_processing.cpp and covered by cs_main).
Roughly speaking, CNode is concerned with the connection to the peer, and CNodeState is concerned with the application state of the peer. However, some application state is still contained in CNode for historical reasons and should eventually be moved out.
One change in this PR is to move the tx relay inventory state into a separate structure (TxRelay) and to initialize it only if the peer is a tx relay peer.
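The pattern looks roughly like this (a minimal sketch with illustrative names, not the PR's exact code; the real CNode in net.h carries more fields and locking annotations):

```cpp
#include <array>
#include <memory>
#include <mutex>
#include <set>

using uint256 = std::array<unsigned char, 32>;  // stand-in for Bitcoin Core's uint256

// Tx-relay inventory state, only meaningful for peers that relay txs.
struct TxRelay {
    std::mutex m_mutex;
    std::set<uint256> m_inventory_to_send;  // txids queued to announce via INV
    // ... filters, fee state, etc. would live here too
};

class CNodeSketch {
public:
    // Allocated only for tx-relay peers; nullptr for blocks-only peers,
    // so the (potentially memory-heavy) state is never created for them.
    std::unique_ptr<TxRelay> m_tx_relay;

    explicit CNodeSketch(bool fRelayTxs)
        : m_tx_relay(fRelayTxs ? std::make_unique<TxRelay>() : nullptr) {}
};
```

Code that touches tx relay state can then gate on the pointer's null-ness rather than a separate flag, a design choice discussed further in the meeting log below.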
Questions
What problem is this PR addressing?
Why is the tx relay inventory state moved into its own structure?
How do we signal a ‘blocks-only’ connection on the P2P network?
Why is the P2P behaviour of this PR difficult to test in the functional test framework?
What does aj point out in his review comment that all other reviewers missed?
How should we handle peers that send us tx data after we’ve requested a blocks-only connection?
<jnewbery> - weekly reminder that PR review club exists to help you get reviewing on github. I encourage you all to leave review/test comments on the actual PR!
<michaelfolkson> On that I was thinking of maybe starting this 15 minutes early each week for those who want to chat about basic stuff without wasting John's time
<dergigi> I built it and looked through the code and the PR; don't have any comments really, still have a lot to learn and wrap my head around; wasn't sure how to test it properly; network tests from PR 14210 would be great I guess
<dergigi> This PR relates to the Erebus Attack as well, right? At least it blocks/conflicts with PR 16702 which implements one of the proposed countermeasures?
<michaelfolkson> Basic question on the peer management. You initially connect to the DNS seeds but then these connections are dropped and that's where you're at risk of eclipse attack (assuming you haven't manually connected to a peer that you trust)?
<michaelfolkson> provoostenator: But if you maintained your connection to the DNS seeds then the transactions/blocks they provide would be different to the attacker's?
<provoostenator> michaelfolkson: you don't maintain that connection; you only poll them once and only if you can't find a peer by yourself quickly enough (11 seconds from boot)
<jnewbery> lightlike: I think there's a tradeoff between bandwidth usage and privacy. If you're only connecting to a subset of the network and relaying txs to them, then that leaks information about your topology
<provoostenator> michaelfolkson: when your node starts it checks peers.dat for a list of peers it already knows about and tries to connect to some of them
<ariard> lightlike: was thinking about this yesterday and intuitively would say as tx conflicts are solved by blocks you can always provoke conflicts between all your peers as it's unconfirmed data
<lightlike> ariard: yes, but do these conflicts necessarily have to lead to an attacker learning the entire network topology as described in the txprobe paper?
<jnewbery> nehan: yes. It's the bloom filters that are expensive. I think without those, it wouldn't really be worth changing this (for memory saving - it might be worth it to make the code clearer)
<ariard> lightlike: don't know enough about mempool, but would say if peers have different tx dependency graphs, it's going to modify the way they relay txs (you may have other heuristics not yet discovered using RBF or CPFP)
<jnewbery> ok, so the PR moves the (potentially memory expensive) tx inventory state to its own struct, and only initializes that if the connection relays txs
<aj> i don't think our bloom filters are easy to reinitialise; if you've got a unique_ptr like pfilter you can; but if you have it in place you define the number of elements and hence memory usage at declaration
<aj> nehan: when initialising the object it allocates the memory, with nElements = 50k; would have to conditionalise that or something to be able to save memory
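To illustrate aj's point with a minimal sketch (the class and member names here are stand-ins, not CRollingBloomFilter's real sizing math): an in-place member pays the allocation for every peer at construction, while a pointer member defers it until it is actually needed.

```cpp
#include <memory>
#include <vector>

// Stand-in for CRollingBloomFilter: the bit array is sized (and
// allocated) at construction from nElements.
class RollingBloomFilterSketch {
    std::vector<unsigned char> m_data;
public:
    explicit RollingBloomFilterSketch(unsigned int nElements)
        : m_data(nElements * 3) {}  // placeholder sizing, not the real formula
};

struct PeerInPlace {
    // Memory for 50k elements is allocated for every peer, tx-relaying or not.
    RollingBloomFilterSketch m_filter{50000u};
};

struct PeerLazy {
    // Stays nullptr (no allocation) until the peer actually relays txs.
    std::unique_ptr<RollingBloomFilterSketch> m_filter;
};
```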
<lightlike> jnewbery: that some peers might ignore our version message and still send us INVs, in which case we would have sent TXs even if the connection is blocks-only
<aj> michaelfolkson: the simulation looks at how many outbounds resulted in the p2p network remaining connected if you assume all the other connections everyone has are useless (because the tx stuff has allowed them to be attacked, maybe)
<aj> lightlike: i thought about it, then thought "no, suhas wouldn't get that wrong" and ignored it; then tried it live on mainnet and about the third run saw it
<jnewbery> nehan: sorry, I misunderstood your question. I thought you were talking about the `relay` bool in the VERSION message, not the new tx_relay struct
<nehan> jnewbery: yes, i'm curious why check node->m_tx_relay == nullptr (assuming the reader knows this is only non-null when the node supports tx relay) instead of directly including a separate bool to indicate whether or not the node supports tx relay. i guess it's a code style thing.
<jnewbery> nehan: I think it's fine to check based on the non-nullness of the pointer. Adding an additional bool which essentially would be equivalent to saying 'this pointer is non-null' seems unnecessary
<jnewbery> you could imagine a future change where someone adds a way to downgrade a connection to blocks-only, sets the pointer to null and forgets to update the bool, as one example of why you wouldn't want that redundancy
<jnewbery> jonatack: (sorry, dropped your question about spy nodes earlier). I noticed that the IP address for the node that was doing this was on Greg's spy node list: https://people.xiph.org/~greg/banlist.cli.txt
<jnewbery> and I spent a while digging into how Bitcoin Core serializes/deserializes VERSION messages (and in particular booleans - `relay` in VERSION is the only boolean we serialize on the P2P network) and couldn't find any bugs
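For reference, BIP 37 defines the relay flag as an optional trailing byte of the VERSION payload: if the byte is absent, the peer is treated as a full tx-relay peer. A rough sketch of that rule (not Bitcoin Core's actual deserializer) would be:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// BIP 37: if the trailing relay byte is omitted, behave as if it were
// true; 0x00 means "don't announce transactions to me".
bool ReadRelayFlag(const std::vector<uint8_t>& payload, size_t pos_after_fixed_fields) {
    if (pos_after_fixed_fields >= payload.size()) return true;  // byte omitted
    return payload[pos_after_fixed_fields] != 0x00;
}
```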
<jnewbery> jonatack: (re:ACKing) that shouldn't discourage you from reviewing. A comment like "code looks correct to me, but I don't feel confident enough about behaviour to give an ACK" is a perfectly helpful contribution
<jnewbery> lightlike: there is benefit to having both tx and block relay for the same peer, eg compact block relay works under the expectation that the peers' mempools are similar
<aj> lightlike: also the overhead of relaying blocks if you're already relaying tx's seems pretty small, and the more redundancy block relay has the better
<jnewbery> I skipped a question, but I think we already covered it: Why is the P2P behaviour of this PR difficult to test in the functional test framework?
<jonatack> michaelfolkson: the difficulty of setting up functional tests to simulate p2p behavior, we lack p2p simulation frameworks... Suhas, Gleb, Jeremy Rubin, and Giulia Fanti have IIRC all mentioned the need for them
<jkczyz> jnewbery: Is there any planned refactoring for the network code? It seems quite unwieldy/complex and perhaps increasingly so with each change.
<jnewbery> michaelfolkson: that's what the functional tests are doing, but the framework is missing some pieces that would allow us to fully test P2P behaviour
<aj> jnewbery: shouldn't "addnode" be able to construct whatever shape of network we want? maybe adding/removing net_permissions -- so 127.0.0.1:* defaults to LOCAL_PEER perm but that can get dropped with some arg to addnode, maybe?
<jnewbery> let's wrap it up there. What did people think of this one? P2P is quite tricky and there's a lot of required context so I was worried that it might be too much, but I thought this was good.
<aj> jnewbery: i guess, (a) i don't get why this is hard -- i mean, it's work, but it doesn't seem really tricky, but maybe i'm missing something? (b) not clear to me whether it makes more sense to set up the environment (peers.dat) and have bitcoind do its normal resolution, or have rpc commands that explicitly set stuff up, or maybe it makes sense to do both
<aj> ariard: read the code, read the PR comments, re-read both, run the tests, find something that doesn't make sense and try to make sense of it, repeat; once it kind-of makes sense, run it and see what happens. if things don't quite make sense, maybe add some logging, and run the tests and look through the logs, or run it on mainnet and look through the logs