The PR branch HEAD was 2533ce0 at the time of this review club meeting.
Notes
PR 19031 is a proposed implementation of the BIP155addrv2 message, a new p2p message format proposed in early 2019 by Wladimir J. van der Laan to gossip longer node addresses. The addrv2 message is required to support next-generation Tor v3 Onion addresses, the Invisible Internet Project (I2P), and potentially other networks that have longer endpoint addresses than fit in the 128 bits/16 bytes of the current addr message.
To ease review, PR 19031 was converted to a roadmap and separated into a series of smaller PRs. The first one in the series, PR 19351 “test: add an edge case test for CSubNet”, has been merged. PR 19360 “net: improve encapsulation of CNetAddr” is the second PR in the series and currently under review.
The addr message relays connection information, including IP addresses, for peers on the Bitcoin network. addr uses IPv6 addresses; IPv4 addresses are provided as IPv4-mapped IPv6 addresses. More information here.
The BIP155 addrv2 message would allow not only longer addresses, but also variable-length ones for future extensibility.
Bitcoin supports Tor v2, whose addresses are 10 characters long. Once encoded to the 16-byte onioncat space, they fit nicely in the 16 bytes needed for an IPv6 address. Tor v3 (meta issue here), aka Next Gen Onion Services, has been available since 2017. Tor v3 provides much better security and privacy using longer, 56-character addresses. Using Tor v3 on the Bitcoin network therefore requires implementation of addrv2.
PR 19031 is a fairly large PR. For the purposes of this PR Review Club, here is a suggested approach for study and review:
Review PR 19360 (this is also the first commit of PR 19031).
Read BIP155. There is ongoing discussion on BIP155 happening at BIPs PR 907. Because of this, please review BIP155 at any of the links in these notes as they reflect the current status of the BIP.
See this comment in BIPs PR 907 about how to signal support for addrv2. The current decision can be seen here.
Read through all of the conceptual comments in 19031.
Skim commits 2-4 of PR 19031. The goal should be to get context for what future PRs will cover and to see how the implementation of BIP155 matches the specification.
What are the differences between addr and the new addrv2?
Do you agree with how PR 19031 is being split up into smaller PRs? Would you rather the “reviewable” PRs be larger and include more commits? What are the advantages and disadvantages of this choice?
Do you agree with how addrv2 support is being signaled? What are some alternatives?
Why is Tor v3 being considered?
What are some other protocols that could be added here? Are they all compatible with addrv2, or will we need an addrv3 in the future?
<tattered> addr currently only supports 16 byte addresses which limit which endpoints nodes can connect to expanding addrv2 to 32 bytes enables tor v3 addresses and I2P plus yet-reserved addresses
<vasild> tattered: right, notice that in addrv2 the address size is variable, so it can be 32 for torv3, or 363 for some-fancy-network, or just 4 for IPv4 :)
<sipa> the question is "does peers.dat permit storing anything else than IPv6 addresses", and the answer is no, because that's what addresses were in bitcoinland up to now :)
<troygiorshev> Do you agree with how PR 19031 is being split up into smaller PRs? Would you rather the “reviewable” PRs be larger and include more commits? What are the advantages and disadvantages of this choice?
<sipa> tattered: i believe so; i tried adding I2P support in 2012, but gave up due to it being very hard/impossible to pack into some ipv6 address space
<dongcarl> the prelude commits might not be accepted on their own as pure-refactors, but does stand in this case given that they simplify later commits in the PR
<michaelfolkson> Personally I like the smaller PRs approach and it works better for PR review clubs. I think Gleb's argument was that there should be cleaner mapping between a BIP and a PR
<jnewbery> although it's getting better, we have a codebase organization that means PRs very often conflict. The larger the PR, the more often you need to rebase, which is painful for authors and reviewers
<tattered> sipa wumpus is it the case that peers.dat only stored ipv6 or that 16 bytes was the cap. meaing v2 onions are padded too in peers.dat, correct?
<wumpus> tattered: it should be backward compatible (e.g. new code can still read old peers.dat files), there is no requirement to be compatible the other way around
<vasild> tattered: the serialization code is the same, no matter whether the address is going to the disk or being sent over the network. In the current serialization we represent IPv4 and Torv2 addresses in 16 bytes, encoded as a "fake" IPv6 addresses from some reserved IPv6 networks.
<wumpus> I picked a protocol version bump first, because I saw it as a protocol update, but a lot of people disagreed with this, but I'm okay with signalling it with a message
<jnewbery> the latest BIP says that the sendaddrv2 message should be sent after the verack. That means there's a window between receiving the verack and sending the sendaddrv2 message where the peer will send you addr messages
<wumpus> there is only a limited number of version bits and they're generally used to find peers supporting a certain feature, which is not really relevant here
<dongcarl> michaelfolkson: the bip can be extended to allow for wider addresses, addresses with unknown networkIDs are currently ignored, but there's a sanity check for an absurdly large address
<jnewbery> I think it'd be great if the version message could be extended to include requested/supported features rather than have a new message type for every feature
<sipa> michaelfolkson: i don't think a service bit makes sense; the way to advertize you're listening on some other network... is by sending out addrv2 messages for your IP on that network; it doesn't need some flag to announce it
<michaelfolkson> One of the arguments for new network service bit was SPV clients could find addrv2 nodes by service bit. But you're saying sipa that that would be obvious anyway?
<wumpus> jnewbery: yes, something that uses named extensions instead of needint to allocate a limited number of extension bits would be nice, some things have been proposed in the past IIRC
<vasild> fwiw the current implementation will also work if the sendaddrv2 message (which means "I support addrv2, please send me in that format") arrives in the middle of the communication (not necessary the first message after verack)
<vasild> gmaxwell made an interesting observation at https://github.com/bitcoin/bips/pull/907 -- if a peer forwards (gossips) a network he does not know about (e.g. networkid=0x07, some obscure data with length=123), then that could be a problem if the node which does the forwarding is old and network 0x07 has actually been added with required length!=123. Then that old node will happen to forward
<tattered> michaelfolkson "only care about the one you are connected on" is that right? Why wouldn't a node want to keep the most comprehensive peer list available?
<jnewbery> vasild: right, if there are new restrictions on message validity, we shouldn't ban/disconnect if those restrictions are violated, otherwise we end up banning old peers, and may split the network
<michaelfolkson> tattered: I would guess it depends on your preference. If you only use Tor v3 and aren't interested in any of the alternatives? Others would be interested in alternatives I'd guess
<jnewbery> one interesting thing here is that we gossip addresses without doing any validation of them, unlike transactions and blocks, which we'll only relay if we've validated them
<vasild> yes, this is how the implementation works - it consumes an unknown network id messages and silently ignores them. But if it receives a known-network with wrong length then that is treated as misbehavior - the entire message is dropped (with all addresses) and maybe the peer banned.
<michaelfolkson> tattered: But then this is an argument for not using message during handshake and using new network service bit instead? I didn't understand sipa's earlier point
<tattered> Does the fact that networkid, and length being unspecified have any marginal decrease in network topology akin to txn relay topology? ( I guess we care more about txn-origination privacy more than peer topology)
<sipa> i'm still a bit hesitant to treat known-but-invalid addr entries as misbehavior; i agree there is no transitive partition risk if unknown types don't get relayed... but it's still scary
<sipa> jnewbery: btw, perhaps the current code even relays not-our-network addresses too little (it relays them to 1 peer instead of 2), as it seems very hard to learn about onion addresses today if you're only connected to ipv4/ipv6
<tattered> wumpus Sorry I didn't elaborate properly. "decrease in network-topology *privacy*" meaning if a Eve wants to discover more of peer topology she injects a tag into an unspecified peer netid and address and observes where that tag is relayed by other peers. Though it seems if peers strongly drop netid and lengths they don't know about, this topology won't leak out as easily
<dongcarl> According to the tor-dev mailing list, Tor plans to deprecate v2 with 0.4.4.x (Sept. 15th, 2020) and obsolete it in 0.4.6.x (July 15th, 2021)
<vasild> sipa: there is no strong reason to ban/misbehave a peer who sends us known-network, invalid length address. Other that he sent us something that does not make sense (e.g. IPv4 address with length 5). Maybe make distinction in networks 0x01..0x06 (as currently defined in BIP155) and networks 0x07.. (future ones) - ban on invalid len 1..6 networks and ignore/drop the address on 7.. networks?
<sipa> vasild: possibly; the alternative would mean that numbers 1-6 can be reused too if networkid space gets too crowded (who knows, maybe someone one day finds a use for "set top bit in network id to mean X"...)
<tattered> torv3 fixes some tenuous privacy centralization in that's present in torv2 service infrastructure -- meaning bitcoin nodes running tor v3 addressing cant be deanonymized as easily
<troygiorshev> last question, What are some other protocols that could be added here? Are they all compatible with addrv2, or will we need an addrv3 in the future?
<troygiorshev> wumpus: i may or may not be using this question as an excuse to plug AltNet and discuss networks that don't bind well to sockets, etc. :D
<dongcarl> sipa: Is there any case where 2 torv3 nodes both connect to a third torv3 node, and the third one doesn't tell the 2 initiators about each other?
<troygiorshev> anyways, if you found this PR interesting, you'll probably also find ariard's AltNet proposal interesting! You can find it here: https://github.com/bitcoin/bitcoin/pull/18988
<troygiorshev> wumpus: it's an attempt to make bitcoin's p2p more modular, such that anyone can write "drivers" for their network that can easily be "plugged" into
<wumpus> the only requirement that addrv2 makes is that addresses an be stored in 512 bytes of data, nothing more, even variable-length "route descriptions" would work (if they fit in that length), unless your network is so postmodern that addresses can't be desrcibed at all, I don't see what would be unsuppore
<vasild> dongcarl: "Is there any case where 2 torv2 nodes both connect to a third torv2 node, and the third one doesn't tell the 2 initiators about each other?" -- yes, I think that is possible, but I have to look at the code to confirm.
<wumpus> so I guess protocols with relative addressing ("this is the list of transit sequences for routing decisions from me to you") would not work with this, but gossip in general only works for global address spaces