Add DISABLETX message for negotiating block-relay-only connections (p2p)

https://github.com/bitcoin/bitcoin/pull/20726

Host: mzumsande  -  PR author: sdaftuar

The PR branch HEAD was 28df701eb1 at the time of this review club meeting.

Notes

  • On block-relay-only connections (introduced in PR #15759), transaction and address relay are disabled. Currently, Bitcoin Core nodes per default make two outbound block-relay-only connections in addition to 8 regular outbound connections.
    Block-relay-only connections have been subject of multiple review club sessions, among them #15759 on the PR introducing them.

  • Currently, block-relay-only connections are established indirectly:
    • When making an outbound block-relay-only connection, a node sets the boolean flag fRelay in the version message to false.

    • fRelay (introduced in the context of BIP37) does not imply that transactions cannot be sent for the entire duration of the connection - in its original use case with BIP37, relay of transactions can be activated later on.

    • fRelay=false is also used in -blocksonly mode, a low-bandwidth option in which a node does not want to receive transactions from any peer, but does participate in address relay.

    • Therefore, nodes currently don’t have a notion which of their incoming peers see the connection as block-relay-only and don’t have any logic attached to it.

  • PR #20726, accompanied by the new BIP proposal BIP338, introduces the new p2p message disabletx for block-relay-only connections, which makes it explicit that no messages related to transaction relay should ever be exchanged over the duration of the connection.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What are the benefits of introducing the disabletx message? What are the downsides, if any?

  3. When a node makes an outgoing block-relay-only connection, it will send a disabletx message after this PR. Will there be other changes in behavior for the sender beyond this (if yes, which ones)?

  4. When a node receives both fRelay=false and a disabletx message from an incoming peer, will it behave differently after this PR? If yes, how?

  5. Earlier discussions in the PR revolved around the interaction of disabletx and address relay. Why is it, after PR #21528, no longer necessary to change any address-relay related code in this PR?

  6. The new m_disable_tx field of the Peer struct is of type std::atomic<bool>. Why is an std::atomic used here instead of a simple bool, or a bool guarded by a lock?

  7. There have also been suggestions to achieve some of the goals of this PR without introducing a new message, as in PR #22778. Do you prefer an indirect approach to the explicit approach of this PR?

Meeting Log

  117:00 <lightlike> #startmeeting
  217:00 <lightlike> Hi!
  317:00 <emzy> hi
  417:00 <stickies-v> hi
  517:00 <tarun> hi
  617:00 <svav> Hi
  717:01 <tim1> hi
  817:01 <effexzi> Hi
  917:01 <kouloumos> hi
 1017:01 <larryruane> hi
 1117:01 <sipa> hi
 1217:01 <b10c> hi
 1317:01 <lightlike> Welcome to PR Review Club - today's session is about PR 20726 - Add DISABLETX message for negotiating block-relay-only connections
 1417:01 <lightlike> https://github.com/bitcoin/bitcoin/pull/20726
 1517:01 <lightlike> Before we start, is anybody here for the first time?
 1617:01 <michaelfolkson> hi
 1717:02 <glozow> hi
 1817:03 <lightlike> Seems to be not the case - who had the chance to review this week's PR (y/n)?
 1917:03 <stickies-v> 0.5y
 2017:03 <tarun> y
 2117:03 <emzy> n
 2217:03 <larryruane> 0.5
 2317:03 <sipa> y, a long time ago
 2417:04 <michaelfolkson> 0.5y
 2517:04 <jnewbery> hi
 2617:04 <lightlike> cool - and what was your initial impression?
 2717:05 <stickies-v> seems very sensible, but also that I think it maybe would've been nice to have fRelay be more expressive as an integer so we could capture more states, including the one expressed by disabletx
 2817:05 <stickies-v> (but that's always easy in hindsight)
 2917:06 <tarun> I have perhaps a background question--is block only an all or nothing approach--that a node is block only for all its outbound connections?
 3017:07 <glozow> it always seemed like a bit of a hack to be using fRelay to control tx relay, since it's a bloom filter-related field
 3117:07 <sipa> A node can run in blockonly mode, where it just doesn't ask for individual transactions from peers at all.
 3217:07 <sipa> That's different from block-relay-only connections, which are on a per-connection basis.
 3317:08 <lightlike> yes, it sounds very similar, but is a different thing
 3417:08 <tarun> ok-that distinction is helpful-thanks
 3517:08 <lightlike> ok - let's move to the first question:
 3617:08 <michaelfolkson> tarun: If I understand your question the answer is no. A node may be block only for some of its peers and a normal peer for other of its peers
 3717:08 <lightlike> What are the benefits of introducing the disabletx message? What are the downsides, if any?
 3817:08 <glozow> that's perhaps the most-asked question in pr review club
 3917:08 <sipa> michaelfolkson: Then it's not blockonly. Blockonly means it has no mempool at all.
 4017:09 <larryruane> from the review club notes, "... and don’t have any logic attached to it" -- can someone elaborate on what that means?
 4117:09 <stickies-v> I think the benefit is mostly altruistic, in that your peer can optimize their local state to not have to prepare for sending you transactions in the future (e.g. keeping sketches with erlay)
 4217:09 <sipa> The purpose of blockonly mode is different from the purpose of block-relay-only peers.
 4317:09 <glozow> sipa: has mempool, but ignores incoming transactions that aren't from whitelisted peers, no?
 4417:09 <stickies-v> But there's also a small selfish benefit, in that you'd not have to receive other messages (e.g. ADDR) anymore after sending DISABLETX, so less bandwidth usage
 4517:10 <lightlike> stickies-v: yes to both!
 4617:10 <sipa> glozow: Fair, with whitelisted peers that forcibly give you transactions you could still have a mempool (though that's kind of weird)... the point of blockonly mode is just reducing bandwidth because you don't care about having a mempool.
 4717:10 <michaelfolkson> Blockonly versus block-relay-only oof. Ok that is a distinction I wasn't making
 4817:12 <sipa> stickies-v: Is erlay that related? If erlay isn't negotiated with a peer, no sketches have to be kept for it, whether it's disabletx or not.
 4917:12 <lightlike> yes, every default-configured node today has 2 block-relay-only connections, so it's probably much more common than "-blocksonly" mode.
 5017:12 <larryruane> Is it counterintuitive naming for DISABLETX to control addr messages?
 5117:13 <stickies-v> sipa ah yes, good point. Then I'm not sure what other local optimizations could currently be made by knowing that you won't have to send transactions in the future? Are there any?
 5217:13 <sipa> Yes, all other tx related data structures (e.g. the set of recently announced transactions for deduplication etc).
 5317:14 <sipa> (if that's still the case; I saw something about doing that lazily)
 5417:14 <sipa> My understanding may be a bit outdated.
 5517:14 <stickies-v> That's compact block related, right? Anticipating which tx's your peer would already have in mempool as to know what to include in blocks you send along?
 5617:14 <michaelfolkson> To go from a block-relay-only peer to a normal peer requires a disconnection and new handshake?
 5717:15 <glozow> not compact block related, but not announcing something that your peer already announced
 5817:15 <jnewbery> stickies-v: anything in the TxRelay struct in https://github.com/bitcoin/bitcoin/pull/22778/files can be omitted if transactions don't need to be relayed over the connection
 5917:15 <lightlike> yes, there is the TxRelay struct in net.h with different objects (notable a bloom filter) that is not needed on a block-relay-only connection
 6017:16 <sipa> I think the rationale is mostly knowing that a connection will never need these datastructures, which could help with e.g. deciding how many connections can be supported given local resource constraints.
 6117:16 <lightlike> michaelfolkson: yes, that is not something that can be changed during a connection
 6217:17 <lightlike> Any downsides to the new message?
 6317:17 <jnewbery> the biggest saving is the rolling bloom filter m_tx_inventory_known_filter which is >500k per connection
 6417:17 <michaelfolkson> I just wonder if you have a reliable block-relay-only peer and you're struggling for peers maybe you'd want to transition them to be a normal peer
 6517:17 <stickies-v> thx, I hadn't looked at TxRelay yet, will look into it a bit more. I'd assumed the bloom filter related stuff could already be optimized just based on fRelay, though
 6617:17 <jnewbery> stickies-v: you're right, and that's what #22778 does, although it needs rebase and comments addressed
 6717:17 <glozow> bloom filter m_tx_inventory_known_filter is distinct from BIP37 bloom filters fyi
 6817:18 <glozow> oh, facepalm, i misread the message, ignore me
 6917:18 <jnewbery> glozow: right, it's a rolling bloom filter, which is the same data structure we use for various purposes in p2p
 7017:19 <michaelfolkson> Maybe clutching at straws but is it a downside telling your peers what you want from them explicitly a bit of a privacy leak. I'm thinking that is a bit of a push
 7117:19 <sipa> You're already telling them you don't want transactions from them.
 7217:19 <sipa> DISABLETX is telling them you'll never want transactions from them.
 7317:21 <lightlike> I wonder if having a separate protocol message for each boolean of information you want to exchange during feature negotiation is ideal design in the long run - it could lead to a zoo of messages over time that might be hard to keep track with
 7417:21 <sipa> There aren't that many pieces of information to be negotiated really... transactions, blocks, and ip addresses.
 7517:22 <sipa> I guess headers is another one.
 7617:23 <lightlike> yes, currently the boolen-like messages are WTXIDRELAY, SENDADDRV2 and SENDHEADERS afaik, but there is probably potential for growth
 7717:23 <sipa> lightlike: If we had perfect foresight about what we'd ever want the protocol to be, I'm sure something cleaner/more efficient could be chosen, but the "new protocol version number to add support for X; new message to negotiate the feature" has proven pretty flexible so far.
 7817:23 <sipa> Don't forget that most P2P connections are pretty long-lived (hours to days/weeks or more even), so a few messages back and forth once at the start isn't a huge cost.
 7917:23 <sipa> https://www.dsn.kastel.kit.edu/bitcoin/ for numbers.
 8017:24 <lightlike> next question:
 8117:24 <lightlike> When a node makes an outgoing block-relay-only connection, it will send a disabletx message after this PR. Will there be other changes in behavior for the sender beyond this (if yes, which ones)?
 8217:24 <jnewbery> lightlike: I share that concern. I think it's tractable now, but it gets more complex as the number of features increases and there are features that can only be negotiated if other features have already been enabled (eg erlay requires wtxidrelay).
 8317:25 <lightlike> sipa: yes, true. I'm not sure if the protocol would be more simple if these weren't separate messages but bits in one message containing a bitfield or something like this.
 8417:28 <stickies-v> lightlike I don't think it's implemented in this PR yet, but my understanding is that in follow up PRs the sender of DISABLETX would also have to stop sending other messages like ADDR in order not to be disconnected?
 8517:29 <stickies-v> or is that logic implemented already?
 8617:29 <lightlike> stickies-v: actually, that won't be necessary, we'll get to that in a later question. addr relay works well together with this PR and BIP338 "out of the box"
 8717:30 <sipa> The BIP only states that it is recommended not to send ADDR/... messages to DISABLETX peers, but without any hard rule.
 8817:30 <lightlike> but yes, there are no other changes in behavior for the *sender*.
 8917:31 <lightlike> moving to the receiver:
 9017:31 <lightlike> When a node receives both fRelay=false and a disabletx message from an incoming peer, will it behave differently after this PR? If yes, how?
 9117:31 <Kaizen_K_> it wont send any transactions to that incoming peer?
 9217:32 <stickies-v> It will disconnect such peers that send now invalid messages like GETDATA
 9317:33 <lightlike> Kaizen_K_: not sending transaction is something it would already be done before, just using the old "fRelay=false" information, so it's not changed with this PR.
 9417:33 <Kaizen_K_> ty
 9517:34 <lightlike> stickies-v: correct! If you first send a disabletx message and then don't follow the BIP by following up with tx-related messages, you'll get disconnected.
 9617:35 <lightlike> note that the part where the receiver doesn't initialize the TxRelay objects is not part of this PR, it would probably make sense as a follow-up PR.
 9717:36 <michaelfolkson> So a possible downside to this DISABLETX approach is it formalizes the relationship as a block relay only and to get out of it you need to redo the handshake. Current behavior is a peer can later request to receive transactions again using filterload without redoing the handshake?
 9817:37 <sipa> Filterload is all but dead.
 9917:37 <sipa> It requires actively opting in to BIP37 support.
10017:38 <michaelfolkson> Ok gotcha, thanks
10117:38 <lightlike> I think the old behavior is not disallowed: If you intend to use BIP37, just don't send a disabletx message, and everything works as good (or bad) as before
10217:38 <sipa> Indeed.
10317:38 <lightlike> Next question:
10417:39 <lightlike> Earlier discussions in the PR revolved around the interaction of disabletx and address relay. Why is it, after PR #21528, no longer necessary to change any address-relay related code in this PR?
10517:40 <Kaizen_K_> I think that peer #21528 filters peers that have no other peers? Is that correct?
10617:41 <michaelfolkson> lightlike: Is this the deferring initialization of m_addr_known?
10717:41 <lightlike> michaelfolkson: yes, correct!
10817:44 <lightlike> It basically changed the way address relay is setup: if you make an outgoing block-relay-connection, you know that you don't want addr related messages, and just don't send any such messages.
10917:44 <lightlike> On getting an inbound connection, you'd only start sending addr related address after having received at least one.
11017:45 <michaelfolkson> That is a neat side effect of #21528. Presumably it was deliberate with this advantage in mind?
11117:45 <lightlike> So, everything plays nice already on a block-relay-only connection, without a need to disable something.
11217:45 <Kaizen_K_> Can anyone explain what addr messages are please? Can't find anything on google.
11317:45 <stickies-v> https://en.bitcoin.it/wiki/Protocol_documentation#addr
11417:45 <lightlike> That's what https://github.com/bitcoin/bitcoin/pull/20726#issuecomment-1006005523 refers to.
11517:45 <sipa> Kaizen_K_: They relay IP addresses of other nodes.
11617:45 <Kaizen_K_> ty
11717:47 <svav> I googled Bitcoin addr message and found this https://en.bitcoin.it/wiki/Protocol_documentation#addr
11817:47 <sipa> Also see BIP155, for its successor, ADDRV2.
11917:48 <lightlike> there was also a review club meeting on PR 21528 that I forgot to link: https://bitcoincore.reviews/21528
12017:48 <lightlike> Next question is more C++-technical:
12117:48 <lightlike> The new m_disable_tx field of the Peer struct is of type std::atomic<bool>. Why is an std::atomic used here instead of a simple bool, or a bool guarded by a lock?
12217:50 <glozow> for thread safety
12317:50 <sipa> A bool guarded by a lock would also be thread safe, but much slower on most platforms.
12417:51 <glozow> and more code
12517:51 <michaelfolkson> When would you use a bool guarded by a lock if it is much slower? :)
12617:52 <lightlike> yes, I guess the basic thing is that the "Peer" struct is not guarded by cs_main as a whole, so something needs to be done for each of the members separately
12717:52 <sipa> You wouldn't.
12817:52 <sipa> But perhaps you need a lock that guards more than just a bool.
12917:52 <sipa> An atomic is only usable if you want a guarded data structure that only contains a bool, and nothing more.
13017:52 <lightlike> michaelfolkson, sipa: could there be situation where this makes sense because you care about enforcing a lock order?
13117:53 <sipa> If lock order is relevant, it means you're not guarding just a bool.
13217:53 <larryruane> it seems this boolean is set (to true) from only one thread (message processing), but read from other threads, and locking is needed even in this case
13317:54 <sipa> An atomic<bool> is equivalent to a bool with a lock that's only held at the innermost level.
13417:54 <stickies-v> sipa https://en.cppreference.com/w/cpp/atomic/atomic seems to also be defined on other types than bool?
13517:54 <glozow> for every read/write to an atomic_bool, the lock will be taken right before and released right before that operation is over, so you wouldn't be taking multiple locks at a time in different orders?
13617:54 <larryruane> can there be `atomic<someLargeStruct>`? we seem to see it only for simple types
13717:55 <sipa> stickies-v: Oh, sorry, I didn't mean to imply it just worked for bools.
13817:55 <sipa> larryruane: If there was, it'd almost certainly be implemented using a lock :D
13917:55 <larryruane> yes, just wondering if it's even allowed by the compilter
14017:55 <lightlike> could it be that operations such as setting a bool aren't atomic anyway? Or is this implemenation-specific and we jsut want to be on the safe side?
14117:56 <sipa> larryruane: Pet peeve of mine: the compiler is irrelevant. The question you should ask if is if it's permitted by the *language* (and so far, I think the answer is no, but it could be expanded in future language versions of course).
14217:56 <willcl_ark> Does a mutex also not get locked and unlocked each with an atomic operation, i.e. 2 atomic operations minimum?
14317:57 <sipa> willcl_ark: In practice, yes.
14417:58 <lightlike> larryruane: I think that std::atomic is currently limited to just few small, "integer-like" types. I don't think it exists for double, for example.
14517:58 <sipa> lightlike: Unguarded concurrent access is undefined behavior.
14617:58 <sipa> It might work, it might also send all your BTC in your wallet to me.
14717:59 <stickies-v> Now that's an innovative source of dev funding
14818:00 <lightlike> ok, time's up - thanks a lot everyone!
14918:00 <sipa> In practice, x86 has a fairly strong consistency model, so it's not unreasonable that the resulting compiled code actually works there (but it certainly might not). On other hardware (non-Mac ARM, in particular) non-atomic operations are pretty likely to kill you.
15018:00 <lightlike> #endmeeting