The PR branch HEAD was d6dfa5977a at the time of this review club meeting.
Notes
PR 21527 continues the work
of better encapsulating net_processing that ajtowns has been tracking in WIP
PR 20758.
The previous PR in the series (PR
21148) moved most of the
orphan processing logic out of net_processing into its own subcomponent,
txorphanage.
Splitting self-contained units of logic into their own classes/translation
units makes it easier to test that logic in isolation. Enforcing a well-defined
interface to the component also makes it easier to reason about how
the code will behave.
This PR does two things:

1. It introduces an m_mutex_message_handling mutex in PeerManager, which guards data that is used in PeerManager’s message handling. Doing that allows us to remove the cs_sendProcessing mutex from net.
2. It removes the g_cs_orphans mutex from net_processing and adds an internal mutex to TxOrphanage, making it responsible for its own locking and thread safety (a simplified sketch of that pattern is shown below).
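To illustrate the second point, here is a minimal, standalone sketch of the "class owns its own mutex" pattern. It is not the real TxOrphanage code: the class name and members are invented, and it uses std::mutex/std::lock_guard directly instead of Bitcoin Core's Mutex type and LOCK() macro.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

struct Tx { int64_t id; };

// Hypothetical class: every public method takes the internal lock itself,
// so callers never see the mutex and cannot get the locking wrong.
class OrphanSet
{
public:
    bool Add(const Tx& tx)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_orphans.emplace(tx.id, tx).second;
    }

    bool Have(int64_t id) const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_orphans.count(id) > 0;
    }

private:
    mutable std::mutex m_mutex;       // guards m_orphans; never exposed to callers
    std::map<int64_t, Tx> m_orphans;  // only accessed with m_mutex held
};
```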
Questions

1. Does this PR change observable behaviour of the node in any way?
2. What was the cs_sendProcessing mutex responsible for before this PR? When was cs_sendProcessing introduced? (Hint: use `git blame` and `git log -S "string"` to search back through previous commits.)
3. This PR moves one mutex from net to net_processing (CNode.cs_sendProcessing is replaced by PeerManager.m_mutex_message_handling) and one mutex from net_processing to txorphanage (g_cs_orphans is replaced by TxOrphanage.m_mutex). What are the benefits of moving these global/externally visible mutexes to being defined internally in their respective classes?
4. What are vExtraTxnForCompact and vExtraTxnForCompactIt? Why is it ok to stop guarding them with g_cs_orphans and guard them with m_mutex_message_handling instead?
5. This PR removes the Peer.m_orphan_work_set member and replaces it with an m_peer_work_set map in TxOrphanage. What is the peer orphan work set used for?
6. (Bonus question) This PR originally included some behavioural changes in the way that orphans are handled. Those have now been moved from this PR to a separate branch and may be proposed as a follow-up PR. What are those behaviour changes?

Meeting Log
<jnewbery> Today we're going to be looking at PR 21527: "Net_processing: lock clean up". Notes and questions are in the normal place: https://bitcoincore.reviews/21527
<jnewbery> I'll be asking questions from https://bitcoincore.reviews/21527 to guide the conversation, but feel free to jump in at any point and ask questions if anything's unclear
<jnewbery> svav: I agree with the clarity part. This PR shouldn't change performance, or maybe you're thinking of work that could be built on top of this PR?
<jnewbery> Currently, the critical paths in Bitcoin Core are mostly single-threaded, and most of the important state is guarded by a single mutex called cs_main
<jnewbery> if we can break that mutex up, it opens up the possibility of having a bit more concurrency, for example, having a validation thread independent from net_processing
<michaelfolkson> And so the (interesting) discussion with vasild was a disagreement on stepping stones towards that goal? There are trade-offs depending on which stepping stones you pick?
<jnewbery> glozow: correct! If there was a separate thread for validation, the net_processing thread could start serving other peers while validation is validating the tx
<jnewbery> svav: great question. The "net" layer is responsible for opening and maintaining connections to peers, and reading/writing messages to the wire. The main class in that layer is CConnman. The "net processing" layer sits above that, deserializes the p2p messages and then maybe passes them up to validation to be validated
<jnewbery> ok, next question. What was the cs_sendProcessing mutex responsible for before this PR? When was cs_sendProcessing introduced? (Hint use git blame and git log -S "string" to search back through previous commits).
<jnewbery> I have a few tools that help me navigate the history of the repo. `git log -S` is one of them. The fugitive plugin for vim is another great tool
<jnewbery> This PR moves one mutex from net to net_processing (CNode.cs_sendProcessing is replaced by PeerManager.m_mutex_message_handling) and one mutex from net_processing to txorphanage (g_cs_orphans is replaced by TxOrphanage.m_mutex). What are the benefits of moving these global/externally visible mutexes to being defined internally in their respective classes?
<jnewbery> Next question. What are vExtraTxnForCompact and vExtraTxnForCompactIt? Why is it ok to stop guarding them with g_cs_orphans and guard them with m_mutex_message_handling instead?
<jnewbery> svav: I think almost all of our locking uses RAII, so it's quite hard to mess up locking. The lock lasts as long as the std::unique_lock stays in scope (and that's wrapped by the LOCK() macro)
<jnewbery> We also have lock annotations now, which give us a bit more confidence, since the compiler enforces that the right locks are held when entering functions/accessing data
<jnewbery> but yes, I agree that if the mutex is not exposed at all, there's no way for a caller to get locking wrong (since it's not responsible for locking)
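As a rough illustration of the RAII point, here is a standalone sketch using std::unique_lock directly rather than Bitcoin Core's LOCK() macro and annotated Mutex type:

```cpp
#include <mutex>
#include <stdexcept>

std::mutex g_mutex;
int g_counter = 0;  // only written while g_mutex is held

void Increment(bool fail)
{
    std::unique_lock<std::mutex> lock(g_mutex);  // mutex acquired here
    ++g_counter;
    if (fail) throw std::runtime_error("oops");  // the lock is still released
}  // 'lock' goes out of scope here and unlocks g_mutex on every exit path
```

The thread-safety annotations mentioned above (e.g. GUARDED_BY, EXCLUSIVE_LOCKS_REQUIRED) layer compile-time checking on top of this pattern when building with Clang.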
<sipa> glozow: that's not how compact blocks work; the sender sends you a compact block with compact tx hashes; you then request the transactions which you miss
<sipa> glozow: by keeping conflicted/orphan transactions around for a bit, they are available for reconstruction, so you won't have to ask for them if they show up in a compact block
<sipa> jnewbery: no, high bandwidth just drops a roundtrip (you don't need to ask for the compact block after announcement; they announce through a compact block directly)
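For context on vExtraTxnForCompact, here is a hedged sketch of the idea: a small, fixed-size ring buffer of recently seen transactions that did not make it into the mempool (orphans, conflicts, rejects), kept around so a compact block referencing them can be reconstructed without an extra round trip. The class and member names below are illustrative, not the actual net_processing code; in Bitcoin Core the buffer's size is controlled by -blockreconstructionextratxn.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TxRef { int64_t wtxid; /* ... transaction data ... */ };

class ExtraTxnBuffer
{
public:
    explicit ExtraTxnBuffer(size_t max_extra) : m_buf(max_extra) {}

    void Add(const TxRef& tx)
    {
        if (m_buf.empty()) return;           // feature disabled
        m_buf[m_next] = tx;                  // overwrite the oldest entry
        m_next = (m_next + 1) % m_buf.size();
    }

private:
    std::vector<TxRef> m_buf;  // fixed capacity, oldest entries evicted first
    size_t m_next{0};          // next slot to overwrite (cf. vExtraTxnForCompactIt)
};
```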
<jnewbery> ok, so part two of that question: Why is it ok to stop guarding them with g_cs_orphans and guard them with m_mutex_message_handling instead?
<sipa> right, locking achieves two things: multithreaded r/w access (you're instantly UB if you don't protect variables read/written by multiple threads with some form of locking/synchronization)
<sipa> but it also achieves consistency; by putting multiple variables under the same lock, they will always be in sync from the perspective of other threads
<jnewbery> sipa: exactly right. The class has certain invariants that must always be true for callers. We can lock a mutex, and then temporarily violate those invariants during processing, before restoring the invariants and releasing the lock.
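A hypothetical sketch of that consistency point (not Bitcoin Core code): two members that must stay in sync are guarded by one mutex, so the brief moment where the invariant is violated is never visible to other threads.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

class WorkQueue
{
public:
    void Add(int64_t txid, int peer)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        // Invariant: every queued txid has an entry in m_owner. It is briefly
        // violated between the next two statements, but no other thread can
        // observe that, because it would first have to acquire m_mutex.
        m_queue.push_back(txid);
        m_owner[txid] = peer;
    }

private:
    std::mutex m_mutex;             // guards m_queue and m_owner together
    std::vector<int64_t> m_queue;   // txids waiting to be processed
    std::map<int64_t, int> m_owner; // which peer each queued txid came from
};
```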
<jnewbery> svav: ah, I was a little confused because you said they have mutex lock and unlock in TxOrphanage. vExtraTxnForCompact and vExtraTxnForCompactIt are not in TxOrphanage
<jnewbery> you're right that they're now protected under a different lock (m_mutex_message_handling), which prevents multiple threads accessing them concurrently
<jnewbery> ok, next question. This PR removes the Peer.m_orphan_work_set member and replaces it with a m_peer_work_set map in TxOrphanage. What is the peer orphan work set used for?
<jnewbery> so if we receive a transaction where we don't have its inputs in our UTXO set or mempool from peer A, and then peer B provides the missing parent and it gets accepted to our mempool, whose orphan work set will the orphan go into?
<jnewbery> no, we won't punish the peer that provided the parent. That'd be a good way to allow a malicious actor to get peers to disconnect other peers.
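As a rough illustration of the data structure under discussion (identifiers here are simplified stand-ins, not the exact TxOrphanage members): the work set maps a peer id to the orphan txids that should be retried when that peer's messages are next processed.

```cpp
#include <cstdint>
#include <map>
#include <set>

using NodeId = int64_t;
using Txid = int64_t;  // stand-in for the real uint256 txid type

class PeerWorkSets
{
public:
    // Schedule an orphan for reconsideration, attributed to a peer.
    void AddWork(NodeId peer, Txid orphan) { m_peer_work_set[peer].insert(orphan); }

    // Pop one orphan to retry for this peer; returns false if none is pending.
    bool GetWork(NodeId peer, Txid& out)
    {
        auto it = m_peer_work_set.find(peer);
        if (it == m_peer_work_set.end() || it->second.empty()) return false;
        out = *it->second.begin();
        it->second.erase(it->second.begin());
        return true;
    }

private:
    std::map<NodeId, std::set<Txid>> m_peer_work_set;
};
```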
<jnewbery> ok, final question. (Bonus question) This PR originally included some behavioural changes in the way that orphans are handled. Those have now been moved from this PR to a separate branch and may be proposed as a follow-up PR. What are those behaviour changes?