#26036 introduced a new
mutex (NetEventsInterface::g_msgproc_mutex) to document the fact that our
message processing code is single threaded (msghand thread). Any
PeerManagerImpl or Peer members that are only ever accessed from that
single thread should be annotated as GUARDED_BY(g_msgproc_mutex), to avoid
bugs where those members are accessed by other threads as well (in which case
they need to be guarded by a different mutex).
CNodeState is documented to only have validation specific members and is
therefore entirely guarded by cs_main. However, not all members are
validation specific, and the ones that aren’t should be moved to Peer.
#26140 is a simple refactor
PR that moves some of the CNodeState members that are not validation
relevant to Peer and annotates them as guarded by g_msgproc_mutex.
Which threads access state relevant for message processing
(PeerManagerImpl, Peer, etc.)? (Hint: have a look at the developer
notes
for a list of all threads)
What is the difference between CNodeState and Peer? How would you decide
where to store new per-peer state? (Bonus points if you also mention CNode
in your answer)
The PR moves nUnconnectingHeaders, m_headers_sync_timeout,
fPreferHeaders and m_recently_announced_invs from CNodeState to
Peer. Multiple other members of CNodeState are also not validation
specific and should also move to Peer. Which members are that and why is
it not as trivial to move those in comparison to the ones that this PR
moves?
Why does the PR rename nUnconnectingHeaders and
MAX_UNCONNECTING_HEADERS?
<dergoegge> This week we are looking at #26140 "Move CNodeState members guarded by g_msgproc_mutex to Peer", notes are in the usual place: https://bitcoincore.reviews/26140
<svav> cs_main is a recursive mutex which is used to ensure that validation is carried out in an atomic way. It guards access to validation specific variables (such as CChainState and CNode) or mempool variables (in net_processing). The lock of cs_main is in validation.cpp.
<dergoegge> svav: I don't think that CNode is guarded by cs_main (and it totally shouldn't be). That SO answer is over a year old, so it might be out of date. Although i also can't remember CNode being guarded by cs_main.
<LarryRuane> if a thread acquires (locks) a mutex and then that same thread tries to lock it again, it will block... with a recursive mutex, the same thread can lock the mutex multiple times (simultaneously), and it's only unlocked on the last unlock
<dergoegge> stickies-v, amovfx: yes a recursive mutex can be locked multiple times but only by the same thread. A regular mutex will create a deadlock when locked twice from the same thread.
<stickies-v> dergoegge: the docs state that CNodeState's members are protected by cs_main, but I can't see that enforced in code anywhere. Do the docs mean that they _should_ only be accessed with a lock on cs_main?
<dergoegge> My prepared answer for this question: cs_main is our main validation mutex lock, used to guard any validation state from parallel access by different threads. It is also used to guard non-validation state which we should change (like this week's PR does)
<LarryRuane> my impression is that recursive mutexes, even though they seem useful. lead to lazy coding, and increase the possibility of synchronization bugs
<LarryRuane> dergoegge: very helpful, can you say a little on what validatation state means? what's an example of such state, and example of NOT such state?
<dergoegge> stickies-v: yes thats what the docs mean, we could add annotations (i.e. GUARDED_BY(cs_main)) to get the compiler to check (i think i did that at some point privately just to check that cs_main is always held when they're accessed)
<LarryRuane> maybe i can answer partially myself... our list of peers is part of our current state, but it's not *validation* state ... so changing that list should probably not require `cs_main`
<dergoegge> LarryRuane: validation state as in chainstate manager, chainstate but also non-validation state like CNodeState::nUnconnectingHeaders (which is strictly per peer p2p state)
<stickies-v> dergoegge: okay cool thanks, that's what I thought. The docs just make it sound like the locking is already taken care of, instead of highlighting that it absolutely needs to be done - which I think is a bit confusing
<dergoegge> amovfx: we use cs_main to keep the mempool in sync with the current chainstate iirc (e.g. utxo set) (eventually we could have an async mempool but thats a story for another time)
<dergoegge> OK we should move on: Which threads access state relevant for message processing (PeerManagerImpl, Peer, etc.)? (Hint: have a look at the developer notes for a list of all threads)
<dergoegge> amovfx: ThreadMessageHandler directly calls methods on the PeerManager (i.e. ProcessMessages, SendMessages) which do all the message handling
<dergoegge> PeerManager is registered as a CValidationInterface so it receives validation callbacks which are scheduled on the scheduler thread (e.g. BlockConnected, BlockDisconnected, NewPowValidBlock)
<dergoegge> I added this question to make it obvious that some functions on PeerManager are called by multiple threads and any state accessed/mutated in them needs protection against parallel access.
<dergoegge> Next question: What is the difference between CNodeState and Peer? How would you decide where to store new per-peer state? (Bonus points if you also mention CNode in your answer)
<stickies-v> oh. good point, I only checked the direct calls, not the entire stacks. that seems like a very difficult thing to verify if you're not reasonably familiar with all the functions called, or is there a trick?
<LarryRuane> dergoegge: I think currently, some of CNodeState is not relevant to validation, and so better belongs in Peer.. Peer has non-validation stuff, CNodeState should have validation stuff -- did I get that right? :)
<dergoegge> LarryRuane: yes CNodeState is meant for validation specific per-peer state guarded by cs_main and Peer is meant for all other per-peer state (that is not guarded by cs_main)
<stickies-v> intuitively, I'd say CNode mostly deals with communication between nodes, CNodeState with validating what we get from a node, and Peer to deal with policy and checking if peers are behaving correctly. Is that a fair approximation?
<stickies-v> pablomartin dergoegge yeah but call graphs are function/class based, so you'd have to look at a lot of charts to figure out what calls into net_processing?
<stickies-v> cool. No I mean p2p policy, e.g. not exceeding the number of unconnected headers etc. I don't think this touches mempool policy at all, right?
<pablomartin> dergoegge: about... where to store new peer-peer state? CNode (defined in net.h, used by m_nodes(CConnman) and covered by m_nodes_mutex) is concerned with the connection state of the peer.
<LarryRuane> but seriously, is the rename because it's more accurate to include "msgs" in the name, since it's a count of headers *messages*? (i'm not sure)
<dergoegge> LarryRuane: yes but the renaming is useful because the variable tracks the number of "headers" messages that did not connect, not the number of headers that didn't connect