The PR branch HEAD was 4fc2e3906 at the time of this review club meeting.
Notes
PR 19509 is a feature PR,
adding per-peer message logging to Bitcoin Core.
Nodes on the Bitcoin network communicate using a variety of
messages.
This PR makes it possible to log these messages, in chronological order and
on a per-peer basis.
The purpose of this feature is twofold: it should assist people in debugging
their node, and it should help people who are unfamiliar with the Bitcoin
network to get a feel of how everything works.
This feature is locked behind a debug-only -logmessages option. When
enabled, it writes every message the node receives to a file, in a folder
structure described in the PR. It uses our CAutoFilehere.
The log parser lives in a new folder, at
contrib/message-logging/message-logging-parser.py. As discussed
here it
takes a list of files to parse, and optionally an output argument.
How did you test this PR? Was setting up and running everything
straightforward enough? As a reminder, the test runner has a --nocleanup
option.
Did you verify that file descriptor accounting was being done correctly in
the Add
LogMessage
commit? What is the soft file descriptor limit on your system? What about the
hard limit?
The Add Message Logging
Test
commit adds a test. What is being tested? What isn’t being tested? Should this
be expanded? Did you read the out-of-tree build discussion
here?
Both jnewbery
(comment)
and practicalswift
(comment)
proposed extensions to this message logging. What are they? Do you agree that
these are a good idea? Are there any extensions that you would like to see?
Bonus: For those of you really interested in python, here’s a great
opportunity to explore the oddities of pathlib! How does Path.cwd() /
Path(logpath) always return an absolute path, even when logpath itself is
already an absolute path?
<ecurrencyhodler> I thought Bitcoin Core already kept track of p2p message logs through -printtoconsole. Is the difference that this would just be recorded to a file?
<amiti> my experience was smooth! I recompiled and ran my node locally with `logmessages` enabled, saw the new files in the data dir & fed them into the parser to interpret them. it was super cool!
<troygiorshev> ecurrencyhodler: good catch on that config option. It's not quite the same though. printtoconsole is used to print the _debug log_ to console (as opposed to file). This PR is adding much more. Not just debugging, it logs all of the p2p messages that your node sends back and forth, with their details in a mostly human readable form. this isn't currently being done anywhere
<jnewbery> another thing you could do to test would be to add -logmessages to the config for a functional test, then run with --nocleanup, and check the logging files that are left in the test directory
<lightlike> though if "-net" debugging is enabled, most p2p messages get an entry in the log - so I would think that this is mostly useful for testing the low-level "net" stuff (as opposed to the net_processing higher-level logic)?
<jnewbery> lightlike: exactly. You should usually be able to trace control flow using the debug logs. This new message logging would help if you need to know the exact messages being sent/received.
<troygiorshev> next question: Did you verify that file descriptor accounting was being done correctly in the "Add LogMessage" commit? What is the soft file descriptor limit on your system? What about the hard limit?
<gzhao408> I have a question about why NUM_FDS_MESSAGE_LOGGING is just 1? Do we only allocate one file descriptor for all peer message logging, and we open/close every time we log so we can reuse it every time? (I could be misunderstanding how the file descriptors are used here, I’m assuming we need 1 per open file).
<troygiorshev> The first two are especially important for a single process. If a process tries to take more than its soft limit (-n) of file descriptors, the OS will kill it!
<gzhao408> Does using more file descriptors help with performance? e.g. I'm guessing that writes are buffered, and you flush on close (depends on implementation) so I was curious if you had considered using 1 per peer or something? Or how you came to the conclusion of just 1?
<lightlike> did I read the code correctly such that this means that we need one file descriptor per peer connection, so we will be able to connect one peer less after this?
<troygiorshev> gzhao408: so having one (or maybe two) file descriptors per peer could more than triple our usage! It's possible this would bring it up to the hard limit on some systems and restrict the number of peers we could connect to
<troygiorshev> The "Add Message Logging Test" commit adds a test. What is being tested? What isn't being tested? Should this be expanded? Did you read the out-of-tree build discussion?
<troygiorshev> michaelfolkson: pretty much yup! if, say, the order of messages in our handshake changes, or the format of one changes even a little bit then the test would break! that's too much of a pain for everyone
<troygiorshev> looks like nobody has tested it on windows yet, and while I expect most people running the debug options will be on linux or mac, this definitely needs to be checked and fixed
<ecurrencyhodler> Sorry for the tangent. But what is the value of a PR like this? Is it purely educational? Or is there a more functional purpose such as potentially identifying a bad acting node?
<troygiorshev> i agree with both you and sipa, educational and debugging. though I wouldn't be surprised if someone uses it for something else some day!
<michaelfolkson> Do a lot of contributors use Wireshark currently? I have never used it on Core. One of the thousand things I'd like to have done but haven't yet
<jnewbery> I could do that using wireshark outside the software, but having a toggle in the application itself that just dumps application level data for the peers I'm interested in seems much more useful
<ecurrencyhodler> I wonder if you could you also do some data analysis on the information that's dumped? Maybe clue people in to optimizing p2p network more?
<troygiorshev> The "Clean PushMessage and ProcessMessages" commit is a cleanup. Is this justified or is this just noise? When should cleanups like this be done?
<jnewbery> amiti: I agree. If you're touching most of a function and you're going to end up changing it to use the new code style, you might as well do it in the first commit and get it out of the way
<amiti> also I've spent way too long debugging how the code broke when I just added logging, turned out it was one of those one-line-if-statements without brackets. so small stuff like this can matter.
<troygiorshev> Both jnewbery and practicalswift proposed extensions to this message logging. What are they? Do you agree that these are a good idea? Are there any extensions that you would like to see?
<jnewbery> michaelfolkson: yes, if I want to debug interactions with an individual peer, I don't want to dump potentially 100s of MBs of traffic from other peers
<sipa> one use case i have is that when investigating the effect of some P2P change, i want to look at the exact transcript... usually not at the level of seeing all the messages, but you do want to see it per peer; using debug.log is really annoying as peer ids get reused across restarts, for example
<jnewbery> it means we don't need to a field to indicate whether the message was outgoing or incoming. The parser can take any number of files as input, so you can use both together if you want the two-way transcript
<jnewbery> the idea is that the parser is just one potential tool to analyze these files. If you want to do more quantitive analysis of the traffic, you could build a tool that parses/analyzes them in a different way.