Aug 19, 2020
The PR branch HEAD was 4fc2e3906 at the time of this review club meeting.
PR 19509 is a feature PR,
adding per-peer message logging to Bitcoin Core.
Nodes on the Bitcoin network communicate using a variety of
This PR makes it possible to log these messages, in chronological order and
on a per-peer basis.
The purpose of this feature is twofold: it should assist people in debugging
their node, and it should help people who are unfamiliar with the Bitcoin
network to get a feel of how everything works.
This feature is locked behind a debug-only
-logmessages option. When
enabled, it writes every message the node receives to a file, in a folder
structure described in the PR. It uses our
The log parser lives in a new folder, at
contrib/message-logging/message-logging-parser.py. As discussed
takes a list of files to parse, and optionally an output argument.
LogMessage takes the newly introduced
Span, as discussed in a
PR review club.
commit accounts for the increase in file descriptor usage. This is due to be
improved in Add
LogMessage Issue 18911. Questions
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or
(You’re always encouraged to put your PR review on GitHub, even after the PR
has been merged.)
How did you test this PR? Was setting up and running everything
straightforward enough? As a reminder, the test runner has a
Did you verify that file descriptor accounting was being done correctly in
commit? What is the soft file descriptor limit on your system? What about the
hard limit? Add
commit is a cleanup. Is this justified or is this just noise? When should
cleanups like this be done? Clean PushMessage and
commit adds a test. What is being tested? What isn’t being tested? Should this
be expanded? Did you read the out-of-tree build discussion
Add Message Logging
proposed extensions to this message logging. What are they? Do you agree that
these are a good idea? Are there any extensions that you would like to see?
Bonus: For those of you really interested in python, here’s a great
opportunity to explore the oddities of pathlib! How does
Path(logpath) always return an absolute path, even when logpath itself is
already an absolute path?
1 17:00 <troygiorshev> #startmeeting
2 17:00 <troygiorshev> hellooo
7 17:00 <gzhao408> heyooooo
9 17:00 <troygiorshev> welcome to review club everyone! today we'll be talking about a feature pr, adding per-peer message logging to core!
10 17:00 <michaelfolkson> hi
14 17:01 <troygiorshev> who's had a chance to review the pr? (y/n)
17 17:01 <michaelfolkson> y
22 17:02 <troygiorshev> awesome!
23 17:02 <troygiorshev> would anyone like to summarize 19509?
24 17:03 <troygiorshev> (for anyone lurking :D )
25 17:04 <michaelfolkson> Anyone new want to try to summarize? I thought your descriptions were very good troygiorshev, commit messages too
26 17:04 <ecurrencyhodler> It's a way to log p2p messages between nodes in Bitcoin Core.
27 17:05 <troygiorshev> michaelfolkson: thanks!
28 17:05 <troygiorshev> ecurrencyhodler: yup, that's it!
29 17:05 <commodore> as well as code to produce json outputs for the logs
30 17:05 <lightlike> Adding an option to dump all p2p messages our node sends/receives into a binary file, with some tools to parse this data.
31 17:05 <troygiorshev> commodore lightlike: great points as well
33 17:06 <troygiorshev> it's not a ton more than what it says on the tin! important for feature PRs to be simple and well defined
34 17:06 <troygiorshev> ok, first question, how did you test this PR? Was setting up and running everything straightforward enough?
35 17:07 <ecurrencyhodler> I thought Bitcoin Core already kept track of p2p message logs through -printtoconsole. Is the difference that this would just be recorded to a file?
36 17:08 <amiti> my experience was smooth! I recompiled and ran my node locally with `logmessages` enabled, saw the new files in the data dir & fed them into the parser to interpret them. it was super cool!
37 17:09 <amiti> some other things I did to test involved adding logging to the p2p_message_logging test to see what messages are being recorded
38 17:09 <troygiorshev> ecurrencyhodler: good catch on that config option. It's not quite the same though. printtoconsole is used to print the _debug log_ to console (as opposed to file). This PR is adding much more. Not just debugging, it logs all of the p2p messages that your node sends back and forth, with their details in a mostly human readable form. this isn't currently being done anywhere
39 17:09 <jnewbery> ecurrencyhodler: -printtoconsole prints the debug log to the console, not the p2p messages
40 17:10 <ecurrencyhodler> very cool ty
41 17:11 <jnewbery> another thing you could do to test would be to add -logmessages to the config for a functional test, then run with --nocleanup, and check the logging files that are left in the test directory
42 17:12 <troygiorshev> amiti: good point bringing up p2p_message_logging.py, it's becoming a very multipurpose test!
44 17:13 <lightlike> though if "-net" debugging is enabled, most p2p messages get an entry in the log - so I would think that this is mostly useful for testing the low-level "net" stuff (as opposed to the net_processing higher-level logic)?
45 17:14 <jnewbery> lightlike: exactly. You should usually be able to trace control flow using the debug logs. This new message logging would help if you need to know the exact messages being sent/received.
46 17:15 <troygiorshev> lightlike: especially for beginners, or for people debugging their node, they might want to see *exactly* what's going on
47 17:16 <troygiorshev> next question: Did you verify that file descriptor accounting was being done correctly in the "Add LogMessage" commit? What is the soft file descriptor limit on your system? What about the hard limit?
48 17:16 <troygiorshev> if you didn't find this, try `ulimit -a` and `ulimit -aH` :)
49 17:19 <lightlike> for me, it's 4096 "open files". Is this considerably lower on some systems?
50 17:19 <gzhao408> Verified to the best of my ability :P my limits say 256 and "unlimited"
51 17:19 <gzhao408> I have a question about why NUM_FDS_MESSAGE_LOGGING is just 1? Do we only allocate one file descriptor for all peer message logging, and we open/close every time we log so we can reuse it every time? (I could be misunderstanding how the file descriptors are used here, I’m assuming we need 1 per open file).
52 17:20 <troygiorshev> gzhao408: that's a great question!
53 17:20 <jnewbery> mine are 1024 and ~1 million
54 17:21 <troygiorshev> gzhao408: you have it exactly right, because we open and close every time we log, we can reuse our single file descriptor each time!
55 17:22 <troygiorshev> on my system I get the following:
56 17:22 <troygiorshev> ulimit -n: 1024
57 17:22 <troygiorshev> ulimit -nH: 4096
58 17:22 <troygiorshev> ulimit -x: unlimited
59 17:23 <troygiorshev> The first two are especially important for a single process. If a process tries to take more than its soft limit (-n) of file descriptors, the OS will kill it!
61 17:24 <troygiorshev> Any process is allowed to increse their soft limit (-n) up to but not past their hard limit (-nH)
62 17:24 <troygiorshev> and -x is the total number available to all processes combined
63 17:25 <troygiorshev> I can guess that gzhao408 is probably on a Mac, because they're the only modern system that has such a low limit!
64 17:25 <troygiorshev> lightlike: so, yeah, it is :)
65 17:25 <gzhao408> Does using more file descriptors help with performance? e.g. I'm guessing that writes are buffered, and you flush on close (depends on implementation) so I was curious if you had considered using 1 per peer or something? Or how you came to the conclusion of just 1?
66 17:25 <commodore> soft: 256, hard unlimited
67 17:26 <gzhao408> troygiorshev: u caught me :cold_sweat
68 17:26 <commodore> ulimit -S -n; ulimit -H -n #macos
69 17:26 <lightlike> did I read the code correctly such that this means that we need one file descriptor per peer connection, so we will be able to connect one peer less after this?
70 17:26 <sipa> gzhao408: if you have fewer file descriptors available, bitcoin core will just refuse to create more connections
71 17:26 <sipa> it doesn't change performance
72 17:26 <gzhao408> sipa: so the fewer the better?
74 17:26 <troygiorshev> lightlike: see what sipa just said, yup
75 17:27 <troygiorshev> lightlike: IF we don't have enough available
76 17:27 <sipa> gzhao408: if you want many connections you need more file descriptors
77 17:27 <sipa> if you don't, you don't care
78 17:28 <troygiorshev> gzhao408: so having one (or maybe two) file descriptors per peer could more than triple our usage! It's possible this would bring it up to the hard limit on some systems and restrict the number of peers we could connect to
79 17:28 <gzhao408> Sorry I meant, I'm gathering that choosing to use 1 file descriptor is to minimize the file descriptor requirements
80 17:28 <felixweis> if you have low resources, you might want to reduce file descriptors if you run a pi webserver.
81 17:28 <troygiorshev> if anyone has found it fun diving into this file descriptor stuff, there's an issue open right now to clean it up! #18911
82 17:28 <sipa> gzhao408: right
83 17:29 <gzhao408> thanks sipa!
84 17:29 <lightlike> it seems that nMaxConnections is reduced even if we don't enable the new logging option - though we probably don't care about this?
85 17:29 <troygiorshev> anyone interested keep going on that, I'll throw the next question down
86 17:29 <troygiorshev> The "Add Message Logging Test" commit adds a test. What is being tested? What isn't being tested? Should this be expanded? Did you read the out-of-tree build discussion?
87 17:31 <michaelfolkson> So only the structure of the message, not deserializing or order of messages or message type
88 17:32 <troygiorshev> michaelfolkson: yup, did you see why it was done that way?
89 17:32 <amiti> I think it makes sense for it to be more of a validity check & not worry about the specific messages etc.
90 17:32 <jnewbery> lighlike: generally, we're not at the limit of permitted file descriptors
91 17:32 <michaelfolkson> By too brittle you mean not the best place to test it? Test will be too unreliable?
92 17:34 <troygiorshev> michaelfolkson: pretty much yup! if, say, the order of messages in our handshake changes, or the format of one changes even a little bit then the test would break! that's too much of a pain for everyone
93 17:34 <felixweis> is ":" not an issue on windows filesystem?
94 17:34 <felixweis> for the directory name ip:port
95 17:36 <troygiorshev> felixweis: it almost certainly is, good catch!
96 17:37 <troygiorshev> looks like nobody has tested it on windows yet, and while I expect most people running the debug options will be on linux or mac, this definitely needs to be checked and fixed
97 17:37 <ecurrencyhodler> Sorry for the tangent. But what is the value of a PR like this? Is it purely educational? Or is there a more functional purpose such as potentially identifying a bad acting node?
98 17:38 <sipa> or debugging
99 17:39 <troygiorshev> ecurrencyhodler: not a tangent at all. feature PRs have to be treated a little differently than refactor or bugfix prs.
100 17:39 <troygiorshev> ecurrencyhodler: so that sort of question is a really useful one
101 17:40 <ecurrencyhodler> okay whew haha
102 17:40 <troygiorshev> i agree with both you and sipa, educational and debugging. though I wouldn't be surprised if someone uses it for something else some day!
103 17:41 <jnewbery> I'm interested in being able to see exactly what traffic I'm receiving from peers on my node
104 17:41 <jnewbery> particularly if I'm able to filter by version/subversion/etc
105 17:41 <felixweis> yeah i've tried to decypher wireshark in the past
106 17:42 <troygiorshev> felixweis: how did it go?
107 17:42 <michaelfolkson> Do a lot of contributors use Wireshark currently? I have never used it on Core. One of the thousand things I'd like to have done but haven't yet
108 17:42 <felixweis> too much hasstle
109 17:42 <troygiorshev> michaelfolkson: do you think you would use this more?
110 17:42 <jnewbery> I could do that using wireshark outside the software, but having a toggle in the application itself that just dumps application level data for the peers I'm interested in seems much more useful
111 17:42 <ecurrencyhodler> I wonder if you could you also do some data analysis on the information that's dumped? Maybe clue people in to optimizing p2p network more?
112 17:42 <felixweis> i've played around with this now. this gives me together with message-logging-parser.py is the MVP
113 17:43 <jnewbery> felixweis: yes, wireshark is very powerful and customizable but is definitely a hassle to get what you want
114 17:43 <troygiorshev> ecurrencyhodler: that's a great idea! I await your results :D
115 17:43 <felixweis> since the messages are POST p2p encoding. e.g. application messages
116 17:43 <ecurrencyhodler> hehe will do troy.
117 17:44 <michaelfolkson> troygiorshev: Yeah certainly more accessible
118 17:45 <troygiorshev> next question
119 17:45 <troygiorshev> The "Clean PushMessage and ProcessMessages" commit is a cleanup. Is this justified or is this just noise? When should cleanups like this be done?
120 17:47 <michaelfolkson> This was just style guide type stuff right?
121 17:48 <amiti> I am +1 for cleanup commits that are separate and its made clear that its just a cleanup
122 17:48 <troygiorshev> michaelfolkson: yup!
123 17:48 <troygiorshev> amiti: glad to hear it
124 17:49 <michaelfolkson> Yeah +1 don't see why not. I suppose only question is whether it should be its own PR? But I'd lean towards no
125 17:49 <jnewbery> amiti: I agree. If you're touching most of a function and you're going to end up changing it to use the new code style, you might as well do it in the first commit and get it out of the way
126 17:49 <amiti> also I've spent way too long debugging how the code broke when I just added logging, turned out it was one of those one-line-if-statements without brackets. so small stuff like this can matter.
127 17:50 <sipa> ifs with missing braces is one of the (few) actual known examples of coding style that has caused real world bugs
128 17:50 <troygiorshev> sipa: ah cool!
129 17:50 <sipa> in openssl, of course
130 17:50 <troygiorshev> 10 minutes left, last question!
131 17:51 <troygiorshev> Both jnewbery and practicalswift proposed extensions to this message logging. What are they? Do you agree that these are a good idea? Are there any extensions that you would like to see?
132 17:51 <troygiorshev> there are definitly no wrong answers to this one!
133 17:52 <jnewbery> sipa: if it's the one I'm thinking of, I think it was apple's ssl implementation that had the if braces bug
136 17:52 <michaelfolkson> Right so jnewbery only wants to log messages for a specific peer. To save space, make it cleaner I'm assuming
137 17:52 <sipa> jnewbery: you're right!
138 17:53 <ajonas> I love the idea of the building a fuzzing corpus
139 17:54 <jnewbery> ajonas: I agree. It's a great idea
140 17:54 <jnewbery> michaelfolkson: yes, if I want to debug interactions with an individual peer, I don't want to dump potentially 100s of MBs of traffic from other peers
141 17:55 <jnewbery> especially if I want to leave this running for long periods
142 17:55 <michaelfolkson> Makes sense. Both good suggestions
143 17:55 <troygiorshev> anyone tried enabling this over an entire ibd? how large were the files?
144 17:56 <felixweis> this works during IBD? could be interesting to visualize peer round-robin
145 17:56 <jnewbery> it'd be >300GB since you'd be dumping the entire blockchain in net serialized format
146 17:56 <sipa> one use case i have is that when investigating the effect of some P2P change, i want to look at the exact transcript... usually not at the level of seeing all the messages, but you do want to see it per peer; using debug.log is really annoying as peer ids get reused across restarts, for example
147 17:57 <sipa> so having a file per peer solves that pretty elegantly
148 17:57 <ajonas> I did IBD on testnet with it on
149 17:58 <ajonas> the files were big, the python script just seemed to hang
150 17:58 <sipa> if i can ask: what is the rationale for splitting received/sent messages up into two files?
151 17:58 <felixweis> maybe a flag to limit to certain message types
152 17:58 <sipa> (maybe this was addressed in the PR, just tell me so if i'm missing context)
154 17:59 <troygiorshev> sipa: 1: saves a byte per message, 2: hypothetically someone might want to just see received or sent
155 17:59 <troygiorshev> one minute left!
156 18:00 <jnewbery> it means we don't need to a field to indicate whether the message was outgoing or incoming. The parser can take any number of files as input, so you can use both together if you want the two-way transcript
157 18:00 <troygiorshev> ok that's time! didn't get to the bonus question but anyone interested in python can find it themselves :)
158 18:00 <felixweis> hm maybe interweaving send/receive isn't that bad because ofthen things are request/respongs (inv/getdata, ...)
159 18:01 <sipa> troygiorshev: is there a way to interleave them again?
160 18:01 <sipa> having to look at them separately would make it very painful to use
161 18:01 <troygiorshev> sipa: yup! just giving both files to the parser interleaves them
162 18:01 <troygiorshev> it's the intended use :)
164 18:01 <lightlike> can we reconstruct the time order in which messages were processed in our node, even with the splitting in two files?
165 18:01 <troygiorshev> thanks to everyone for coming and thanks for all of the great discussion!
166 18:02 <jnewbery> lightlike: yes, each message has a timestamp of sent/received time
167 18:02 <felixweis> thans troygiorshev
168 18:02 <lightlike> thanks
169 18:02 <jnewbery> so feeding them both into the parser will interleave them in order
170 18:02 <michaelfolkson> Thanks troygiorshev!
171 18:03 <jnewbery> the idea is that the parser is just one potential tool to analyze these files. If you want to do more quantitive analysis of the traffic, you could build a tool that parses/analyzes them in a different way.
172 18:03 <troygiorshev> #endmeeting