libs)

Aug 19, 2020

https://github.com/bitcoin/bitcoin/pull/19509

Host: troygiorshev - PR author: troygiorshev

The PR branch HEAD was 4fc2e3906 at the time of this review club meeting.

Notes

PR 19509 is a feature PR, adding per-peer message logging to Bitcoin Core.
Nodes on the Bitcoin network communicate using a variety of messages. This PR makes it possible to log these messages, in chronological order and on a per-peer basis.
The purpose of this feature is twofold: it should assist people in debugging their node, and it should help people who are unfamiliar with the Bitcoin network to get a feel of how everything works.
This feature is locked behind a debug-only -logmessages option. When enabled, it writes every message the node receives to a file, in a folder structure described in the PR. It uses our CAutoFile here.
The log parser lives in a new folder, at contrib/message-logging/message-logging-parser.py. As discussed here it takes a list of files to parse, and optionally an output argument.
LogMessage takes the newly introduced Span, as discussed in a previous PR review club.
The Add LogMessage commit accounts for the increase in file descriptor usage. This is due to be improved in Issue 18911.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (You’re always encouraged to put your PR review on GitHub, even after the PR has been merged.)
How did you test this PR? Was setting up and running everything straightforward enough? As a reminder, the test runner has a --nocleanup option.
Did you verify that file descriptor accounting was being done correctly in the Add LogMessage commit? What is the soft file descriptor limit on your system? What about the hard limit?
The Clean PushMessage and ProcessMessages commit is a cleanup. Is this justified or is this just noise? When should cleanups like this be done?
The Add Message Logging Test commit adds a test. What is being tested? What isn’t being tested? Should this be expanded? Did you read the out-of-tree build discussion here?
Both jnewbery (comment) and practicalswift (comment) proposed extensions to this message logging. What are they? Do you agree that these are a good idea? Are there any extensions that you would like to see?
Bonus: For those of you really interested in python, here’s a great opportunity to explore the oddities of pathlib! How does Path.cwd() / Path(logpath) always return an absolute path, even when logpath itself is already an absolute path?

Meeting Log

17:00

<troygiorshev> #startmeeting

17:00

<troygiorshev> hellooo

17:00

<commodore> hi hi

17:00

<nehan> hi

17:00

<jkczyz> hi

17:00

<jnewbery> hi

17:00

<gzhao408> heyooooo

17:00

<lightlike> hi

17:00

<troygiorshev> welcome to review club everyone! today we'll be talking about a feature pr, adding per-peer message logging to core!

17:00

<michaelfolkson> hi

17:01

<troygiorshev> notes and questions in the usual spot: https://bitcoincore.reviews/19509.html

17:01

<ajonas> hi

17:01

<amiti> hi

17:01

<troygiorshev> who's had a chance to review the pr? (y/n)

17:01

<amiti> y

17:01

<felixweis> Hi

17:01

<michaelfolkson> y

17:01

<jkczyz> n

17:02

<jnewbery> y

17:02

<lightlike> y

17:02

<gzhao408> y

17:02

<troygiorshev> awesome!

17:02

<troygiorshev> would anyone like to summarize 19509?

17:03

<troygiorshev> (for anyone lurking :D )

17:04

<michaelfolkson> Anyone new want to try to summarize? I thought your descriptions were very good troygiorshev, commit messages too

17:04

<ecurrencyhodler> It's a way to log p2p messages between nodes in Bitcoin Core.

17:05

<troygiorshev> michaelfolkson: thanks!

17:05

<troygiorshev> ecurrencyhodler: yup, that's it!

17:05

<commodore> as well as code to produce json outputs for the logs

17:05

<lightlike> Adding an option to dump all p2p messages our node sends/receives into a binary file, with some tools to parse this data.

17:05

<troygiorshev> commodore lightlike: great points as well

17:06

<ariard> hi

17:06

<troygiorshev> it's not a ton more than what it says on the tin! important for feature PRs to be simple and well defined

17:06

<troygiorshev> ok, first question, how did you test this PR? Was setting up and running everything straightforward enough?

17:07

<ecurrencyhodler> I thought Bitcoin Core already kept track of p2p message logs through -printtoconsole. Is the difference that this would just be recorded to a file?

17:08

<amiti> my experience was smooth! I recompiled and ran my node locally with `logmessages` enabled, saw the new files in the data dir & fed them into the parser to interpret them. it was super cool!

17:09

<amiti> some other things I did to test involved adding logging to the p2p_message_logging test to see what messages are being recorded

17:09

<troygiorshev> ecurrencyhodler: good catch on that config option. It's not quite the same though. printtoconsole is used to print the _debug log_ to console (as opposed to file). This PR is adding much more. Not just debugging, it logs all of the p2p messages that your node sends back and forth, with their details in a mostly human readable form. this isn't currently being done anywhere

17:09

<jnewbery> ecurrencyhodler: -printtoconsole prints the debug log to the console, not the p2p messages

17:10

<ecurrencyhodler> very cool ty

17:11

<jnewbery> another thing you could do to test would be to add -logmessages to the config for a functional test, then run with --nocleanup, and check the logging files that are left in the test directory

17:12

<troygiorshev> amiti: good point bringing up p2p_message_logging.py, it's becoming a very multipurpose test!

17:12

<jnewbery> I think you could just add "-logmessages" to the list here: https://github.com/bitcoin/bitcoin/blob/c6532fa6c14d3cea5941cff9411937db9a8d5f1a/test/functional/test_framework/test_node.py#L95-L102

17:13

<lightlike> though if "-net" debugging is enabled, most p2p messages get an entry in the log - so I would think that this is mostly useful for testing the low-level "net" stuff (as opposed to the net_processing higher-level logic)?

17:14

<jnewbery> lightlike: exactly. You should usually be able to trace control flow using the debug logs. This new message logging would help if you need to know the exact messages being sent/received.

17:15

<troygiorshev> lightlike: especially for beginners, or for people debugging their node, they might want to see *exactly* what's going on

17:16

<troygiorshev> next question: Did you verify that file descriptor accounting was being done correctly in the "Add LogMessage" commit? What is the soft file descriptor limit on your system? What about the hard limit?

17:16

<troygiorshev> if you didn't find this, try `ulimit -a` and `ulimit -aH` :)

17:19

<lightlike> for me, it's 4096 "open files". Is this considerably lower on some systems?

17:19

<gzhao408> Verified to the best of my ability :P my limits say 256 and "unlimited"

17:19

<gzhao408> I have a question about why NUM_FDS_MESSAGE_LOGGING is just 1? Do we only allocate one file descriptor for all peer message logging, and we open/close every time we log so we can reuse it every time? (I could be misunderstanding how the file descriptors are used here, I’m assuming we need 1 per open file).

17:20

<troygiorshev> gzhao408: that's a great question!

17:20

<jnewbery> mine are 1024 and ~1 million

17:21

<troygiorshev> gzhao408: you have it exactly right, because we open and close every time we log, we can reuse our single file descriptor each time!

17:22

<troygiorshev> on my system I get the following:

17:22

<troygiorshev> ulimit -n: 1024

17:22

<troygiorshev> ulimit -nH: 4096

17:22

<troygiorshev> ulimit -x: unlimited

17:23

<troygiorshev> The first two are especially important for a single process. If a process tries to take more than its soft limit (-n) of file descriptors, the OS will kill it!

17:23

<jonatack> hi

17:24

<troygiorshev> Any process is allowed to increse their soft limit (-n) up to but not past their hard limit (-nH)

17:24

<troygiorshev> and -x is the total number available to all processes combined

17:25

<troygiorshev> I can guess that gzhao408 is probably on a Mac, because they're the only modern system that has such a low limit!

17:25

<troygiorshev> lightlike: so, yeah, it is :)

17:25

<gzhao408> Does using more file descriptors help with performance? e.g. I'm guessing that writes are buffered, and you flush on close (depends on implementation) so I was curious if you had considered using 1 per peer or something? Or how you came to the conclusion of just 1?

17:25

<commodore> soft: 256, hard unlimited

17:26

<gzhao408> troygiorshev: u caught me :cold_sweat

17:26

<commodore> ulimit -S -n; ulimit -H -n #macos

17:26

<lightlike> did I read the code correctly such that this means that we need one file descriptor per peer connection, so we will be able to connect one peer less after this?

17:26

<sipa> gzhao408: if you have fewer file descriptors available, bitcoin core will just refuse to create more connections

17:26

<sipa> it doesn't change performance

17:26

<gzhao408> sipa: so the fewer the better?

17:26

<sipa> what?

17:26

<troygiorshev> lightlike: see what sipa just said, yup

17:27

<troygiorshev> lightlike: IF we don't have enough available

17:27

<sipa> gzhao408: if you want many connections you need more file descriptors

17:27

<sipa> if you don't, you don't care

17:28

<troygiorshev> gzhao408: so having one (or maybe two) file descriptors per peer could more than triple our usage! It's possible this would bring it up to the hard limit on some systems and restrict the number of peers we could connect to

17:28

<gzhao408> Sorry I meant, I'm gathering that choosing to use 1 file descriptor is to minimize the file descriptor requirements

17:28

<felixweis> if you have low resources, you might want to reduce file descriptors if you run a pi webserver.

17:28

<troygiorshev> if anyone has found it fun diving into this file descriptor stuff, there's an issue open right now to clean it up! #18911

17:28

<sipa> gzhao408: right

17:29

<gzhao408> thanks sipa!

17:29

<lightlike> it seems that nMaxConnections is reduced even if we don't enable the new logging option - though we probably don't care about this?

17:29

<troygiorshev> anyone interested keep going on that, I'll throw the next question down

17:29

<troygiorshev> The "Add Message Logging Test" commit adds a test. What is being tested? What isn't being tested? Should this be expanded? Did you read the out-of-tree build discussion?

17:31

<michaelfolkson> So only the structure of the message, not deserializing or order of messages or message type

17:32

<troygiorshev> michaelfolkson: yup, did you see why it was done that way?

17:32

<amiti> I think it makes sense for it to be more of a validity check & not worry about the specific messages etc.

17:32

<jnewbery> lighlike: generally, we're not at the limit of permitted file descriptors

17:32

<michaelfolkson> By too brittle you mean not the best place to test it? Test will be too unreliable?

17:34

<troygiorshev> michaelfolkson: pretty much yup! if, say, the order of messages in our handshake changes, or the format of one changes even a little bit then the test would break! that's too much of a pain for everyone

17:34

<felixweis> is ":" not an issue on windows filesystem?

17:34

<felixweis> for the directory name ip:port

17:36

<troygiorshev> felixweis: it almost certainly is, good catch!

17:37

<troygiorshev> looks like nobody has tested it on windows yet, and while I expect most people running the debug options will be on linux or mac, this definitely needs to be checked and fixed

17:37

<ecurrencyhodler> Sorry for the tangent. But what is the value of a PR like this? Is it purely educational? Or is there a more functional purpose such as potentially identifying a bad acting node?

17:38

<sipa> or debugging

17:39

<troygiorshev> ecurrencyhodler: not a tangent at all. feature PRs have to be treated a little differently than refactor or bugfix prs.

100

17:39

<troygiorshev> ecurrencyhodler: so that sort of question is a really useful one

101

17:40

<ecurrencyhodler> okay whew haha

102

17:40

<troygiorshev> i agree with both you and sipa, educational and debugging. though I wouldn't be surprised if someone uses it for something else some day!

103

17:41

<jnewbery> I'm interested in being able to see exactly what traffic I'm receiving from peers on my node

104

17:41

<jnewbery> particularly if I'm able to filter by version/subversion/etc

105

17:41

<felixweis> yeah i've tried to decypher wireshark in the past

106

17:42

<troygiorshev> felixweis: how did it go?

107

17:42

<michaelfolkson> Do a lot of contributors use Wireshark currently? I have never used it on Core. One of the thousand things I'd like to have done but haven't yet

108

17:42

<felixweis> too much hasstle

109

17:42

<troygiorshev> michaelfolkson: do you think you would use this more?

110

17:42

<jnewbery> I could do that using wireshark outside the software, but having a toggle in the application itself that just dumps application level data for the peers I'm interested in seems much more useful

111

17:42

<ecurrencyhodler> I wonder if you could you also do some data analysis on the information that's dumped? Maybe clue people in to optimizing p2p network more?

112

17:42

<felixweis> i've played around with this now. this gives me together with message-logging-parser.py is the MVP

113

17:43

<jnewbery> felixweis: yes, wireshark is very powerful and customizable but is definitely a hassle to get what you want

114

17:43

<troygiorshev> ecurrencyhodler: that's a great idea! I await your results :D

115

17:43

<felixweis> since the messages are POST p2p encoding. e.g. application messages

116

17:43

<ecurrencyhodler> hehe will do troy.

117

17:44

<michaelfolkson> troygiorshev: Yeah certainly more accessible

118

17:45

<troygiorshev> next question

119

17:45

<troygiorshev> The "Clean PushMessage and ProcessMessages" commit is a cleanup. Is this justified or is this just noise? When should cleanups like this be done?

120

17:47

<michaelfolkson> This was just style guide type stuff right?

121

17:48

<amiti> I am +1 for cleanup commits that are separate and its made clear that its just a cleanup

122

17:48

<troygiorshev> michaelfolkson: yup!

123

17:48

<troygiorshev> amiti: glad to hear it

124

17:49

<michaelfolkson> Yeah +1 don't see why not. I suppose only question is whether it should be its own PR? But I'd lean towards no

125

17:49

<jnewbery> amiti: I agree. If you're touching most of a function and you're going to end up changing it to use the new code style, you might as well do it in the first commit and get it out of the way

126

17:49

<amiti> also I've spent way too long debugging how the code broke when I just added logging, turned out it was one of those one-line-if-statements without brackets. so small stuff like this can matter.

127

17:50

<sipa> ifs with missing braces is one of the (few) actual known examples of coding style that has caused real world bugs

128

17:50

<troygiorshev> sipa: ah cool!

129

17:50

<sipa> in openssl, of course

130

17:50

<troygiorshev> 10 minutes left, last question!

131

17:51

<troygiorshev> Both jnewbery and practicalswift proposed extensions to this message logging. What are they? Do you agree that these are a good idea? Are there any extensions that you would like to see?

132

17:51

<troygiorshev> there are definitly no wrong answers to this one!

133

17:52

<jnewbery> sipa: if it's the one I'm thinking of, I think it was apple's ssl implementation that had the if braces bug

134

17:52

<troygiorshev> https://github.com/bitcoin/bitcoin/pull/19509#pullrequestreview-447894982

135

17:52

<troygiorshev> https://github.com/bitcoin/bitcoin/pull/19509#pullrequestreview-448269477

136

17:52

<michaelfolkson> Right so jnewbery only wants to log messages for a specific peer. To save space, make it cleaner I'm assuming

137

17:52

<sipa> jnewbery: you're right!

138

17:53

<ajonas> I love the idea of the building a fuzzing corpus

139

17:54

<jnewbery> ajonas: I agree. It's a great idea

140

17:54

<jnewbery> michaelfolkson: yes, if I want to debug interactions with an individual peer, I don't want to dump potentially 100s of MBs of traffic from other peers

141

17:55

<jnewbery> especially if I want to leave this running for long periods

142

17:55

<michaelfolkson> Makes sense. Both good suggestions

143

17:55

<troygiorshev> anyone tried enabling this over an entire ibd? how large were the files?

144

17:56

<felixweis> this works during IBD? could be interesting to visualize peer round-robin

145

17:56

<jnewbery> it'd be >300GB since you'd be dumping the entire blockchain in net serialized format

146

17:56

<sipa> one use case i have is that when investigating the effect of some P2P change, i want to look at the exact transcript... usually not at the level of seeing all the messages, but you do want to see it per peer; using debug.log is really annoying as peer ids get reused across restarts, for example

147

17:57

<sipa> so having a file per peer solves that pretty elegantly

148

17:57

<ajonas> I did IBD on testnet with it on

149

17:58

<ajonas> the files were big, the python script just seemed to hang

150

17:58

<sipa> if i can ask: what is the rationale for splitting received/sent messages up into two files?

151

17:58

<felixweis> maybe a flag to limit to certain message types

152

17:58

<sipa> (maybe this was addressed in the PR, just tell me so if i'm missing context)

153

17:59

<jonatack> https://github.com/bitcoin/bitcoin/pull/19509#issuecomment-664374225

154

17:59

<troygiorshev> sipa: 1: saves a byte per message, 2: hypothetically someone might want to just see received or sent

155

17:59

<troygiorshev> one minute left!

156

18:00

<jnewbery> it means we don't need to a field to indicate whether the message was outgoing or incoming. The parser can take any number of files as input, so you can use both together if you want the two-way transcript

157

18:00

<troygiorshev> ok that's time! didn't get to the bonus question but anyone interested in python can find it themselves :)

158

18:00

<felixweis> hm maybe interweaving send/receive isn't that bad because ofthen things are request/respongs (inv/getdata, ...)

159

18:01

<sipa> troygiorshev: is there a way to interleave them again?

160

18:01

<sipa> having to look at them separately would make it very painful to use

161

18:01

<troygiorshev> sipa: yup! just giving both files to the parser interleaves them

162

18:01

<troygiorshev> it's the intended use :)

163

18:01

<sipa> ah great

164

18:01

<lightlike> can we reconstruct the time order in which messages were processed in our node, even with the splitting in two files?

165

18:01

<troygiorshev> thanks to everyone for coming and thanks for all of the great discussion!

166

18:02

<jnewbery> lightlike: yes, each message has a timestamp of sent/received time

167

18:02

<felixweis> thans troygiorshev

168

18:02

<lightlike> thanks

169

18:02

<jnewbery> so feeding them both into the parser will interleave them in order

170

18:02

<michaelfolkson> Thanks troygiorshev!

171

18:03

<jnewbery> the idea is that the parser is just one potential tool to analyze these files. If you want to do more quantitive analysis of the traffic, you could build a tool that parses/analyzes them in a different way.

172

18:03

<troygiorshev> #endmeeting