Rotate the debug log file (utils/log/libs)

https://github.com/bitcoin/bitcoin/pull/22350

Host: larryruane  -  PR author: larryruane

The PR branch HEAD was ea728a306 at the time of this review club meeting.

This week, we’ll look at PR 22350 that proposes to rotate the debug.log file on disk. Four weeks ago, we reviewed PR 21603; this PR is an alternative.

This time we’re going to try something different – LarryRuane will host a video chat at https://meet.jit.si/bitcoin-core-pr-reviews-22350 (recording, notes) and all are invited to join! We’ll also use IRC as usual (we’ll see how this goes).

The purpose of the video meeting is that Larry will share his display and show how one might use the gdb debugger as an aid to reviewing this or any PR. The session will be recorded, and the link to the recording sent out with the IRC log after the meeting.

Bitcoin Core developer Fabian Jahr has provided an excellent document and video on using the system debugger and much more. Larry will more narrowly focus on using gdb effectively.

Notes

  • Please review to the Notes for the PR 21603 review for background information about logging.

  • Many operating systems solve the logging growth problem by automatically rotating their system logs; run ls /var/log on any Linux system and you’ll likely see several different kinds of log files with numbers as part of their filenames; these are “rotated” log files – older versions of the main log file. Some of these are also compressed.

$ ls /var/log
alternatives.log  boot.log       dpkg.log.2.gz      fail2ban.log.3.gz  lastlog           log2ram.log                 mynode.log              rtl.log.2                       unlock_lnd.log
apt               bootstrap.log  dpkg.log.3.gz      fail2ban.log.4.gz  lit.log           loopd.log                   mynode.log.1            syslog                          unlock_lnd.log.1
auth.log          btmp           dpkg.log.4.gz      faillog            lit.log.1         loopd.log.1                 mynode.log.2            syslog.1                        unlock_lnd.log.2
auth.log.1        btmp.1         dpkg.log.5.gz      flask              lit.log.2         loopd.log.2                 mynode_quicksync.log    sysstat                         upgrade.log
auth.log.2        daemon.log     dpkg.log.6.gz      flask.1            lnd_backup.log    loop.log                    mynode_quicksync.log.1  tor                             user.log
bitcoind.log      daemon.log.1   electrs.log        flask.2            lnd_backup.log.1  messages                    mynode_quicksync.log.2  ufw.log                         user.log.1
bitcoind.log.1    debug          electrs.log.1      fontconfig.log     lnd_backup.log.2  messages.1                  nginx                   ufw.log.1                       user.log.2
bitcoind.log.2    debug.1        electrs.log.2      gunicorn           lndconnect.log    messages.2                  private                 ufw.log.1.gz-2021060600.backup  wtmp
bitcoin.log       debug.2        fail2ban.log       kern.log           lnd.log           mynode_docker_images.log    redis                   ufw.log.2.gz                    www.log
bitcoin.log.1     dpkg.log       fail2ban.log.1     kern.log.1         lnd.log.1         mynode_docker_images.log.1  rtl.log                 ufw.log.3.gz                    www.log.1
bitcoin.log.2     dpkg.log.1     fail2ban.log.2.gz  kern.log.2         lnd.log.2         mynode_docker_images.log.2  rtl.log.1               ufw.log.4.gz                    www.log.2
  • Linux and other Unix-like systems include a tool specialized to this exact task, logrotate. It’s usually configured to run periodically (daily, for example) using the built-in cron facility.

  • The logrotate command is difficult to configure (run man logrotate), knowledge of cron is also required, and is not available on Windows or on some stripped-down Linux distributions.

  • For that reason, some programs do their own internal log rotation. The concept is, when a log file reaches a certain size, the system that “owns” the log file renames it to an alternate name (usually by including a sequence number), and resets the main log file to empty. Usually this occurs as a side-effect of writing a log message.

  • Different things can be done with the rotated log files. For example:

    • Retain only a configurable number of the most recent log files, deleting older files.
    • Compress and save the rotated log files for a configurable time period.
    • Upload the log files to a remote system that can archive them (store them cheaply).
  • Many commercial data storage systems do internal log file rotation (examples here and here).

  • While Bitcoin Core’s debug.log file grows without limit for as long as bitcoind runs, it does do one kind of trimming: When bitcoind starts up, it checks the length of debug.log; if it’s greater than 11 MB, it shrinks it down to the most recent 10 MB of logging data.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Why does bitcoind include the concept of a log file? It uses disk space; what’s the advantage? Have you needed to look at debug.log; has it been useful?

  3. What’s the connection with denial-of-service attacks?

  4. If we receive an invalid transaction, why don’t we log that this happened? Should we?

  5. PR 21603 uses a different approach, log rate limiting. (Quick summary: Parts of the code that are generating too much logging too quickly are throttled; not all their log messages are saved, although this limiting itself is logged so readers of the log are aware that it’s happened.) What are the advantages and disadvantages of these two approaches? Which do you think would result in more useful log output?

  6. Other log rotation systems append a sequence number to rotated logs (for example, on Linux systems, in /var/log, you see syslog.1, syslog.2, … This PR names the rotated log files by date, such as debug-2021-06-27T21:19:42Z.log. What are the trade-offs? Which naming style do you prefer?

  7. Should this PR also compress rotated log files?

  8. Is this PR even necessary, given that the logrotate facility exists? What are the trade-offs?

  9. On Linux systems, logrotate is located in /usr/sbin. What kinds of programs are stored there, as opposed to the more standard /usr/bin?

Video

Larry has recorded a video tutorial on using GDB to accompany this Review Club meeting, with accompanying notes.

Meeting Log

  117:00 <jnewbery> #startmeeting
  217:00 <paving> hi
  317:00 <glozow> hi
  417:00 <neha> hi
  517:00 <schmidty> hi
  617:00 <dokkodo> hi!
  717:00 <emzy> hi
  817:00 <lightlike> hi
  917:01 <jnewbery> hi folks. Welcome to PR review club!
 1017:01 <michaelfolkson> hi
 1117:01 <LarryRuane> hi
 1217:01 <bogosort> hi
 1317:01 <jnewbery> Feel free to say hi to let people know you're here (or not, lurkers also welcome!)
 1417:01 <Naiza> hi
 1517:01 <jnewbery> is anyone here for the first time?
 1617:01 <bogosort> me
 1717:01 <jnewbery> welcome bogosort!
 1817:01 <glozow> hi Naiza!!
 1917:01 <bogosort> thanks!
 2017:02 <Naiza> yes, me
 2117:02 <jnewbery> hi Naiza. Welcome :)
 2217:02 <dokkodo> I am as well
 2317:02 <Naiza> Hey! thanks a lot!
 2417:02 <jnewbery> Today's host is LarryRuane. Over to you, Larry
 2517:03 <glozow> waow! welcome Naiza bogosort dokkodo
 2617:03 <jnewbery> welcome dokkodo!
 2717:03 <LarryRuane> Thanks, John! The notes for today are here: https://bitcoincore.reviews/22350
 2817:03 <dokkodo> Thanks! I'm excited to be here :)
 2917:04 <LarryRuane> and today we're going to try something new, we're going to review a PR -- AND -- have me do a little hands-on demo of using gdb (debugger) to review the PR (well, any PR)
 3017:04 <LarryRuane> so I thought we'd start out as usual, discussion, questions, for about 30 minutes, then we can switch over to a Jitsi meeting room,
 3117:05 <LarryRuane> and there I'll share my screen, and go over some tips and tricks to using gdb ... with an eye especially to using gdb to review PRs
 3217:05 <b10c> hi!
 3317:05 <LarryRuane> There's a link in the notes for the meeting room ... Hi b10c!
 3417:06 <LarryRuane> I guess we'll start out as we usually do, who here had a chance to review the PR, what was your review approach? I like to always know also: Is there sufficient automated testing?
 3517:07 <emzy> n
 3617:07 <jnewbery> concept review y
 3717:07 <glozow> n
 3817:08 <LarryRuane> The one thing about this PR is it's on the simpler side, just so we could limit discussion to 30 minutes instead of an hour
 3917:08 <dokkodo> note y, pr n
 4017:08 <michaelfolkson> Just notes and previous PR review club session
 4117:09 <michaelfolkson> (on competing PRs)
 4217:09 <jnewbery> I don't think they need to be called competing. They could be considered independently.
 4317:10 <LarryRuane> Okay cool, so just to summarize the notes, for those who didn't have a chance to read them
 4417:10 <LarryRuane> You're probably all familiar with `debug.log` in the data directory
 4517:10 <paving> concept: yes. Code: little bit
 4617:10 <LarryRuane> Which program writes to `debug.log`?
 4717:11 <michaelfolkson> jnewbery: Oh so it is possible 2 of the 3 could be merged? I just assumed only 1 of the 3 would
 4817:11 <glozow> bitcoind?
 4917:11 <jnewbery> or bitcoin-qt :)
 5017:12 <emzy> and not bitcoin-cli
 5117:12 <jnewbery> michaelfolkson: sure. They're doing different things
 5217:12 <LarryRuane> jnewbery glozow emzy Yes, all correct ... why does it do that? Have this file been useful to you? If so, why?
 5317:13 <LarryRuane> I think we can talk about the relation of this PR to the others a bit later, once people are up to speed on this PR
 5417:13 <emzy> I use it all the time. To look if bitcoind is running right.
 5517:14 <jnewbery> LarryRuane: it's been very useful to me. I use the debug logs every day :)
 5617:14 <glozow> same, i can't remember the CLI commands so i just tail debug.log
 5717:14 <glozow> very exciting to see new tips
 5817:15 <emzy> I also like the debug.log more than requesting status via bitcoin-cli
 5917:15 <LarryRuane> Okay, so it's used to print messages that indicate what it's doing, what problems the node may be having ... stuff that's useful for debugging (hence the name), but also just for monitoring
 6017:15 <LarryRuane> What possible connection is there between this file and DDOS (distributed denial of service, or just DOS)?
 6117:15 <dokkodo> logging.cpp was referenced as the current strategy for shrinking the log
 6217:16 <glozow> well the logs take up space on our computer, and it takes time to write them
 6317:16 <paving> resource exhaustion could occur by a remote peer grieffing the node
 6417:17 <LarryRuane> paving Right, that's a great term, grieffing! (Hadn't heard that before) ... which resource?
 6517:17 <paving> disk
 6617:18 <jnewbery> disk space or (unlikely) disk i/o
 6717:18 <LarryRuane> yes! And notice that it might not be an intentional DDOS, there could be a bug that causes repeated maybe infinite logging, which would quickly fill up disk space, causing the node to crash
 6817:19 <LarryRuane> if this were the result of (say) a transaction (or a block), then every bitcoin core node in the world would crash! This would be quite horrible :)
 6917:19 <paving> jnewbery good point about io
 7017:19 <paving> and being specific about about which specific disk resource is being affected
 7117:20 <LarryRuane> So the way `debug.log` is managed today, it grows forever as long as `bitcoind` runs ... dokkodo mentioned shrinking, when does this happen?
 7217:20 <LarryRuane> (hint this is in the notes)
 7317:20 <emzy> Only at restarts.
 7417:20 <paving> lnd has similar log segmenting, their logs have/had? the potential to grow exceptionally large
 7517:21 <dokkodo> when it exceeds 10MB
 7617:21 <dokkodo> or whoops, 11
 7717:21 <dokkodo> (just double checked, haha)
 7817:21 <LarryRuane> yes, exactly
 7917:23 <michaelfolkson> paving: Interesting, it does seem like a common problem. Either have verbose logs and they grow too large or cut back and not get logging that might be useful
 8017:23 <LarryRuane> There's a PR dokkodo (i think) that limits logging in one way, this PR uses a different approach (rotation), see https://bitcoincore.reviews/21603 and question 5
 8117:24 <paving> michaelfolkson yeah certainly so, hence logrotate tools etc
 8217:24 <LarryRuane> (question 5 in the current notes)
 8317:24 <glozow> how much logging is expected? like 1MB per week? 1MB per hour?
 8417:24 <paving> glozow pertinent question!
 8517:25 <LarryRuane> paving Yes, logrotate, this PR does what that tool does internally (well, `logrotate` is a swiss army knife)
 8617:26 <jnewbery> glozow: very much depends how much logging you have enabled. The default (everything that's logPrintf() in the code) is not very verbose. If you start with --debug=all you'll get some very verbose categories like net, libevent and leveldb.
 8717:26 <LarryRuane> Without any extra logging enabled, I've seen tons of logging during syncing (especially IBD of course), but during normal operation, only some P2P messages and the UpdateTip messages, so not really very much
 8817:27 <LarryRuane> But I think the fear is that something could cause a large amount of unexpected logging
 8917:27 <lightlike> in IBD, then also the default can be quite verbose
 9017:28 <LarryRuane> Also think about a raspberry-pi with limited disk space, and the user just sets it and forgets it ... even the normal logging could be enough to fill the disk
 9117:28 <glozow> so in some sense, as a safety net, never letting the log grow beyond a certain size helps
 9217:28 <paving> glozow jnewbery LarryRuane I just checked a node, ~13 days of up time, debug is 23mb in size
 9317:28 <LarryRuane> paving good to know, thanks
 9417:30 <LarryRuane> All the commercial systems (like data storage) that I'm aware of do internal log rotation, or use `logrotate`
 9517:30 <glozow> would that be an extra dependency?
 9617:31 <LarryRuane> Yes I think that's one of the downsides to using `logrotate` plus the fact that it's complicated to configure (I did that recently for our `debug.log` file and it took me a couple of hours to figure it out, but maybe that's just me :) )
 9717:32 <LarryRuane> Okay I think everyone's got a good background on this PR, shall we move to the video chat now?
 9817:32 <dokkodo> SGTM
 9917:33 <michaelfolkson> Wow us with your gdb ninja
10017:33 <paving> deque and fs.h I see are added
10117:34 <b10c> compressing rotated logs should be helpful as the log messages are quite repetitive (I haven't checked your PR if you add it)
10217:35 <paving> yeah lnd gzip's their
10317:35 <paving> is LarryRuane broadcasting content on jitsi now? I don't see or hear anything
10417:36 <jnewbery> paving: yes, it's happening at https://meet.jit.si/bitcoin-core-pr-reviews-22350
10517:36 <jnewbery> paving: try refreshing the page if you don't see anything. jitsi can sometimes be a bit flakey
10617:37 <paving> jnewbery thanks it's probably on my browser's end
10717:43 <paving> I'll download it later, have fun all. thanks Larry for hosting
10818:01 <jnewbery> thanks Larry. I've got to run now, but that was great!
10918:03 <bogosort> thanks Larry!
11018:03 <glozow> thank you Larry!!
11118:04 <dokkodo> thanks. i enjoyed this very much!
11218:04 <svav> Thanks Larry!
11318:04 <jnewbery> #endmeeting