Bitcoin Core can log events either to stdout or to a log file on disk (or
both). There are various configuration options to control the logging,
including the following (a sample configuration follows the list):
- -debuglogfile to specify the file to log to. Setting -nodebuglogfile
  disables logging to file entirely.
- -debug to enable low-level debug logging. Can be specified as
  -debug=<comma-separated list of categories> to log only certain
  categories. Note that even with -nodebug, Bitcoin Core will still log
  important events to file by default.
- -logips, -logtimestamps, -logthreadnames, -logsourcelocations and
  -logtimemicros all specify additional information that can be added to
  each log line.
- -printtoconsole controls whether the logging is output to stdout as well
  as the log file.
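
For instance, a node operator might combine several of these options in
bitcoin.conf. The option names below are real, but the chosen values and
path are purely illustrative:

```
# Hypothetical logging section of bitcoin.conf; values are illustrative.
debuglogfile=/var/log/bitcoind/debug.log
debug=net
debug=mempool
logthreadnames=1
logtimemicros=1
printtoconsole=0
```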
We expect most users to run with default options. For logging, that means
that important, high-level logs will be written to file, but low-level
debugging logs will not.
Each log in the code can be assigned to one of several different categories.
Those categories include "net" for events in the networking and p2p code,
"validation" for events in the validation code, and so on.
We use two macros for logging (a short sketch follows the list):
- LogPrintf() logs unconditionally (unless -nodebuglogfile and
  -noprinttoconsole are set).
- LogPrint() logs if the specified category is enabled. For example,
  LogPrint(BCLog::NET, "<log message>"); will only log if the 'net'
  category is enabled.
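
A minimal sketch of how the two macros are typically used, assuming Bitcoin
Core's logging.h is available; the log messages here are invented for
illustration, not taken from the codebase:

```cpp
#include <logging.h> // Bitcoin Core's logging macros and BCLog category flags

void ExampleLogging(int height, int peer_id)
{
    // Unconditional: written to debug.log and/or stdout regardless of -debug
    // (suppressed only if both -nodebuglogfile and -noprinttoconsole are set).
    LogPrintf("New best chain tip at height %d\n", height);

    // Conditional: only written if the 'net' category is enabled,
    // e.g. with -debug=net or -debug=all.
    LogPrint(BCLog::NET, "disconnecting peer=%d\n", peer_id);
}
```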
One possible class of denial of service attacks is for an adversary to
cause a victim node to repeatedly write logs to disk. This would eventually
cause the victim’s disk to fill up, and the node to fail.
For that reason, we want the unconditional LogPrintf() macro to be used
sparingly, and only in places where an attacker can’t easily trigger a log
being written.
More generally, it would be good if there were a mechanism that could
eliminate this entire class of potential exploits by throttling logging
whenever it exceeds a certain rate. Issue
21559 documents the potential
exploit and a suggested fix.
PRs 21603 and
21706 are two alternative
approaches: one implements rate-limiting per log location, and the other
implements rate-limiting globally. Both were implemented by the same author,
who is seeking conceptual/approach review.
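
To make the per-location design concrete, here is a simplified, hypothetical
sketch. The class, field, and constant names are invented, and the real PRs
differ in details such as window length and byte budget:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-location rate limiter: each logging call site gets its
// own byte budget per time window.
class LogRateLimiter
{
public:
    struct Usage {
        uint64_t bytes_used{0};
        uint64_t m_bytes_dropped{0};    // counters like those the PRs add
        uint64_t m_messages_dropped{0};
    };

    // Returns true if the message may be logged, false if it must be dropped.
    bool Consume(const std::string& location, const std::string& msg)
    {
        Usage& usage = m_usage_per_location[location];
        if (usage.bytes_used + msg.size() > MAX_BYTES_PER_WINDOW) {
            usage.m_bytes_dropped += msg.size();
            ++usage.m_messages_dropped;
            return false;
        }
        usage.bytes_used += msg.size();
        return true;
    }

private:
    std::map<std::string, Usage> m_usage_per_location; // keyed by file:line
    static constexpr uint64_t MAX_BYTES_PER_WINDOW{1024 * 1024}; // assumed budget
};
```

A global limiter would replace the per-location map with a single shared
budget: simpler, but once any one location becomes noisy, logs from every
other location are dropped as well.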
3. Try enabling logging for all categories and starting Bitcoin Core (eg with
   bitcoind -debug=all). Is the output useful? Are there specific categories
   which generate a lot of logs?
4. The two approaches are to rate-limit logs per-location and globally. What
   are the advantages and disadvantages of those two approaches? Which do
   you think would result in more useful log output?
5. Both PRs add a new LogPrintfWithoutRateLimiting() macro. What is this
   used for? Why is it important?
6. Both PRs add new variables m_bytes_dropped and m_messages_dropped. How
   are these variables used? How would they be useful for a developer trying
   to debug an issue?
<hiii> i feel honored to chat with u, hi again amiti.. where can I ask about summerofbitcoin related questions? as in when is it starting, etc. If there is an offtopic channel please let me know
<jnewbery> I have a series of questions to guide the conversation, but feel free to jump in at any point and ask your own questions if anything is unclear. We're all here to learn, so questions are very welcome.
<svav> In general, how big a problem do people think disk filling attacks could be at present? How was it established that this is a serious attack surface?
<lightlike> since this is not a bitcoin-specific attack vector: are there any famous examples where some kind of logging DOS attack was executed and caused damage?
<jnewbery> I think logrotate is probably the best solution, but we can't expect all users to configure logrotate if they just want to start a bitcoin core node
<michaelfolkson> Do people here actively monitor and manage the size of their logs? I can imagine they get pretty big on most verbose logging even without an attack
<LarryRuane> what's interesting is, barring logging bugs (that these two PRs are addressing), the logging should be linear with the size of the blockchain (the UpdateTip messages)
<jnewbery> I believe the motivation here is to eliminate the possibility of a non-category log (ie a LogPrintf(...) rather than a LogPrint(<category>, ...)) from completely filling up the disk
<jnewbery> I think if you're running with -debug=all or -debug=spammy_component, then you're already an advanced user and you should be able to manage your disk
<amiti> yeah I saw the categories being counted & maintained (esp for the PR that locally suppresses), but also looks like it's bounded by the same rate across the board?
<LarryRuane> amiti: "so we don't unnecessary evaluate expressions" -- that makes sense, do we test that the logging expressions don't crash the system? I think possibly that the functional tests run with -debug
<michaelfolkson> dergoegge: But people using -debug are a more attractive target :) Their logs are already huge (assuming they aren't cleaning disk regularly)
<FelixWeis> Im using debug=mempool in a project to get ATMP and replacement timings. If I didn't read the changelog when upgrading I might miss out on valuable statistics.
<svav> So we use a macro instead of a function for conditional logging so arguments are not evaluated when logging for the category is not enabled ... ?
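
This is the key point: because the conditional logging is a macro that checks the category before touching its arguments, an expensive argument expression is skipped entirely when the category is disabled. A simplified, hypothetical version of the pattern (not Bitcoin Core's actual macro):

```cpp
#include <cstdio>
#include <string>

enum class LogCategory { NET, MEMPOOL };

static bool g_net_enabled{false};
static bool LogCategoryEnabled(LogCategory cat)
{
    return cat == LogCategory::NET && g_net_enabled;
}

// Because this is a macro, the format arguments are only evaluated after the
// category check passes. A plain function would evaluate them at every call.
#define LOG_CAT(category, fmt, ...)               \
    do {                                          \
        if (LogCategoryEnabled(category)) {       \
            std::printf(fmt, __VA_ARGS__);        \
        }                                         \
    } while (0)

static std::string ExpensiveSummary() { return "..."; } // imagine this walks a large data structure

void Example()
{
    // While g_net_enabled is false, ExpensiveSummary() is never called.
    LOG_CAT(LogCategory::NET, "state: %s\n", ExpensiveSummary().c_str());
}
```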
<jnewbery> 3. Try enabling logging for all categories and starting Bitcoin Core (eg with bitcoind -debug=all). Is the output useful? Are there specific categories which generate a lot of logs?
<jnewbery> good! There are lots of config options to make the logs more useful: -logips, -logtimestamps, -logthreadnames, -logsourcelocations and -logtimemicros
<jnewbery> using -logthreadnames and looking at the validationinterface callbacks being enqueued and then the callbacks being called is a really good exercise to see how validation processing happens and triggers events in different subsystems
<jnewbery> lightlike: I agree. I'd suggest running some functional tests with -nocleanup, and then running the combine logs parser with -c to colorize the different nodes. Gives you a good view of what the different nodes in the test are doing.
<jnewbery> Next question: The two approaches are to rate-limit logs per-location and globally. What are the advantages and disadvantages of those two approaches? Which do you think would result in more useful log output?
<michaelfolkson> Advantages of global - better cut off logs, better log experience in the case of cut off. Disadvantages of global - we generally want to avoid global due to security, tight coupling
<svav> Locally means only nodes affected by overstimulation of logging would have restrictions applied, so only nodes being impacted would be controlled
<jnewbery> svav: you're misunderstanding how I'm using 'local' and 'global'. 'local' means "only rate-limit the logging from this line of code". 'global' means "rate-limit logs from across the whole program"
<LarryRuane> overall, I'm surprised how complex both of these PRs are (over 200 lines of diff), given what they do .. but maybe that's just how it has to be
<LarryRuane> maybe a simpler approach, that might be good enough, is to keep track of only the most recent logged message, and if it's repeated (soon), don't log it ... and after some timeout, log "last message repeated 200 times" or whatever
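
LarryRuane's suggestion resembles syslog-style deduplication. A hypothetical sketch of that alternative (a real version would also flush on a timer):

```cpp
#include <cstdio>
#include <string>

// Hypothetical syslog-style deduplication: suppress consecutive identical
// messages and later emit a single "repeated N times" summary.
class LogDeduplicator
{
    std::string m_last_message;
    unsigned m_repeat_count{0};

public:
    void Log(const std::string& msg)
    {
        if (msg == m_last_message) {
            ++m_repeat_count; // suppress the duplicate
            return;
        }
        Flush();
        m_last_message = msg;
        std::printf("%s", msg.c_str());
    }

    // Called when a different message arrives, or after a timeout.
    void Flush()
    {
        if (m_repeat_count > 0) {
            std::printf("last message repeated %u times\n", m_repeat_count);
            m_repeat_count = 0;
        }
    }
};
```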
<jnewbery> I'll explain why I suggested global log-shedding. If we only drop logs from a single location, we end up with a log file that looks normal, but some logs are missing. The only way to know that is to search for a very specific "start dropping logs" or "stop dropping logs" log.
<jnewbery> I think partial logging is quite dangerous. I've been tripped up many times debugging issues when the logs are incomplete and I've been sent in the wrong direction. For example, what would you conclude if there was the following code:
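
As a hypothetical illustration of the point (not the exact snippet shared in the meeting): if the second log line below were silently rate-limited away, the log would show the operation starting but never finishing, suggesting a hang or crash that never happened.

```cpp
#include <logging.h> // Bitcoin Core logging (assumed)

void DoSomething(); // stand-in for the work between the two log lines

void Operation()
{
    LogPrintf("Starting operation\n");
    DoSomething();
    // If rate-limiting silently drops this line, a reader of the log may
    // wrongly conclude that DoSomething() never returned.
    LogPrintf("Finished operation\n");
}
```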
<lightlike> to exclude messages from the rate limiting that are written a lot during IBD and cannot be abused as an attack vector (because they require miner's work)
<jnewbery> 6. Both PRs add new variables m_bytes_dropped and m_messages_dropped. How are these variables used? How would they be useful for a developer trying to debug an issue?
<dergoegge> they are used to keep track of how many messages and how many bytes were dropped. we can use this to decide if logging should be resumed or print a tally at the end of a suppression period
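
Continuing the hypothetical limiter sketched in the notes above, such counters might be reported when a suppression window ends, so a reader of the log knows exactly what is missing (the method name and log text here are invented):

```cpp
// A hypothetical addition to the LogRateLimiter sketch: report the drop
// counters when a suppression window ends, then start a fresh window.
void LogRateLimiter::ReportAndResetWindow()
{
    for (auto& [location, usage] : m_usage_per_location) {
        if (usage.m_messages_dropped > 0) {
            // Tell the reader that logs at this location are incomplete.
            LogPrintf("Resuming logging for %s: dropped %d messages (%d bytes)\n",
                      location, usage.m_messages_dropped, usage.m_bytes_dropped);
        }
    }
    m_usage_per_location.clear();
}
```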
<prakash> Hi..I am new here. I have a question on global vs local. Is global dropping all LogPrintf messages if the rate limit kicks in? If so, wouldn't an attacker be able to disable logging altogether before an actual attack?
<jnewbery> prakash: that's exactly what global log-shedding would do. However, I don't think attackers are generally worried about their victim's logs.
<LarryRuane> A general question came up for me during review of these PRs ... if we add a new function, is the convention to be before or after the first (maybe only) call point?
<jnewbery> emzy: maybe you're right. I just can't think of a scenario where someone exploiting a weakness in Bitcoin Core would also need to suppress logs.
<prakash> but i was able to trigger repeated log messages about an incorrect rpc password: "ThreadRPCServer incorrect password attempt from 127.0.0.1" ...so technically it would be easy to turn off logging.
<LarryRuane> yes ... this PR adds ComputeQuotaUsageStats(), and it's called from BCLog::Logger::LogPrintStr() ... so is it preferred to add the new function before or after the function that's calling it?