BotGadget: Beyond Printf(): Better Logging Practices for Faster Debugging

All of us who do some programming know that logging is a time-tested way to output messages about the internal state of our code. With varying degrees of adjustable granularity, these messages allow us to keep track of not only the state of the application, but also its overall health. When things do end up going FUBAR, these log messages are usually the first thing we look at as a software equivalent of a Flight Data Recorder in an airplane.

Spending some time and care in not only designing the logging system, but also in deciding what should be logged and at what level, can make our future self appreciate life a lot more. We’re all familiar with the practice of ‘printf-debugging’, where logging is added as part of the usual post-crash autopsy as one tries to figure out what exactly went wrong. It’s one way of doing it, and eventually it works, but we can do much better.

People are lazy, and you’re only going to stick to good logging practices if they are at least as easy if not easier than sprinkling printf() statement throughout the code. Yet at the same time we want to be able to handle things like log levels and newlines without too much extra typing. The most successful logging libraries are built with this

Setting the Context

Not every application calls for the same kind of logging. Consider the context of your project when deciding on the logging that will fit your needs.

Many embedded platforms are highly application-specific. Whether it’s a simple 8-bit MCU inside a washing machine or a powerful SoC-based system, it follows that a microwave’s MCU firmware does not require the same level of logging as a car’s ECU or an industrial PID controller. However, they all benefit from good logging practices, whether those messages are being stored, or only visible when actively debugging the hardware.

For server applications, logging is a pretty central part of operations, as a system administrator will want to read out the logs from a central location for all services that run on the servers for which they are responsible. This implies that these logs should only contain the information that a system administrator might find useful, which is not the same information developers will be looking for.

Desktop platforms are similar, where on most platforms a central logging service will record the standard output from any running services. For regular applications (i.e. not services), any logging output tends to either be a more ephemeral thing. Either the standard output is displayed in a terminal window, or vanishes into the void without a terminal or other output associated with it. For the sake of this article we will assume that we just have to write to stdout without concerning us with any further details.

As a small aside, mobile platforms are a rather special case when it comes to logging. They are more similar to server platforms, in the sense that every application’s standard output is logged. This way one gets the logging for every single service and application that is running, without having to configure anything. The exact details of how to obtain this log on mobile platforms like Android, iOS, Tizen, or ChromeOS will differ per platform.

Use Log Levels Wisely

Why have log levels, you may ask? Because you want to differentiate between the importance of log messages, possibly even filter just for particular types of messages. Common log levels are:

Fatal.
Critical.
Error.
Warning.
Notice.
Info.
Debug.
Trace.

Normally you’d run a service at a log level of at most ‘Info’ level, but if the service is too ‘chatty’ at this log level one might want to bump it down to Notice or even just Warning. The onus is hereby on the developer to properly consider what log level the message they’re about to add to the code should exist at. Is it an informational message? A notice that should be looked at at least once? Or is it a warning that must be addressed?

Obviously, an Error level message is bad news, but we have two more levels beyond that. Here ‘Critical’ level should be used for errors that fall between ‘well, bugger’ and ‘the system just caught on fire’. It follows then that ‘Fatal’ messages involve the service reaching an unrecoverable state, a fatal runtime error, and anything else which might mean that a sysadmin is about to get paged out of bed at 3 AM on a Saturday.

Finally, the ‘Debug’ level is for information that is useful while debugging the code, like values of certain variables and verbose logging of application activity. For truly fine-grained debug output, there is the ‘Trace’ level, which is for tedious information. This is generally used for maximum verbosity logging.

Using Poco::Logger; — A Practical Server-Based Example

Over the years, I have had to add logging to both hobby and commercial projects. Among my favorites was an embedded server project that would run on countless headless systems around the world, with the only feedback being the built-in logging that could be read out with the right software and sent back to HQ. For this I used Poco::Logger from the libPoco library. It takes care of the more tedious stuff, such as herding different logging streams (channels) together and other logistics.

The right logging was essential not just for myself to diagnose problems, but also for colleagues who could use the same logging to figure out where (and why) things had begun to catch on fire by tracking requests and their data. It worked well and and I’ve ended up using the same kind of logging in some of my hobby projects, including NymphRPC, a Remote Procedure Call (RPC) library in C++. The part that I implemented is contained in nymph_logger.h and nymph_logger.cpp. It allows NymphRPC-based applications to write to the log as follows:


NYMPH_LOG_DEBUG("Added new connection with handle: " + NumberFormatter::format(handle));

The NYMPH_LOG_DEBUG () part is a preprocessor macro, which is defined in nymph_logger.h as:

#define NYMPH_LOG_DEBUG(msg) \
        if (NymphLogger::priority >= Poco::Message::PRIO_DEBUG) { \
                NymphLogger::logger(loggerName).debug(msg, __FILE__, __LINE__);\
        }

Here we can see how the line is replaced with an if-statement that checks at which log level the application is supposed to be logging at. The preprocessor statement saves us from typing all of this every single time, while also expanding the __FILE__ and __LINE__ text into the current filename and line in the (unprocessed) source code where the log statement is used. The static NymphLogger::logger() call returns a specific logger, which is an instance of Poco::Logger.

A single NymphLoggerChannel (inherited from Poco::Channel) is provided to Poco::Logger as an output channel. This channel merely passes a formatted string along with the log level to an externally provided function pointer, from where it can be printed to stdout, written to a file, stored in a database, or whatever else is appropriate.

The formatting code:

        string msgStr;
        msgStr = NumberFormatter::format(msg.getPid());
        msgStr += "." + NumberFormatter::format(msg.getTid());
        msgStr += "\t" + msg.getSource() + "\t";
        msgStr += NumberFormatter::format(msg.getSourceLine()) + "\t";
        msgStr += msg.getText() + "\t\t- ";
        msgStr += msg.getSourceFile();
        
        (*loggerFunction)(level, msgStr);

When we then run the NymphRPC test client and server, we can see log output like the following:

6 - 14524.2     NymphSession    130     Read 37 bytes.          - src/nymph_session.cpp
6 - 14524.2     NymphMessage    79      Method ID: 2.           - src/nymph_message.cpp
6 - 14524.2     NymphMessage    93      Message flags: 0x0              - src/nymph_message.cpp
6 - 14524.2     NymphUtilities  170     NYMPH_TYPE_STRING               - src/nymph_utilities.cpp
6 - 14524.2     NymphTypes      306     String value: callbackFunction.         - src/nymph_types.cpp
6 - 14524.2     NymphSession    148     Calling method callback for message ID: 0x3             - src/nymph_session.cpp
6 - 14524.2     NymphMethod     92      Calling callback for method: helloCallbackFunction              - src/nymph_method.cpp

Looking at the test server’s code we can see that the first number is the log level (6: Debug), with the rest being the previously formatted string. This contains the process ID, the thread in this process which logged the message, the name of the logger, the line in the source, the message, and finally the relative path (from the Makefile) to the source file.

Final thoughts

I have found that having to regularly eat one’s own dogfood and dealing with commentary from colleagues and customers about the logging output from one’s code has been highly motivating to polish the rough spots. The one which this article cannot cover is what is important to log in your code. That is something which you, the dev, either knows already or will find out through bitter experience while digging through a log file.

That said, a hobby project is a good place to get started with trying out different logging approaches. Once you find yourself working on a commercial project, it is definitely too late to get creative. On the positive side, once you figure out how to do logging right in different situations, it can turn weekend-long bug hunting sessions into a brief, 10-minute log-reading-code-tweaking and CI-bothering session on a Friday afternoon.

BotGadget

Tuesday, July 21

Beyond Printf(): Better Logging Practices for Faster Debugging

Setting the Context

Use Log Levels Wisely

Using Poco::Logger; — A Practical Server-Based Example

Final thoughts

No comments:

Post a Comment