How to use logging to spot bugs
Let’s start with a simple but very useful method for understanding what a software program does while executing – logging.
Logging is the process of keeping a log of the events that occur in a program: messages record how the program executes and track its flow, which helps identify issues and bugs.
Most Unix-based logging systems use the standard protocol, syslog, created by Eric Allman in the 1980s as part of the Sendmail project. This standard protocol defines the boundaries between the software generating the log messages, the system storing them, and the software reporting on and analyzing these log events.
Each log message includes a facility code and a severity level. The facility code identifies the type of system that originated the message (user-level, kernel, system, network, and so on), and the severity level describes the condition being reported, indicating how urgently it needs to be dealt with. The severity levels, from most to least urgent, are emergency, alert, critical, error, warning, notice, info, and debug.
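To make facilities and severities concrete, the following is a minimal sketch using the POSIX syslog API available on most Unix-like systems; the program name and messages are placeholders:

#include <syslog.h>  // POSIX syslog API

int main() {
    // Open a connection to the system logger; LOG_USER is the facility code.
    openlog("my_app", LOG_PID, LOG_USER);

    // Each message carries a severity level; the logger records it together
    // with the facility, a timestamp, and the process ID.
    syslog(LOG_INFO, "Service started");
    syslog(LOG_WARNING, "Configuration file missing, using defaults");
    syslog(LOG_ERR, "Could not open database connection");

    closelog();
    return 0;
}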
Most logging systems or loggers provide various destinations or sinks for log messages: console, files that can later be opened and analyzed, remote syslog servers, or relays, among other destinations.
Logging is useful where debuggers are not, as we will see later, especially in distributed, multithreaded, real-time, scientific, or event-centric applications, where inspecting data or following the program flow using the debugger can become a tedious task.
Logging libraries usually also provide a thread-safe singleton logger that supports multithreaded and asynchronous writing to log files, log rotation (avoiding huge log files by creating new ones on the fly without losing log events), and timestamping, so we can track exactly when each log event happened.
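As an illustration of these features, the following is a minimal sketch of an asynchronous logger with a rotating file sink using spdlog, one of the libraries discussed later in this chapter; the file name, rotation limits, and queue size are arbitrary choices:

#include <spdlog/async.h>
#include <spdlog/sinks/rotating_file_sink.h>
#include <spdlog/spdlog.h>

int main() {
    // Background thread pool used by asynchronous loggers:
    // a queue of 8192 messages serviced by one worker thread.
    spdlog::init_thread_pool(8192, 1);

    // Rotating sink: start a new file when it reaches ~5 MB, keep 3 files.
    auto logger = spdlog::create_async<spdlog::sinks::rotating_file_sink_mt>(
        "async_rotating_logger", "logs/app.log", 5 * 1024 * 1024, 3);

    // Messages are queued and written to disk by the worker thread.
    logger->info("Application started");
    logger->debug("Dropped unless the log level is lowered to debug");

    // Flush pending messages before the program exits.
    spdlog::shutdown();
    return 0;
}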
Instead of implementing our own multithreaded logging system, a better approach is to use some well-tested and documented production-ready libraries.
How to select a third-party library
When selecting a logging library (or any other library), we need to investigate the following points before integrating it into our software to avoid future issues:
- Support: Is the library updated and upgraded regularly? Is there a community or active ecosystem around the library that can help with any questions that may arise? Is the community happy using the library?
- Quality: Is there a public bug-report system? Are bug reports dealt with promptly, providing solutions and fixing bugs in the library? Does it support recent compiler versions and the latest C++ features?
- Security: Does the library, or any of its dependent libraries, have any vulnerabilities reported?
- License: Is the library license aligned with our development and product needs? Is the cost affordable?
For complex systems, it may be worth considering centralized systems to collect and generate logging reports or dashboards, such as Sentry (https://sentry.io) or Logstash (https://www.elastic.co/logstash), that can collect, parse, and transform logs, and can be integrated with other tools, such as Graylog (https://graylog.org), Grafana (https://grafana.com), or Kibana (https://www.elastic.co/kibana).
The next section describes some interesting logging libraries.
Some relevant logging libraries
There are many logging libraries on the market, each covering specific software requirements. Depending on the program’s constraints and needs, one of the following libraries might be more suitable than the others.
In Chapter 9, we explored Boost.Asio. Boost also provides another library, Boost.Log (https://github.com/boostorg/log), a powerful and configurable logging library.
Google also provides many open source libraries, including glog, the Google logging library (https://github.com/google/glog), which is a C++14 library that provides C++-style streams APIs and helper macros.
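As a quick illustration of glog’s stream-style API and helper macros, the following is a minimal sketch (initialization is reduced to the essentials; by default, log files go to a temporary directory):

#include <glog/logging.h>

int main(int argc, char* argv[]) {
    // Initialize the library with the program name used in log file names.
    google::InitGoogleLogging(argv[0]);

    // Stream-style logging macros with different severity levels.
    LOG(INFO) << "Program started with " << argc << " argument(s)";
    LOG(WARNING) << "This is a warning message";

    // CHECK() logs a FATAL message and aborts if the condition is false.
    CHECK(argc >= 1) << "argc should never be smaller than 1";

    return 0;
}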
If the developer is familiar with Java, an excellent choice could be Apache Log4cxx (https://logging.apache.org/log4cxx), based on Log4j (https://logging.apache.org/log4j), a versatile, industrial-grade, Java logging framework.
Other logging libraries worth considering are as follows:
- spdlog (https://github.com/gabime/spdlog) is an interesting logging library that we can use with the {fmt} library. Also, the program can start logging messages and queuing them since startup, even before the log output file name is specified.
- Quill (https://github.com/odygrd/quill) is an asynchronous, low-latency C++ logging library.
- NanoLog (https://github.com/PlatformLab/NanoLog) is a nanosecond-scale logging system with printf-like APIs.
- lwlog (https://github.com/ChristianPanov/lwlog) is an amazingly fast asynchronous C++17 logging library.
- XTR (https://github.com/choll/xtr) is a fast and convenient C++ logging library for low-latency and real-time environments.
- Reckless (https://github.com/mattiasflodin/reckless) is a low-latency and high-throughput logging library.
- uberlog (https://github.com/IMQS/uberlog) is a cross-platform and multi-process C++ logging system.
- Easylogging++ (https://github.com/abumq/easyloggingpp) is a single-header C++ logging library with the ability to write our own sinks and track performance.
- tracetool (https://github.com/froglogic/tracetool) is a logging and tracing shared library.
As a guideline, depending on the system to develop, we might choose one of the following libraries:
- For low-latency or real-time systems: Quill, XTR, or Reckless
- For high performance at nanosecond scale logging: NanoLog
- For asynchronous logging: Quill or lwlog
- For cross-platform, multi-process applications: uberlog
- For simple and flexible logging: Easylogging++ or glog
- For familiarity with Java logging: Log4cxx
All libraries have advantages but also disadvantages that need to be investigated prior to selecting a library to include in your system. The following table summarizes these points:
| Library | Advantages | Disadvantages |
| --- | --- | --- |
| spdlog | Easy integration, performance-focused, customizable | Lacks some advanced features for extreme low-latency needs |
| Quill | High performance in low-latency systems | More complex setup compared to simpler, synchronous loggers |
| NanoLog | Best in class for speed, optimized for performance | Limited in features; suited for specialized use cases |
| lwlog | Lightweight, good for quick integration | Less mature and feature-rich than alternatives |
| XTR | Very efficient, user-friendly interface | More suited for specific real-time applications |
| Reckless | Highly optimized for throughput and low latency | Limited flexibility compared to more general-purpose loggers |
| uberlog | Great for multi-process and distributed systems | Not as fast as specialized low-latency loggers |
| Easylogging++ | Easy to use, customizable output sinks | Less performance-optimized than some other libraries |
| tracetool | Combines logging and tracing in one library | Not focused on low-latency or high-throughput |
| Boost.Log | Versatile, integrates well with Boost libraries | Higher complexity; can be overkill for simple logging needs |
| glog | Simple to use, good for projects requiring easy APIs | Not as feature-rich for advanced customization |
| Log4cxx | Robust, time-tested, industrial-strength logging | More complex to set up, especially for smaller projects |
Table 11.1: Advantages and disadvantages of various libraries
Please visit the logging libraries’ websites to understand better what features they provide and compare performance between them.
As spdlog is the most forked and starred C++ logging library repository on GitHub, in the next section, we will implement an example of using this library to catch a deadlock.
Logging a deadlock – an example
Before implementing this example, we need to install the spdlog and {fmt} libraries. {fmt} (https://github.com/fmtlib/fmt) is an open source formatting library providing a fast and safe alternative to C++ IOStreams.
Please check their documentation and follow the installation steps depending on your platform.
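As a quick taste of {fmt}, which spdlog also uses for message formatting, the following is a minimal sketch:

#include <fmt/core.h>
#include <string>

int main() {
    // Arguments are type-checked and formatted safely via {} placeholders.
    fmt::print("Hello, {}! {} + {} = {}\n", "world", 2, 3, 2 + 3);

    // fmt::format returns a std::string instead of printing.
    std::string msg = fmt::format("The answer is {}", 42);
    fmt::print("{}\n", msg);
    return 0;
}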
Let’s implement an example where a deadlock is happening. As we learned in Chapter 4, a deadlock can happen when two or more threads need to acquire more than one mutex to perform their work. If mutexes are not acquired in the same order, a thread can acquire a mutex and wait forever for another mutex acquired by another thread.
In this example, two threads need to acquire two mutexes, mtx1 and mtx2, to increase the values of the counter1 and counter2 counters and swap their values. As the mutexes are acquired in a different order by each thread, a deadlock can happen.
Let’s start by including the required libraries:
#include <fmt/core.h>
#include <spdlog/sinks/basic_file_sink.h>
#include <spdlog/sinks/stdout_color_sinks.h>
#include <spdlog/spdlog.h>

#include <chrono>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;
In the main() function, we define the counters and mutexes:
uint32_t counter1{};
std::mutex mtx1;
uint32_t counter2{};
std::mutex mtx2;
Before spawning the threads, let’s set up a multi-sink logger, that is, a logger that can write log messages to the console and a log file simultaneously. We will also set its log level to debug, so the logger publishes all messages with a severity level of debug or higher, and define the format of each log line: a timestamp, the thread identifier, the log level, and the log message:
auto console_sink = std::make_shared<
    spdlog::sinks::stdout_color_sink_mt>();
console_sink->set_level(spdlog::level::debug);

auto file_sink = std::make_shared<
    spdlog::sinks::basic_file_sink_mt>("logging.log", true);
file_sink->set_level(spdlog::level::info);

spdlog::logger logger("multi_sink", {console_sink, file_sink});
logger.set_pattern(
    "%Y-%m-%d %H:%M:%S.%f - Thread %t [%l] : %v");
logger.set_level(spdlog::level::debug);
We also declare an increase_and_swap lambda function that increases the values of both counters and swaps them:
auto increase_and_swap = [&]() {
    logger.info("Incrementing both counters...");
    counter1++;
    counter2++;
    logger.info("Swapping counters...");
    std::swap(counter1, counter2);
};
Two worker lambda functions, worker1 and worker2, acquire both mutexes and call increase_and_swap() before exiting. As lock guard (std::lock_guard) objects are used, the mutexes are released by their destructors when leaving the worker lambda functions:
auto worker1 = [&]() {
    logger.debug("Entering worker1");
    logger.info("Locking mtx1...");
    std::lock_guard<std::mutex> lock1(mtx1);
    logger.info("Mutex mtx1 locked");
    std::this_thread::sleep_for(100ms);
    logger.info("Locking mtx2...");
    std::lock_guard<std::mutex> lock2(mtx2);
    logger.info("Mutex mtx2 locked");
    increase_and_swap();
    logger.debug("Leaving worker1");
};

auto worker2 = [&]() {
    logger.debug("Entering worker2");
    logger.info("Locking mtx2...");
    std::lock_guard<std::mutex> lock2(mtx2);
    logger.info("Mutex mtx2 locked");
    std::this_thread::sleep_for(100ms);
    logger.info("Locking mtx1...");
    std::lock_guard<std::mutex> lock1(mtx1);
    logger.info("Mutex mtx1 locked");
    increase_and_swap();
    logger.debug("Leaving worker2");
};

logger.debug("Starting main function...");

std::thread t1(worker1);
std::thread t2(worker2);

t1.join();
t2.join();
Both worker lambda functions are similar but with a small difference: worker1 acquires mtx1 and then mtx2, while worker2 follows the opposite order, first acquiring mtx2 and then mtx1. There is a sleep period between the two mutex acquisitions to let the other thread acquire its mutex, therefore provoking a deadlock: worker1 will acquire mtx1 and worker2 will acquire mtx2.
Then, after sleeping, worker1 will try to acquire mtx2 and worker2 will try the same with mtx1, but neither of them will succeed, and both will block forever in a deadlock.
The following is the output when running this code:
2024-09-04 23:39:54.484005 - Thread 38984 [debug] : Starting main function...
2024-09-04 23:39:54.484106 - Thread 38985 [debug] : Entering worker1
2024-09-04 23:39:54.484116 - Thread 38985 [info] : Locking mtx1...
2024-09-04 23:39:54.484136 - Thread 38986 [debug] : Entering worker2
2024-09-04 23:39:54.484151 - Thread 38986 [info] : Locking mtx2...
2024-09-04 23:39:54.484160 - Thread 38986 [info] : Mutex mtx2 locked
2024-09-04 23:39:54.484146 - Thread 38985 [info] : Mutex mtx1 locked
2024-09-04 23:39:54.584250 - Thread 38986 [info] : Locking mtx1...
2024-09-04 23:39:54.584255 - Thread 38985 [info] : Locking mtx2...
The first symptom to note when inspecting the logs is that the program never finishes and is therefore probably deadlocked.
From the logger output, we can see that t1 (thread 38985) is running worker1 and t2 (thread 38986) is running worker2. As soon as t1 enters worker1, it acquires mtx1. The mtx2 mutex is acquired by t2, though, as soon as worker2 starts. Then, both threads wait for 100 ms and try to acquire the other mutex, but neither succeeds, and the program remains blocked.
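Once the logs have pinpointed the inconsistent lock ordering, one possible fix (a sketch, not part of the original example) is to acquire both mutexes atomically with std::scoped_lock, which uses a deadlock-avoidance algorithm, so the order in which the mutexes are listed no longer matters:

// Hypothetical rewrite of the two workers: both mutexes are locked in a
// single, deadlock-free operation, so worker1 and worker2 cannot deadlock.
auto worker1 = [&]() {
    logger.debug("Entering worker1");
    std::scoped_lock lock(mtx1, mtx2);  // locks both without deadlocking
    increase_and_swap();
    logger.debug("Leaving worker1");
};

auto worker2 = [&]() {
    logger.debug("Entering worker2");
    std::scoped_lock lock(mtx2, mtx1);  // different order, still safe
    increase_and_swap();
    logger.debug("Leaving worker2");
};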
Logging is indispensable in production systems, but it imposes a performance penalty if abused, and most of the time it requires human intervention to investigate an issue. As a compromise between log verbosity and performance, we can define different logging levels, log only major events during normal operation, and still retain the ability to produce extremely detailed logs when the situation requires it. A more automated way to detect errors early in the development cycle is to use testing and code sanitizers, which we will learn about in the next chapter.
Not all bugs can be detected this way, so using a debugger is often the way to track down and fix bugs in software. Let’s learn next how to debug multithreaded and asynchronous code.