Introducing Annotated Logger: A Python package to aid in adding metadata to logs
We’re open sourcing Annotated Logger, a Python package that helps make logs searchable with consistent metadata.
What it is
Annotated Logger is a Python package that allows you to decorate functions and classes, which then log when complete and can request a customized logger object, which has additional fields pre-added. GitHub’s Vulnerability Management team created this tool to make it easier to find and filter logs in Splunk.
Why we made it
We have several Python projects that have grown in complexity over the years and have used Splunk to ingest and search those logs. We have always sent our logs in via JSON, which makes it easy to add in extra fields. However, there were a number of fields, like what Git branch was deployed, that we also wanted to send, plus, there were fields, like the CVE name of the vulnerability being processed, that we wanted to add for messages in a given function. Both are possible with the base Python logger, but it’s a lot of manual work repeating the same thing over and over, or building and managing a dictionary of extra fields that are included in every log message.
The Annotated Logger started out as a simple decorator in one of our repositories, but was extracted into a package in its own right as we started to use it in all of our projects. As we’ve continued to use it, its features have grown and been updated.
How and why to use it
Now that I’ve gotten a bit of the backstory out of the way, here’s what it does, why you should use it, and how to configure it for your specific needs. At its simplest, you decorate a function with @annotate_logs()
and it will “just work.” If you’d like to dive right in and poke around, the example folder contains examples that fully exercise the features.
@annotate_logs()
def foo():
return True
>>> foo()
{"created": 1733176439.5067494, "levelname": "DEBUG", "name": "annotated_logger.8fcd85f5-d47f-4925-8d3f-935d45ceeefc", "message": "start", "action": "__main__:foo", "annotated": true}
{"created": 1733176439.506998, "levelname": "INFO", "name": "annotated_logger.8fcd85f5-d47f-4925-8d3f-935d45ceeefc", "message": "success", "action": "__main__:foo", "success": true, "run_time": "0.0", "annotated": true}
True
Here is a more complete example that makes use of a number of the features. Make sure to install the package: pip install annotated-logger
first.
import os
from annotated_logger import AnnotatedLogger
al = AnnotatedLogger(
name="annotated_logger.example",
annotations={"branch": os.environ.get("BRANCH", "unknown-branch")}
)
annotate_logs = al.annotate_logs
@annotate_logs()
def split_username(annotated_logger, username):
annotated_logger.annotate(username=username)
annotated_logger.info("This is a very important message!", extra={"important": True})
return list(username)
>>> split_username("crimsonknave")
{"created": 1733349907.7293086, "levelname": "DEBUG", "name": "annotated_logger.example.c499f318-e54b-4f54-9030-a83607fa8519", "message": "start", "action": "__main__:split_username", "branch": "unknown-branch", "annotated": true}
{"created": 1733349907.7296104, "levelname": "INFO", "name": "annotated_logger.example.c499f318-e54b-4f54-9030-a83607fa8519", "message": "This is a very important message!", "important": true, "action": "__main__:split_username", "branch": "unknown-branch", "username": "crimsonknave", "annotated": true}
{"created": 1733349907.729843, "levelname": "INFO", "name": "annotated_logger.example.c499f318-e54b-4f54-9030-a83607fa8519", "message": "success", "action": "__main__:split_username", "branch": "unknown-branch", "username": "crimsonknave", "success": true, "run_time": "0.0", "count": 12, "annotated": true}
['c', 'r', 'i', 'm', 's', 'o', 'n', 'k', 'n', 'a', 'v', 'e']
>>>
>>> split_username(1)
{"created": 1733349913.719831, "levelname": "DEBUG", "name": "annotated_logger.example.1c354f32-dc76-4a6a-8082-751106213cbd", "message": "start", "action": "__main__:split_username", "branch": "unknown-branch", "annotated": true}
{"created": 1733349913.719936, "levelname": "INFO", "name": "annotated_logger.example.1c354f32-dc76-4a6a-8082-751106213cbd", "message": "This is a very important message!", "important": true, "action": "__main__:split_username", "branch": "unknown-branch", "username": 1, "annotated": true}
{"created": 1733349913.7200255, "levelname": "ERROR", "name": "annotated_logger.example.1c354f32-dc76-4a6a-8082-751106213cbd", "message": "Uncaught Exception in logged function", "exc_info": "Traceback (most recent call last):\n File \"/home/crimsonknave/code/annotated-logger/annotated_logger/__init__.py\", line 758, in wrap_function\n result = wrapped(*new_args, **new_kwargs) # pyright: ignore[reportCallIssue]\n File \"<stdin>\", line 5, in split_username\nTypeError: 'int' object is not iterable", "action": "__main__:split_username", "branch": "unknown-branch", "username": 1, "success": false, "exception_title": "'int' object is not iterable", "annotated": true}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<makefun-gen-0>", line 2, in split_username
File "/home/crimsonknave/code/annotated-logger/annotated_logger/__init__.py", line 758, in wrap_function
result = wrapped(*new_args, **new_kwargs) # pyright: ignore[reportCallIssue]
File "<stdin>", line 5, in split_username
TypeError: 'int' object is not iterable
There are a few things going on in this example. Let’s break it down piece by piece.
- The Annotated Logger requires a small amount of setup to use; specifically, you need to instantiate an instance of the
AnnotatedLogger
class. This class contains all of the configuration for the loggers.- Here we set the name of the logger. (You will need to update the logging config if your name does not start with
annotated_logger
or there will be nothing configured to log your messages.) - We also set a
branch
annotation that will be sent with all log messages.
- Here we set the name of the logger. (You will need to update the logging config if your name does not start with
- After that, we create an alias for the decorator. You don’t have to do this, but I find it’s easier to read than
@al.annotate_logs()
. - Now, we decorate and define our method, but this time we’re going to ask the decorator to provide us with a logger object,
annotated_logger
. Thisannotated_logger
variable can be used just like a standardlogger
object but has some extra features.- This
annotated_logger
argument is added by the decorator before calling the decorated method. - The signature of the decorated method is adjusted so that it does not have an
annotated_logger
parameter (see how it’s called with just name). - There are optional parameters to the decorator that allow type hints to correctly parse the modified signature.
- This
- We make use of one of those features right away by calling the
annotate method
, which will add whatever kwargs we pass to theextra
field of all log messages that use the logger.- Any field added as an annotation will be included in each subsequent log message that uses that logger.
- You can override an annotation by annotating again with the same name
- At last, we send a log message! In this message we also pass in a field that’s only for that log message, in the same way you would when using
logger
. - In the second call, we passed an
int
to the name field andlist
threw an exception.- This exception is logged automatically and then re-raised.
- This makes it much easier to know if/when a method ended (unless the process was killed).
Let’s break down each of the fields in the log message:
Field | Source | Description |
---|---|---|
created | logging |
Standard Logging field. |
levelname | logging |
Standard Logging field. |
name | annotated_logger |
Logger name (set via class instantiation). |
message | logging |
Standard Logging field for log content. |
action | annotated_logger |
Method name the logger was created for. |
branch | AnnotatedLogger() |
Set from the configuration’s branch annotation. |
annotated | annotated_logger |
Boolean indicating if the message was sent via Annotated Logger. |
important | annotated_logger.info |
Annotation set for a specific log message. |
username | annotated_logger.annotate |
Annotation set by user. |
success | annotated_logger |
Indicates if the method completed successfully (True/False). |
run_time | annotated_logger |
Duration of the method execution. |
count | annotated_logger |
Length of the return value (if applicable). |
The success
, run_time
and count
fields are added automatically to the message (“success”) that is logged after a decorated method is completed without an exception being raised.
Under the covers
How it’s implemented
The Annotated Logger interacts with Logging
via two main classes: AnnotatedAdapter
and AnnotatedFilter
. AnnotatedAdapter
is a subclass of logging.LoggerAdapter
and is what all annotated_logger
arguments are instances of. AnnotatedFilter
is a subclass of logging.Filter
and is where the annotations are actually injected into the log messages. As a user outside of config and plugins, the only part of the code you will only interact with are AnnotatedAdapter
in methods and the decorator itself. Each instance of the AnnotatedAdapter
class has an AnnotatedFilter
instance—the AnnotatedAdapter.annotate
method passes those annotations on to the filter where they are stored. When a message is logged, that filter will calculate all the annotations it should have and then update the existing LogRecord
object with those annotations.
Because each invocation of a method gets its own AnnotatedAdapter
object it also has its own AnnotatedFilter
object. This ensures that there is no leaking of annotations from one method call to another.
Type hinting
The Annotated Logger is fully type hinted internally and fully supports type hinting of decorated methods. But a little bit of additional detail is required in the decorator invocation. The annotate_logs
method takes a number of optional arguments. For type hinting, _typing_self
, _typing_requested
, _typing_class
and provided
are relevant. The three arguments that start with _typing
have no impact on the behavior of the decorator and are only used in method signature overrides for type hinting. Setting provided
to True
tells the decorator that the annotated_logger
should not be created and will be provided by the caller (thus the signature shouldn’t be altered).
_typing_self
defaults to True
as that is how most of my code is written. provided
, _typing_class
and _typing_requested
default to False
.
class Example:
@annotate_logs(_typing_requested=True)
def foo(self, annotated_logger):
...
e = Example()
e.foo()
Plugins
There are a number of plugins that come packaged with the Annotated Logger. Plugins allow for the user to hook into two places: when an exception is caught by the decorator and when logging a message. You can create your own plugin by creating a class that defines the filter
and uncaught_exception
methods (or inherits from annotated_logger.plugins.BasePlugin
which provides noop methods for both).
The filter
method of a plugin is called when a message is being logged. Plugins are called in the order they are set in the config. They are called by the AnnotatedFilter object of the AnnotatedAdapter and work like any logging.Filter
. They take a record argument which is a logging.LogRecord
object. They can manipulate that record in any way they want and those modifications will persist. Additionally, just like any logging filter, they can stop a message from being logged by returning False
.
The uncaught_exception
method of a plugin is called when the decorator catches an exception in the decorated method. It takes two arguments, exception
and logger
. The logger
argument is the annotated_logger
for the decorated method. This allows the plugin to annotate the log message stating that there was an uncaught exception that is about to be logged once the plugins have all processed their uncaught_exception
methods.
Here is an example of a simple plugin. The plugin inherits from the BasePlugin
, which isn’t strictly needed here since it implements both filter
and uncaught_exception
, but if it didn’t, inheriting from the BasePlugin means that it would fall back to the default noop methods. The plugin has an init so that it can take and store arguments. The filter
and uncaught_exception
methods will end up with the same result: flagged=True
being set if a word matches. But they do it slightly differently, filter
is called while a given log message is being processed and so the annotation it adds is directly to that record. While uncaught_exception
is called if an exception is raised and not caught during the execution of the decorated method, so it doesn’t have a specific log record to interact with and set an annotation on the logger. The only difference in outcome would be if another plugin emitted a log message during its uncaught_exception
method after FlagWordPlugin
, in that case, the additional log message would also have flagged=True
on it.
from annotated_logger.plugins import BasePlugin
class FlagWordPlugin(BasePlugin):
"""Plugin that flags any log message/exception that contains a word in a list."""
def __init__(self, *wordlist):
"""Save the wordlist."""
self.wordlist = wordlist
def filter(self, record):
"""Add annotation if the message contains words in the wordlist."""
for word in self.wordlist:
if word in record.msg:
record.flagged = True
def uncaught_exception(self, exception, logger):
"""Add annotation if exception title contains words in the wordlist."""
for word in self.wordlist:
if word in str(exception)
logger.annotate(flagged=True)
AnnotatedLogger(plugins=[FlagWordPlugin("danger", "Will Robinson")])
Plugins are stored in a list and the order they are added can matter. The BasePlugin
is always the first plugin in the list; any that are set in configuration are added after it.
When a log message is being sent the filter
methods of each plugin will be called in the order they appear in the list. Because the filter
methods often modify the record directly, one filter can break another if, for example, one filter removed or renamed a field that another filter used. Conversely, one filter could expect another to have added or altered a field before its run and would fail if it was ahead of the other filter. Finally, just like in the logging
module, the filter
method can stop a log from being emitted by returning False. As soon as a filter does so the processing ends and any Plugins later in the list will not have their filter
methods called.
If the decorated method raises an exception that is not caught, then the plugins will again execute in order. The most common interaction is plugins attempting to set/modify the same annotation. The BasePlugin
and RequestsPlugin
both set the exception_title
annotation. Since the BasePlugin
is always first, the title it sets will be overridden. Other interactions would be one plugin setting an annotation before or after another plugin that emits a log message or sends data to a third-party. In both of those cases the order will impact if the annotation is present or not.
Plugins that come with the Annotated Logger:
GitHubActionsPlugin
—Set a level of log messages to also be emitted in actions notation (notice::
).NameAdjusterPlugin
—Add a pre/postfix to a name to avoid collisions in any log processing software (source
is a field in Splunk, but we often include it as a field and it’s just hidden).RemoverPlugin
—Remove a field. Excludepassword
/key
fields and set an object’s attributes to the log if you want or ignore fields liketaskName
that are set when running async, but not sync.NestedRemoverPlugin
—Remove a field no matter how deep in a dictionary it is.RenamerPlugin
—Rename one field to another (don’t likelevelname
and wantlevel
, this is how you do that).RequestsPlugin
—Adds a title and status code to the annotations if the exception inherits fromrequests.exceptions.HTTPError
.RuntimeAnnotationsPlugin
—Sets dynamic annotations.
dictconfig
When adding the Annotated Logger to an existing project, or one that uses other packages that log messages (flask, django, and so on), you can configure all of the Annotated Logger via dictConfig
by supplying a dictConfig
compliant dictionary as the config
argument when initializing the Annotated Logger class. If, instead, you wish to do this yourself you can pass config=False
and reference annotated_logger.DEFAULT_LOGGING_CONFIG
to obtain the config that is used when none is provided and alter/extract as needed.
There is one special case where the Annotated Logger will modify the config passed to it: if there is a filter named annotated_filter
that entry will be replaced with a reference to a filter that is created by the instance of the Annotated Logger that’s being created. This allows any annotations or other options set to be applied to messages that use that filter. You can instead create a filter that uses the AnnotatedFilter class, but it won’t have any of the config the rest of your logs have.
Notes
dictConfig
partly works when merging dictionaries. I have found that some parts of the config are not overwritten, but other parts seem to lose their references. So, I would encourage you to build up a logging config for everything and call it once only. If you pass config
, the Annotated Logger will call logging.config.dictConfig
on your config after it has the option to add/adjust the config.
The logging_config.py
example has a much more detailed breakdown and set of examples.
Pytest mock
Included with the package is a pytest mock to assist in testing for logged messages. I know that there are some strong opinions about testing log messages, and I don’t suggest doing it extensively, or frequently, but sometimes it’s the easiest way to check a loop, or the log message is tied to an alert, and it is important how it’s formatted. In these cases, you can ask for the annotated_logger_mock
fixture which will intercept, record and forward all log messages.
def test_logs(annotated_logger_mock):
with pytest.raises(KeyError):
complicated_method()
annotated_logger_mock.assert_logged(
"ERROR", # Log level
"That's not the right key", # Log message
present={"success": False, "key": "bad-key"}, # annotations and their values that are required
absent=["fake-annotations"], # annotations that are forbidden
count=1 # Number of times log messages should match
)
The assert_logged
method makes use of pychoir for flexible matching. None of the parameters are required, so feel free to use whichever makes sense. Below is a breakdown of the default and valid values for each parameter.
Parameter | Default Value | Valid Values | Description |
---|---|---|---|
level | Matches anything | String or string-based matcher | Log level to check (e.g., “ERROR”). |
message | Matches anything | String or string-based matcher | Log message to check. |
present | Empty dictionary | Dictionary with string keys and any value | Annotations required in the log. |
absent | Empty set | `ALL`, set, or list of strings | Annotations that must not be present in the log. |
count | All positive integers | Integer or integer-based matcher | Number of times the log message should match. |
The present
key is often what makes the mock truly useful. It allows you to require the things you care about and ignore the things you don’t care about. For example, nobody wants their tests to fail because the run_time
of a method went from 0.0
to 0.1
or fail because the hostname is different on different test machines. But both of those are useful things to have in the logs. This mock should replace everything you use the caplog
fixture for and more.
Other features
Class decorators and persist
Classes can be decorated with @annotate_logs
as well. These classes will have an annotated_logger
attribute added after the init (I was unable to get it to work inside the __init__
). Any decorated methods of that class will have an annotated_logger
that’s based on the class logger. Calls to annotate
that pass persist=True
will set the annotations on the class Annotated Logger and so subsequent calls of any decorated method of that instance will have those annotations. The class instance’s annotated_logger
will also have an annotation of class
specifying which class the logs are coming from.
Iterators
The Annotated Logger also supports logging iterations of an enumerable
object. annotated_logger.iterator
will log the start, each step of the iteration, and when the iteration is complete. This can be useful for pagination in an API if your results object is enumerable, logging each time a page is fetched instead of sitting for a long time with no indication if the pages are hanging or there are simply many pages.
By default the iterator
method will log the value of each iteration, but this can be disabled by setting value=False
. You can also specify the level to log the iterations at if you don’t want the default of info
.
Provided
Because each decorated method gets its own annotated_logger
calls to other methods will not have any annotations from the caller. Instead of simply passing the annotated_logger
object to the method being called, you can specify provided=True
in the decorator invocation. This does two things: first, it means that this method won’t have an annotated_logger
created and passed automatically, instead it requires that the first argument be an existing annotated_logger
, which it will use as a basis for the annotated_logger
object it creates for the function. Second, it adds the annotation of subaction
and sets the decorated function’s name as its value, the action
annotation is preserved as from the method that called and provided the annotated_logger
. Annotations are not persisted from a method decorated with provided=True
to the method that called it, unless the class of the calling method was decorated and the called action annotated with persist=True
, in which case the annotation is set on the annotated_logger
of the instance and shared with all methods as is normal for decorated classes.
The most common use of this is with private methods, especially ones created during a refactor to extract some self contained logic. But other uses are for common methods that are called from a number of different places.
Split messages
Long messages wreak havoc on log parsing tools. I’ve encountered cases where the HTML of a 500 error page was too long for Splunk to parse, causing the entire log entry to be discarded and its annotations to go unprocessed. Setting max_length
when configuring the Annotated Logger will break long messages into multiple log messages each annotated with split=True
, split_complete=False
, message_parts=#
and message_part=#
. The last part of the long message will have split_complete=True
when it is logged.
Only messages can be split like this; annotations will not trigger the splitting. However, a plugin could truncate any values with a length over a certain size.
Pre/Post hooks
You can register hooks that are executed before and after the decorated method is called. The pre_call
and post_call
parameters of the decorator take a reference to a function and will call that function right before passing in the same arguments that the function will be/was called with. This allows the hooks to add annotations and/or log anything that is desired (assuming the decorated function requested an annotated_logger
).
Examples of this would be having a set of annotations that annotate fields on a model and a pre_call
that sets them in a standard way. Or a post_call
that logs if the function left a model in an unsaved state.
Runtime annotations
Most annotations are static, but sometimes you need something that’s dynamic. These are achieved via the RuntimeAnnotationsPlugin
in the Annotated Logger config. The RuntimeAnnotationsPlugin
takes a dict of names and references to functions. These functions will be called and passed the log record when the plugin’s filter method is invoked just before the log message is emitted. Whatever is returned by the function will be set as the value of the annotation of the log message currently being logged.
A common use case is to annotate a request/correlation id, which identifies all of the log messages that were part of a given API request. For Django, one way to do this is via django-guid.
Tips, tricks and gotchas
- When using the decorator in more than one file, it’s useful to do all of the configuration in a file like
log.py
. That allows you tofrom project.log import annotate_logs
everywhere you want to use it and you know it’s all configured and everything will be using the same setup. - Namespacing your loggers helps when there are two projects that both use the Annotated Logger (a package and a service that uses the package). If you are setting anything via
dictConfig
you will want to have a single config that has everything for all Annotated Loggers. - In addition to setting a correlation id for the API request being processed, passing the correlation id of the caller and then annotating that will allow you to trace from the logs of service A to the specific logs in Service B that relate to a call made by service A.
- Plugins are very flexible. For example:
- Send every
exception
log message to a service like Sentry. - Suppress logs from another package, like Django, that you don’t want to see (assuming you’ve configured Django’s logs to use a filter for your Annotated Logger).
- Add annotations for extra information about specific types of exceptions (see the
RequestsPlugin
). - Set run time annotations on a subset of messages (instead of all messages with
RuntimeAnnotationsPlugin
)
- Send every
Questions, feedback and requests
We’d love to hear any questions, comments or requests you might have in an issue. Pull requests welcome as well!
Tags:
Written by
Related posts
Supporting the next generation of developers
Here’s your opportunity to empower the teen in your life to get a start in open source development.
How we built the GitHub Skyline CLI extension using GitHub
GitHub uses GitHub to build GitHub, and our CLI extensions are no exception. Read on to find out how we built the GitHub Skyline CLI extension using GitHub!