Why should I care about logging?

Developing reliable software is hard. Users expect sophisticated features that are easy to use, reliable, and consistent between applications. The underlying software infrastructure is a complex web of components distributed across multiple computers supporting multiple users, applications, and platform configurations. The infrastructure must meet user performance expectations and be resilient to network and system failures.

Modern software is constructed by leveraging existing components provided by multiple vendors and developers. A failure in any one component can result in an observed failure in a distant, seemingly unrelated part of the system. Reported problems can be difficult or sometimes impossible to reproduce. System behavior can be sensitive to subtle differences in software configuration, timing between concurrent threads, user permissions, and numerous other factors.

A great deal of time can be spent tracking down the root cause of a reported problem. An effective logging strategy can dramatically reduce this cost by providing enough information that, using Loupe, your support staff can quickly understand and resolve issues. Loupe is only as good as the data it can obtain, and the very best data comes not from generic observation of the infrastructure but from proper design and execution of a logging strategy.

Common Pitfalls in Logging

At Gibraltar Software, we've been involved in a number of logging-related projects over the past decade. Usually these projects are supposed to focus on the logging infrastructure: how to safely gather, collect, and disseminate logging data in a production environment. Once the infrastructure has been created, selected, or integrated, we invariably discover that the biggest barrier to useful logging isn't the infrastructure itself; it's getting the right log messages designed into the software.

We've seen two classic mistakes that are the main barriers to developers creating good logs:

Writing From the Perspective of the Software: Most developers unintentionally write messages from the perspective of another developer who is reading through the source code at the same time. This is natural, because it's exactly what the developer is doing when they create the message. However, the whole point of logging is to help people who aren't looking at the source code and, ideally, aren't even developers (they're support staff).
Leaving off the "and therefore...": Developers will usually say in the log message what happened (for example, an exception occurred or a file couldn't be found) but don't spell out the consequences. Does the exception mean the application needs to be restarted, or is it something it recovers from on its own?

From this, we've evolved a set of rules to help developers design their log messages.

Audiences for Logging

First, recognize that there are different logs for different purposes:

End User Log: The primary log for the IT staff and others who use or support your application. This is the traditional log, and if you do nothing else, create this log.
Developer's Log: Trace/debug log data designed just for the development team to use, not for end users to understand.
Performance Log: Performance metrics and other information that enable health monitoring of the application beyond just failures and events. A traditional web server log or Windows performance counters fit in this category.

Second, you need to write log messages differently for the end-user log than for the developer log. The end-user log should be treated like a real deliverable, with some log messages specified as part of the application's requirements. You'll want consistency over time here as well: support staff will learn what healthy and unhealthy patterns in this log look like, and that knowledge allows them to respond to problems efficiently. Because this log is running all the time and may not be stored in a secure location, sensitive information should never be written to it. For example, never write out a raw database connection string (it might contain a SQL user name and password) or the password an end user typed when logging a bad-password attempt. I know that information may be useful, but to support staff it's a red flag, and the risk isn't worth it.
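One way to keep credentials out of the end-user log is to mask them before the string ever reaches the logger. The Python sketch below is illustrative only. The `redact_connection_string` helper and the `SENSITIVE_KEYS` set are hypothetical. The sketch also assumes a simple `key=value;key=value` connection-string format with no quoted values containing `;` or `=`.

```python
import logging

# Hypothetical set of keys whose values must never reach the end-user log;
# extend it for the database drivers you actually use.
SENSITIVE_KEYS = {"password", "pwd", "user id", "uid"}


def redact_connection_string(conn_str: str) -> str:
    """Return a copy of a 'key=value;...' string with sensitive values masked.

    Assumes a simple format: values containing ';' or '=' are not handled.
    """
    parts = []
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # skip the empty piece left by a trailing ';'
        key, sep, value = segment.partition("=")
        if sep and key.strip().lower() in SENSITIVE_KEYS:
            value = "****"
        parts.append(f"{key}{sep}{value}")
    return ";".join(parts)


log = logging.getLogger("app")  # hypothetical end-user log category


def report_connection_failure(conn_str: str) -> None:
    # Only the redacted form is ever passed to the logger.
    log.error("Unable to connect to the database using %s.",
              redact_connection_string(conn_str))
```

Redacting at the call site is the simplest option. A `logging.Filter` attached to the end-user handler is an alternative that catches messages written elsewhere, at the cost of pattern-matching every record.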

The end-user and developer's logs are often the same log in the logging system, but they can be separated by severity or category. The performance log is distinct because it records entirely different data (typically numeric, analyzed statistically instead of read linearly). If your logging infrastructure doesn't support separating distinct logs from each other, see if it supports separation by category or some other means. If it supports neither, there are likely better options.
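As an illustration of separating the two logs by severity, the sketch below uses Python's standard `logging` module; the guide itself targets .NET and Loupe. One logger category feeds two destinations: the end-user handler accepts INFO and above, while the developer handler accepts everything. The logger name `orders`, the `build_logs` helper, and the format strings are assumptions.

```python
import logging


def build_logs(user_stream, dev_stream) -> logging.Logger:
    """Route one logger category to an end-user log and a developer log,
    separated by severity (hypothetical helper)."""
    logger = logging.getLogger("orders")  # hypothetical application category
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep these records away from the root logger

    # End-user log: INFO and above only, in plain language.
    user_handler = logging.StreamHandler(user_stream)
    user_handler.setLevel(logging.INFO)
    user_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # Developer log: everything, including code-level detail.
    dev_handler = logging.StreamHandler(dev_stream)
    dev_handler.setLevel(logging.DEBUG)
    dev_handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s:%(funcName)s %(message)s"))

    logger.handlers[:] = [user_handler, dev_handler]  # replace, don't stack
    return logger
```

With two `io.StringIO` objects as the streams, a `log.debug(...)` call appears only in the developer stream, while `log.info(...)` appears in both.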

Few applications develop a performance log, but you should consider one. The humble performance log built into web servers is the basis of many analytics systems that can tell you a wealth of information about your application's performance, the profile of your users, and what they're doing with the application.
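Because a performance log holds numbers that are analyzed statistically rather than read line by line, it can be as simple as timed samples grouped by operation. The minimal Python sketch below shows that shape. The `PerformanceLog` class, its method names, and the operation names are hypothetical, and a real implementation would persist samples rather than keep them in memory.

```python
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager


class PerformanceLog:
    """Hypothetical in-memory performance log: numeric samples per operation."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, operation: str, duration_ms: float) -> None:
        self._samples[operation].append(duration_ms)

    @contextmanager
    def measure(self, operation: str):
        # Time the enclosed block and record it even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(operation, (time.perf_counter() - start) * 1000.0)

    def summary(self, operation: str) -> dict:
        samples = self._samples.get(operation, [])
        if not samples:
            return {"count": 0}
        return {
            "count": len(samples),
            "mean_ms": statistics.mean(samples),
            "median_ms": statistics.median(samples),
            "max_ms": max(samples),
        }
```

Usage: wrap an operation in `with perf.measure("load_order"):`, then call `perf.summary("load_order")` to get aggregate figures instead of individual lines.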

End User Logging

This is the most important log to put your time into, because it affects both customer satisfaction with your application and your support costs. If the people who support your application can't work out from your logs what they need to do, or how to head off a problem, you'll get a support inquiry or, worse, an upset customer. What's it going to take? First, make sure that every message you write satisfies the three rules of end-user messages:

What Happened? What event did the application encounter? Avoid developer language such as "exception"; think instead about what the exception indicated. The description should be based on something an application user could observe or verify, and it should be consistent with the end-user language of the application.
What Does It Mean? So your application lost its connection to the database, but will it recover, or do you need to reboot the computer? Most developers leave out the consequences of the event, which is very frustrating if you're on the other side of the log.
What Is the Most Common Resolution? If the event indicates a problem, what is the most common resolution, if there is one? Alternatively, if it's a severe problem with no known resolution, you might recommend reporting it to the development team.
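The three rules can be encoded in a small helper that asks for each part separately, which makes it harder to leave off the "and therefore...". The Python sketch below is illustrative. The `end_user_message` function and the sample wording are hypothetical.

```python
from typing import Optional


def end_user_message(what_happened: str, what_it_means: str,
                     resolution: Optional[str] = None) -> str:
    """Compose an end-user log message from the three rules (hypothetical helper).

    Each part becomes one sentence; the resolution is optional because
    some events have no known fix.
    """
    parts = [what_happened, what_it_means]
    if resolution:
        parts.append(resolution)
    # Normalize each part to end with exactly one period.
    return " ".join(part.rstrip(".") + "." for part in parts)
```

For example, `end_user_message("Lost the connection to the order database", "New orders cannot be saved until it is restored", "Check that the database server is running")` produces three plain-language sentences, one per rule.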

It's both traditional and a best practice for the end-user log to record each message with a severity (informational, warning, or error). When picking the severity level, base it on the following:

Informational: This should be your default. No action is required; the event is a normal occurrence.
Warning: Something has happened that isn't normal and may lead to or indicate a real problem, but it may also be OK. For example, say the data directory should exist but doesn't, and the application can simply recreate it and keep going.
Error: The event is definitely a problem that should be looked into, or it may even be terminal. These are the messages you should spend the most time thinking through to make sure they satisfy the three rules above.
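The data-directory example maps naturally onto all three severities. The Python sketch below is an assumption-laden illustration: `ensure_data_directory` and the logger name `app` are hypothetical. Each message follows the three rules.

```python
import logging
from pathlib import Path

log = logging.getLogger("app")  # hypothetical end-user log category


def ensure_data_directory(path: Path) -> bool:
    """Make sure the data directory exists, logging at the severity the
    situation warrants. Returns False if the directory is unusable."""
    if path.is_dir():
        # Normal occurrence: Informational.
        log.info("Using data directory %s.", path)
        return True
    try:
        path.mkdir(parents=True)
    except OSError as exc:
        # Definitely a problem the application cannot work around: Error.
        log.error(
            "The data directory %s is missing and could not be created (%s). "
            "The application cannot save data until this is fixed. "
            "Check that the account running the application can write to this location.",
            path, exc,
        )
        return False
    # Not normal, but the application recovered on its own: Warning.
    log.warning(
        "The data directory %s was missing and has been recreated. "
        "Data stored there previously may be missing; no action is needed unless this recurs.",
        path,
    )
    return True
```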

You can optionally have one lower-level category, such as "verbose", "debug", or even "success" (which fits best with the notion of the other three being a status indicator). This may be a cleaned-up version of your developer-level log information.
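In Python's standard `logging` module, an extra lower level can be registered with `logging.addLevelName`. The sketch below adds a hypothetical "VERBOSE" level between DEBUG (10) and INFO (20). The value 15 and the `log_verbose` helper are assumptions for illustration.

```python
import logging

# Hypothetical extra level sitting between DEBUG (10) and INFO (20).
VERBOSE = 15
logging.addLevelName(VERBOSE, "VERBOSE")


def log_verbose(logger: logging.Logger, msg: str, *args) -> None:
    """Write a cleaned-up, lower-priority message to the end-user log."""
    logger.log(VERBOSE, msg, *args)
```

Setting the end-user handler to `VERBOSE` lets support staff opt into the extra detail without also receiving raw DEBUG output.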

What events to log?

There is a certain artistry and experimentation involved in deciding how much to log and when. Users will rightly complain if you log so much that they can't separate the wheat from the chaff; however, not logging enough can leave you out in the cold when something happens. Some key points to consider are:

Startup and Shutdown: Startup is one of the most delicate activities for any application, because you're trying to create all of the initial state you rely on to operate correctly. If there's an environmental problem, this is when things will go wrong. Shutdown logging is more about indicating whether the application was able to close cleanly and successfully.
Stateful Logging: If your application can enter an error state, such as losing contact with the database, don't log every occurrence. Log the first one and record that the application is now in an error state; if the situation corrects itself, log that too. This is dramatically preferable to logging each occurrence, particularly when that could produce a large number of entries. IT people hate it when an application spams a log.
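Stateful logging needs only a flag and a counter. The Python sketch below is illustrative. The `StatefulErrorLog` class and its `failure`/`success` methods are hypothetical. The class assumes the caller reports the outcome of every attempt, for example from a retry loop.

```python
import logging


class StatefulErrorLog:
    """Log the first failure, suppress repeats, and log the recovery
    (hypothetical helper; the caller reports each attempt's outcome)."""

    def __init__(self, logger: logging.Logger, subject: str):
        self._log = logger
        self._subject = subject
        self._in_error = False
        self._suppressed = 0

    def failure(self, detail: str) -> None:
        if self._in_error:
            self._suppressed += 1  # already reported; just count it
            return
        self._in_error = True
        self._suppressed = 0
        self._log.error(
            "%s is unavailable (%s). The application will keep retrying; "
            "further failures will not be logged until it recovers.",
            self._subject, detail)

    def success(self) -> None:
        if not self._in_error:
            return  # normal operation; nothing to report
        self._in_error = False
        self._log.info("%s is available again after %d further failed attempts.",
                       self._subject, self._suppressed)
```

Three consecutive failures followed by a success produce exactly two entries: one error and one recovery message.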

Developer Logging

In developer-oriented log messages, you can do whatever you please. Typically this log is much more of a code-level trace of execution. It is often "leveled": the support team can specify at runtime what logging level to use, depending on how much detail they need. Because this is an optional log, it's permissible (but still questionable) to output raw values that may contain sensitive data. You may want to make the developer's log available only when running in a debugger or when compiled in a special mode. This prevents it from leaking into a public environment where it can cause performance or security problems.
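A leveled developer log can take its level from the environment, so support staff can raise the detail without a rebuild. The Python sketch below is a minimal illustration. The environment variable name `APP_LOG_LEVEL`, the `LEVELS` table, and the `configure_developer_log` helper are all hypothetical.

```python
import logging
import os

# Hypothetical names support staff may supply, mapped to logging levels.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_developer_log(logger: logging.Logger,
                            env_var: str = "APP_LOG_LEVEL",
                            default: str = "WARNING") -> int:
    """Choose the developer log's level at runtime from an environment
    variable; unknown values fall back to the default."""
    requested = os.environ.get(env_var, default).strip().upper()
    level = LEVELS.get(requested, LEVELS[default.upper()])
    logger.setLevel(level)
    return level
```

Falling back to a quiet default rather than raising an error keeps a mistyped setting from stopping the application.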

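An easy way to separate developer logging from end-user logging in .NET is to use the built-in System.Diagnostics.Debug class. While this class offers very few options, calls to it are automatically excluded from release builds, because its methods are compiled only when the DEBUG symbol is defined (which, by default, it is not in the Release configuration).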
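Python has a loosely analogous mechanism. Code inside an `if __debug__:` block is removed by the compiler when the interpreter runs with `-O`, much as calls to .NET's Debug class are omitted from release builds. The sketch is illustrative. The `apply_discount` function and the `app.dev` logger name are hypothetical.

```python
import logging

dev_log = logging.getLogger("app.dev")  # hypothetical developer-log category


def apply_discount(total: float, rate: float) -> float:
    if __debug__:
        # This whole block is stripped when Python runs with -O,
        # so the developer trace costs nothing in optimized runs.
        dev_log.debug("apply_discount(total=%r, rate=%r)", total, rate)
    return round(total * (1.0 - rate), 2)
```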
