The Three Pillars of Observability: Logs, Metrics, Traces

When something goes wrong in production, you face a fundamental question: where do I look? A user reports slowness. An API endpoint is throwing 500s. A job that usually runs in 2 seconds now takes 30. Without visibility into your system, you're blind. The three pillars of observability — logs, metrics, and traces — each answer a different question about what's happening, and together they form the foundation of production reliability.

Most teams have one or two of them and wonder why troubleshooting still feels like guessing. This guide explains what each pillar does, how they complement each other, and how error tracking ties them together into a coherent story about your system's health.

Logs: the detailed chronicle

A log is a record of a discrete event. "User clicked button." "Database query took 250ms." "Payment API returned 503." Logs are the most granular data you collect: exhaustive, timestamped, and written as events happen.

Logs are excellent for answering "what happened, and in what order?" If a user's checkout failed, logs show every step of the transaction: the database insert, the API call, the webhook delivery, the error on line 42 of payment.ts. Logs are how you reconstruct the exact sequence leading to a failure.

The problem with logs alone is volume. A healthy system generates thousands of log lines per second. Searching for the one that matters is like finding a needle in a haystack, especially if you don't know which haystack to search. Logs are the detailed primary source, but they're not the alert.
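One common way to make the haystack searchable is to write each event as a structured record instead of free text. The sketch below is illustrative only: the `LogRecord` shape, the `formatLog` and `filterByRequest` helpers, and field names such as `requestId` are assumptions made for this example, not part of any particular logging library.

```typescript
// Hypothetical shape for one structured log event (field names are assumptions).
interface LogRecord {
  timestamp: string; // ISO 8601
  level: "info" | "warn" | "error";
  message: string;
  context: { [key: string]: string | number };
}

// Serialize one event as a single JSON line so it can be filtered by field later.
function formatLog(
  level: LogRecord["level"],
  message: string,
  context: { [key: string]: string | number },
  now: Date = new Date()
): string {
  const record: LogRecord = {
    timestamp: now.toISOString(),
    level: level,
    message: message,
    context: context,
  };
  return JSON.stringify(record);
}

// Keep only the records that belong to one request.
function filterByRequest(lines: string[], requestId: string): LogRecord[] {
  return lines
    .map((line) => JSON.parse(line) as LogRecord)
    .filter((record) => record.context["requestId"] === requestId);
}
```

With a request ID attached to every line, "which haystack" becomes a field lookup rather than a text search.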

Metrics: the big picture

A metric is a single number that represents your system's state at a point in time. "Error rate: 2.3%." "API latency p99: 340ms." "CPU usage: 67%." Metrics are aggregated, time-series data — they roll up thousands of individual events into a trend line.

Metrics answer "is the system healthy right now?" They let you spot trends at a glance. A metric dashboard shows you that error rate spiked from 0.1% to 5% at 2 a.m., or that a deploy correlated with a latency increase. Metrics are the smoke detector.

But metrics are also abstract. Knowing your error rate is up doesn't tell you which error or which line of code broke. "2,000 errors per minute" could mean a new type of exception or a known issue getting louder. Metrics point you toward a problem; they don't identify it.
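To make the roll-up concrete, here is a minimal sketch of how individual requests might be aggregated into an error rate and a latency percentile. The `RequestSample` shape is hypothetical, and the percentile uses the simple nearest-rank method; real metrics backends typically aggregate with histograms or sketches instead.

```typescript
// Hypothetical per-request sample; a real system would aggregate in the metrics backend.
interface RequestSample {
  durationMs: number;
  failed: boolean;
}

// Error rate as a percentage of all requests in the window.
function errorRatePercent(samples: RequestSample[]): number {
  if (samples.length === 0) return 0;
  const failures = samples.filter((s) => s.failed).length;
  return (failures * 100) / samples.length;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const clamped = Math.min(sorted.length, Math.max(1, rank));
  return sorted[clamped - 1];
}
```

Note what the aggregation discards: the output is two numbers, and nothing in them says which request failed or why.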

Traces: the request's journey

A trace follows a single request from the user's browser (or a client's API call) through your entire system and back. It records every hop — browser to CDN to load balancer to API service to database to cache — as a sequence of spans. Each span is a unit of work: a database query, an HTTP call, a function execution. Spans nest, so you see the full waterfall of what happened and how long each step took.

Traces answer "where did the time go, and where did it break?" If a checkout endpoint is slow, a trace waterfall shows you that the database query took 200ms (fine), the payment API call took 3 seconds (the culprit), and the response serialization took 10ms (negligible). You're not looking at an aggregate; you're looking at the exact request that mattered.
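The checkout example above can be sketched as data. This is a simplified model, assuming a hypothetical `TraceSpan` shape with parent links; real tracing systems record more per span (status, attributes, and so on), and the span IDs and offsets below are invented for illustration.

```typescript
// Hypothetical minimal span; real tracers record more fields.
interface TraceSpan {
  spanId: string;
  parentId: string | null; // null marks the root span
  operation: string;
  startMs: number; // offset from the start of the trace
  durationMs: number;
}

// Direct children of a span, ordered by start time: one level of the waterfall.
function childrenOf(spans: TraceSpan[], parentId: string): TraceSpan[] {
  return spans
    .filter((s) => s.parentId === parentId)
    .sort((a, b) => a.startMs - b.startMs);
}

// The child that accounts for the most time under the given parent.
function slowestChild(spans: TraceSpan[], parentId: string): TraceSpan | null {
  const children = childrenOf(spans, parentId);
  if (children.length === 0) return null;
  return children.reduce((worst, s) => (s.durationMs > worst.durationMs ? s : worst));
}

// The slow checkout request, with durations taken from the example in the text.
const checkoutTrace: TraceSpan[] = [
  { spanId: "a", parentId: null, operation: "GET /checkout", startMs: 0, durationMs: 3215 },
  { spanId: "b", parentId: "a", operation: "database query", startMs: 2, durationMs: 200 },
  { spanId: "c", parentId: "a", operation: "payment API call", startMs: 203, durationMs: 3000 },
  { spanId: "d", parentId: "a", operation: "response serialization", startMs: 3204, durationMs: 10 },
];
```

Calling `slowestChild(checkoutTrace, "a")` returns the payment API span, which is the same conclusion a reader draws from the waterfall view.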

The limitation of traces alone is that they're expensive to store and compute, so most teams sample them — capture one in a hundred requests, or one in a thousand. You get a statistical picture of typical requests, but you might miss the one failure that matters. For that, you need logs.
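A sampling decision can be as simple as the sketch below. It assumes trace IDs are random 32-character hex strings (as in the W3C Trace Context format) and derives the decision from the ID itself, so every service that sees the same trace ID makes the same choice. The `shouldSample` helper is hypothetical; SDKs implement their own samplers, and many simply draw a random number per trace.

```typescript
// Deterministic sampling: map the first 8 hex characters of the trace ID
// to a number in [0, 1) and keep the trace if it falls below the sample rate.
// Assumes random hex trace IDs; a non-hex ID yields NaN and is never sampled.
function shouldSample(traceId: string, sampleRate: number): boolean {
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < sampleRate;
}
```

At a rate of 0.01 roughly one trace in a hundred is kept, which is exactly why the one failing request may not be among them.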

How they fit together

The three pillars work best as a system. Start with a metric alert: "Error rate spiked." Now you have a problem to investigate.

Next, search your logs for the corresponding time window and service. You find 500 errors in the last 10 minutes, all with the same stack trace. That's your fingerprint — one root cause, many occurrences.

Then, pull a trace from that same time window. You're looking at an individual request that hit the same error. The waterfall shows you that execution reached a specific database query, which timed out. Combined with breadcrumbs from the logs (the user ID, the request context, the sequence of operations), you now have a story: this user's request tried to fetch N records with a query that wasn't indexed, the database timed out, and we threw an error without a graceful fallback.

That's the full loop: metrics show you the problem exists, logs and traces tell you what it is, and together they point to the fix.
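The loop above can be sketched in a few functions. This assumes, hypothetically, that every log line carries the ID of the trace it was written under; the `CorrelatedLog` shape and the helper names are invented for illustration, and in-memory arrays stand in for a real log store.

```typescript
// Hypothetical log line that carries the ID of the trace it belongs to.
interface CorrelatedLog {
  timestampMs: number;
  service: string;
  level: "info" | "error";
  message: string;
  traceId: string;
}

// Step 1 (after the metric alert): narrow logs to the alert window and service.
function errorsInWindow(
  logs: CorrelatedLog[],
  service: string,
  fromMs: number,
  toMs: number
): CorrelatedLog[] {
  return logs.filter(
    (l) =>
      l.service === service &&
      l.level === "error" &&
      l.timestampMs >= fromMs &&
      l.timestampMs <= toMs
  );
}

// Step 2: list the trace IDs worth pulling, without duplicates, in first-seen order.
function traceIdsToInspect(errors: CorrelatedLog[]): string[] {
  const ids: string[] = [];
  errors.forEach((l) => {
    if (ids.indexOf(l.traceId) === -1) ids.push(l.traceId);
  });
  return ids;
}

// Step 3: gather every log line from one trace as breadcrumbs, oldest first.
function breadcrumbsFor(logs: CorrelatedLog[], traceId: string): CorrelatedLog[] {
  return logs
    .filter((l) => l.traceId === traceId)
    .sort((a, b) => a.timestampMs - b.timestampMs);
}
```

The shared trace ID is the design choice that matters here: without it, step 3 falls back to matching on timestamps and guesswork.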

Where error tracking enters the picture

Error tracking is a specialized tool that glues metrics, logs, and traces together specifically for failures. An error tracker automatically captures exceptions and groups them by fingerprint — so one bug isn't ten thousand log lines, it's one issue. It enriches each error with breadcrumbs (the log-like context preceding the crash), stack traces (the exact line that broke), and traces (the request's journey through your system).
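Grouping by fingerprint can be illustrated with a deliberately naive sketch. The `CapturedError` shape and the choice to key on the error type plus the top stack frames are assumptions for this example; production error trackers use more elaborate grouping rules, and this is not a description of LightTrace's algorithm.

```typescript
// Hypothetical captured exception; field names are assumptions for this sketch.
interface CapturedError {
  type: string; // e.g. "TimeoutError"
  message: string;
  frames: string[]; // stack frames, innermost first
}

// Build a grouping key from the error type and its top frames.
// The message is left out because it often contains per-request values.
function fingerprint(err: CapturedError, frameCount: number = 3): string {
  return [err.type].concat(err.frames.slice(0, frameCount)).join("|");
}

// Collapse many occurrences into issues, counting occurrences per fingerprint.
function groupByFingerprint(errors: CapturedError[]): { [key: string]: number } {
  const counts: { [key: string]: number } = {};
  errors.forEach((e) => {
    const key = fingerprint(e);
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}
```

Two timeouts raised from the same frames for different users share one key, so they show up as a single issue with a count of two rather than as two unrelated log lines.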

Error tracking is sometimes lumped in with "application performance monitoring" (APM) or with observability in general, but it is more specific: it sits at the intersection of logging, metrics, and tracing, with a focus on errors and performance anomalies. For a deeper dive, see "What Is APM?"

LightTrace is an error tracker built on the Sentry SDK protocol. You point the SDK at LightTrace (just a DSN change if you're already using Sentry), and errors start flowing in automatically. Each error arrives with its full context:

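import * as Sentry from "@sentry/node";
import express from "express";

// Initialize as early as possible, before the app starts handling requests.
Sentry.init({
  dsn: "https://<key>@your-lighttrace-host/1",
  tracesSampleRate: 1.0, // traces every request; lower this for high-traffic services
  environment: "production",
  release: "api@2.1.0",
});

const app = express();

// `db` stands in for your database client; the query itself is elided.
app.get("/checkout", async (req, res, next) => {
  try {
    const order = await db.query(...);
    res.json(order);
  } catch (error) {
    // Pass the error to Express so the Sentry error handler below captures it
    // with the stack trace, breadcrumbs, the request, the user, and a trace if available.
    next(error);
  }
});

// Register after all routes. Assumes @sentry/node v8 or later;
// older SDK versions use a different error-handler API.
Sentry.setupExpressErrorHandler(app);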

That single init captures every unhandled exception. When an error lands, LightTrace groups it by fingerprint, shows you how many users hit it, when it started, and whether it correlates with a recent deploy. Grouped errors with context cut hours of searching down to minutes.

Building your observability practice

In practice, most teams layer them like this:

Error tracking in the center. Set up an error tracker (for example, LightTrace via the Sentry SDK) to capture exceptions and surface them immediately. This is where you'll spend most of your time when reacting to production problems.
Metrics around it. Set up dashboards for error rates, latency percentiles, and throughput. Metrics tell you when to go looking; error tracking tells you what you're looking for.
Logs for detail. Structured, searchable logs let you dive deeper when a metric alert or an error doesn't give you the full picture. See structured logging best practices for how to keep them useful.
Traces for performance. If you're chasing latency, traces are invaluable. Most error trackers (including LightTrace) integrate with tracing, so you can see the waterfall for any error.

Start with error tracking and metrics. Logs are powerful but generate noise; start selective and grow your log volume as you learn what matters. Traces are expensive; use sampling and start with high-traffic paths.

The common mistake is treating these as separate stacks — a logging tool, a metrics tool, a tracing tool, an error tracker — and then trying to manually correlate data across four systems. The real gain comes from choosing tools that integrate: error tracking that captures traces, dashboards that link to errors, and logs that are queryable from context.

The integration that completes the picture

The three pillars only work as a system when they can talk to each other. When an error lands in your error tracker, you should be able to click through to the trace (to see the request's path), query the logs (to see the context), and reference the metrics (to see if it's isolated or widespread). That's observability — a cohesive view of what your system is doing.

Start tracking errors in minutes

Start capturing errors with the Sentry SDK pointed at LightTrace, and see the three pillars in action — free up to 5,000 events a month.


The three pillars of observability aren't optional pieces you add when things go wrong. They're the foundation of confidence in production. Logs, metrics, and traces together give you the speed you need to move fast and the visibility you need to fix things quickly when they break.
