How to Set Up Error Alerts That Matter

A practical guide to configuring new-issue and frequency-based error alerts, choosing thresholds, and routing notifications to the right people.

An error just crashed production. You'll find out in three weeks when a customer complains, or you'll know instantly because an alert woke you up. Setting up error alerts is the difference between shipping with visibility and shipping blind. But not all alerts are created equal — a poorly tuned alert either goes ignored (alert fatigue) or misses the errors that actually matter.

This guide walks you through configuring alerts that page you for real regressions and stay quiet otherwise: new-issue detection, frequency-based thresholds, and routing notifications to the right people at the right time.

Why alerts matter more than a dashboard

Your error tracker is only useful if you know when something is wrong. Checking the dashboard every five minutes isn't sustainable, especially for distributed teams across time zones. A well-tuned alert tells you the moment a new issue appears or an existing bug starts spiking, often before your users notice.

The key word is "well-tuned." Alerts that trigger on everything become noise. Teams that ignore 90% of their pages quickly stop responding to any of them. The goal is to find the middle ground: catch what matters, ignore what doesn't, and avoid alert fatigue.

Alert rule types: new-issue vs. frequency-based

LightTrace supports two types of alert rules, and you'll likely need both.

New-issue alerts fire the moment an error fingerprint appears for the first time. A fresh bug in production should always reach someone, because nobody has triaged it yet and its impact is unknown. These alerts are low-volume by nature (one alert per new issue), so they rarely become noise.

Frequency-based alerts trigger when an existing issue crosses a threshold you define: "alert me if this error hits 50 events in one hour" or "page me if this crash affects more than 100 users in a day." These catch regressions where a known bug suddenly spikes. Frequency thresholds are where tuning matters most, because the same threshold can either catch critical failures or drown you in false positives depending on your traffic pattern.

Start with new-issue alerts on everything, then add frequency-based alerts only for the errors you've seen spike before. This gives you baseline coverage without immediate noise, then layers protection for your known pain points.
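To make the difference between the two rule types concrete, here is a minimal Python sketch. It is not how LightTrace implements alerting; the `AlertEvaluator` class, its fields, and the example threshold are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class AlertEvaluator:
    """Toy evaluator for both rule types (hypothetical, for illustration only)."""

    def __init__(self, threshold: int, window_seconds: int):
        self.threshold = threshold            # events needed inside the window
        self.window_seconds = window_seconds  # length of the sliding window
        self.seen = set()                     # fingerprints already known
        self.recent = defaultdict(deque)      # fingerprint -> recent timestamps
        self.alerted = set()                  # fingerprints currently over threshold

    def record(self, fingerprint: str, timestamp: float) -> list[str]:
        """Record one event and return the names of the alerts it triggers."""
        alerts = []

        # New-issue rule: fires once, the first time a fingerprint appears.
        if fingerprint not in self.seen:
            self.seen.add(fingerprint)
            alerts.append("new_issue")

        # Frequency rule: count the events that fall inside the sliding window.
        window = self.recent[fingerprint]
        window.append(timestamp)
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()

        if len(window) >= self.threshold:
            # Alert once per spike instead of once per event.
            if fingerprint not in self.alerted:
                self.alerted.add(fingerprint)
                alerts.append("frequency")
        else:
            self.alerted.discard(fingerprint)

        return alerts
```

With `AlertEvaluator(threshold=50, window_seconds=3600)`, the first event for a fingerprint yields a new-issue alert, and the fiftieth event inside one hour yields a single frequency alert until the rate drops back below the threshold.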

Configuring new-issue alerts

Most teams keep new-issue alerts simple: set one to notify the primary on-call engineer whenever a fresh error appears. The rule typically looks like:

  • When: a new issue is created
  • Notify: your on-call team's inbox (or an address that forwards to a channel such as #incidents)
  • Environment filter (optional): production only (ignore staging errors)

LightTrace delivers alerts by email, so the notification goes directly to your team's inbox or feeds into your email-based on-call system. If you want to page someone for high-severity issues, layer that on top: send the alert first, then escalate through your incident-response playbook if the issue is severe or keeps happening.

Avoid the temptation to silence low-severity errors. If an error is low-value enough to ignore, you probably don't need to track it at all. Low-severity filters in your error tracker's configuration (like ignoring certain exceptions or environments) are cleaner than muting alerts downstream.
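If your application reports through the Sentry Python SDK, one place to do this kind of filtering is the SDK's `before_send` hook, which receives each event before it is sent and can return `None` to drop it. The sketch below is one possible filter, not a recommended default: the ignored exception types and the decision to drop all staging events are assumptions for illustration.

```python
# Hypothetical choices: which exception types and environments to drop.
IGNORED_EXCEPTIONS = (KeyboardInterrupt, BrokenPipeError)
IGNORED_ENVIRONMENTS = {"staging"}


def before_send(event, hint):
    """Return the event to keep it, or None to discard it before sending."""
    if event.get("environment") in IGNORED_ENVIRONMENTS:
        return None

    # For captured exceptions, hint["exc_info"] is (type, value, traceback).
    exc_info = hint.get("exc_info")
    if exc_info and issubclass(exc_info[0], IGNORED_EXCEPTIONS):
        return None

    return event


# Wiring it up (requires the sentry-sdk package; the DSN is a placeholder):
# import sentry_sdk
# sentry_sdk.init(dsn="<your LightTrace DSN>", before_send=before_send)
```

Events dropped here never reach the tracker, so they cannot trigger an alert or count toward a frequency threshold.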

Frequency-based alerts: choosing smart thresholds

Frequency alerts are where most teams stumble. Too low a threshold and you page on every blip; too high and you miss actual failures.

The best threshold depends on your traffic and your risk tolerance. A high-traffic service might set an alert at 1,000 events/hour because anything below that is still within its normal variance; a smaller service might alert at 100. Here's a practical approach:

  1. Know your baseline. Spend a week looking at what normal looks like. What's the daily error count? Which errors appear every single day? This becomes your reference point.
  2. Set a threshold 2–3× above normal. If you see 30 events/day of NullPointerException, set an alert at 60–90 in a 24-hour window. A sustained jump of 2–3× usually indicates a real regression rather than random variance.
  3. Start with longer windows, then tighten. A threshold over 24 hours is more forgiving than one over 1 hour. Once alerts stabilize, you can shrink the window for faster detection if you want.
  4. Use time-of-day filters if traffic varies wildly. Some systems get 10× more traffic during business hours. Separate alerts for peak and off-peak windows avoid both false alarms and missed signals.
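The first two steps can be sketched in a few lines of Python. This is an illustration under assumptions: `suggest_threshold` is a hypothetical helper, and using the median of the observed week as the baseline is one reasonable choice among several.

```python
import math
from statistics import median


def suggest_threshold(daily_counts: list[int], multiplier: float = 3.0) -> int:
    """Derive a fixed alert threshold from observed daily event counts."""
    if not daily_counts:
        raise ValueError("need at least one day of data")
    # The median is less affected by a single unusually noisy day than the mean.
    baseline = median(daily_counts)
    return math.ceil(baseline * multiplier)


# One week of daily counts for a single error (made-up numbers).
week = [28, 31, 30, 29, 35, 30, 27]
print(suggest_threshold(week))                  # 90
print(suggest_threshold(week, multiplier=2.0))  # 60
```

The result is a plain number that you enter into the alert rule. Recompute it when your traffic changes noticeably instead of letting it drift with every day's data.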

A concrete example: your service normally logs 500 uncaught exceptions per day. Set alerts at:

  • New issue: always notify
  • Existing issue spike: 1,500 events/day or 150 events/hour (peak hours only)

This catches a genuine regression (3× normal) without paging you every time a particular error shifts from 20 to 40 occurrences.
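Expressed as code, the example rule might look like the sketch below. The peak-hour range (09:00 to 17:59) is an assumption, and `should_alert` is a hypothetical helper rather than anything LightTrace exposes.

```python
from datetime import datetime

DAILY_THRESHOLD = 1500     # 3x the normal 500 events/day
HOURLY_THRESHOLD = 150     # applies during peak hours only
PEAK_HOURS = range(9, 18)  # assumption: 09:00 to 17:59 local time


def should_alert(events_last_24h: int, events_last_hour: int, now: datetime) -> bool:
    """Apply the daily rule always and the hourly rule only during peak hours."""
    if events_last_24h >= DAILY_THRESHOLD:
        return True
    return now.hour in PEAK_HOURS and events_last_hour >= HOURLY_THRESHOLD
```

A burst of 200 events in one hour triggers an alert at 10:00 but not at 03:00, while 1,600 events over the last 24 hours triggers one at any time of day.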

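Avoid "alert on any increase" rules such as "notify if this error doubled." They trigger constantly as traffic naturally fluctuates, and a low-volume error doubles all the time (from 20 to 40, say). Use the 2–3× multiplier once, to derive a fixed number from your baseline, and then alert on that number: absolute thresholds (100 events/hour) are more stable than relative ones (2× increase).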
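A small simulation shows why. The hourly counts below are made up, and both helper functions are hypothetical; the point is only to compare how often each style of rule fires on the same data.

```python
def relative_alerts(counts: list[int], factor: float = 2.0) -> list[int]:
    """Indices where a count is at least `factor` times the previous count."""
    return [
        i
        for i in range(1, len(counts))
        if counts[i - 1] > 0 and counts[i] >= factor * counts[i - 1]
    ]


def absolute_alerts(counts: list[int], threshold: int = 100) -> list[int]:
    """Indices where a count reaches the fixed threshold."""
    return [i for i, count in enumerate(counts) if count >= threshold]


# Ordinary hour-to-hour noise, followed by one real spike in the last hour.
hourly = [20, 40, 18, 45, 22, 50, 400]
print(relative_alerts(hourly))  # [1, 3, 5, 6]
print(absolute_alerts(hourly))  # [6]
```

The relative rule fires four times, three of them on ordinary noise; the absolute rule fires only on the real spike.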

Routing alerts to the right people

Not every error should notify everyone. Route alerts by:

Ownership. If you use tags to mark which team owns an error, route alerts to that team's email or on-call channel.

Severity. A crash in your payment flow is an all-hands page; a typo in a rarely used admin view only needs to notify the backend team.

Environment. Staging errors should have a separate, much higher alert threshold (or no alerts at all), because they're expected and don't need ops attention.

Most teams start with a single catch-all alert (page the primary on-call engineer for all new issues), then add overrides as patterns emerge. An error happening in prod on every deploy? Route it to the team that owns that code. An intermittent timeout in a third-party API? Route it to the platform team with a higher frequency threshold, because it's usually transient. For help prioritizing which errors matter most, see how to debug production errors.
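One way to think about these routing rules is as a small function from an issue to a list of recipients. The sketch below is hypothetical throughout: the `Issue` fields, team names, and email addresses are placeholders, and in practice you would express the same logic through your alert rule settings.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder addresses for illustration.
ON_CALL = "oncall@example.com"
TEAM_INBOXES = {
    "payments": "payments-oncall@example.com",
    "platform": "platform@example.com",
}


@dataclass
class Issue:
    environment: str                  # e.g. "production" or "staging"
    severity: str                     # e.g. "critical", "error", "warning"
    owner_team: Optional[str] = None  # taken from an ownership tag, if present


def route(issue: Issue) -> list[str]:
    """Return the email recipients for an alert about this issue."""
    # Environment: non-production issues notify nobody in this sketch.
    if issue.environment != "production":
        return []

    recipients = []

    # Ownership: send to the owning team's inbox when one is known.
    team_inbox = TEAM_INBOXES.get(issue.owner_team or "")
    if team_inbox:
        recipients.append(team_inbox)

    # Severity: critical issues also reach on-call; so do unowned issues.
    if issue.severity == "critical" or not recipients:
        recipients.append(ON_CALL)

    return recipients
```

The catch-all behavior lives in the last branch: anything without an owner still reaches the on-call engineer, so adding team overrides never creates a gap in coverage.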

Measuring alert quality

After a week of alerts running, ask yourself:

  • Are you acting on each page? If you're closing 90% of alerts without investigation, your thresholds are too sensitive.
  • Are you missing regressions? If a spike goes undetected and a customer reports it first, your threshold is too high.
  • Is context clear? Can a tired on-call engineer understand what broke in 10 seconds, or do they need to dig through the dashboard? Good alerts include a link, the affected user count, and the first affected release — LightTrace provides all of this.
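If you keep a simple log of what happened to each alert, the first question becomes a quick calculation. The sketch below assumes you record each alert's outcome as either "investigated" or "dismissed"; those labels and the `review_alerts` helper are hypothetical.

```python
def review_alerts(outcomes: list[str], dismissed_limit: float = 0.9) -> str:
    """Summarize a period of alert outcomes ('investigated' or 'dismissed')."""
    if not outcomes:
        raise ValueError("no alerts recorded for this period")

    dismissed = sum(1 for outcome in outcomes if outcome == "dismissed")
    dismissed_share = dismissed / len(outcomes)

    # Dismissing nearly everything suggests the thresholds are too sensitive.
    if dismissed_share >= dismissed_limit:
        return "too sensitive"
    return "ok"
```

A week with 9 dismissed alerts out of 10 comes back as "too sensitive". The check cannot see regressions that never alerted, so the second question still needs a manual look at customer reports.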

The goal is to be surprised that something broke, never surprised that you got the alert. If most alerts make you think "oh, we already saw that yesterday," your thresholds need tuning. Getting this right directly reduces mean time to recovery (MTTR) by cutting the time between failure and notification.

Alert rules as a reliability tool

Well-tuned alerts are your reliability team's early-warning system. They catch regressions before users do, they notify the right people at the right time, and they stay quiet the rest of the time. Pair alerts with a solid error triage process and on-call rotation, and you've built a feedback loop that keeps regressions short and recoveries fast.

Start simple — new-issue alerts on everything, frequency alerts on your known pain points — then refine as you learn your app's error patterns. The first version won't be perfect; the goal is to go from "I found out from a customer" to "I woke up to an alert," and that's enough.

Start tracking errors in minutes

Point the Sentry SDK at LightTrace and start configuring email alerts today — set up new-issue and frequency-based rules in minutes, then sleep better knowing your team stays on top of production errors.
