When every alert pages you at 3 a.m., none of them feel urgent. Alert fatigue is what happens when a team receives so many notifications that they stop responding to them — the boy-who-cried-wolf problem, but for production incidents. A single genuine outage gets lost in hundreds of false alarms, and by the time someone checks their phone, the real damage is done.
Alert fatigue doesn't happen overnight. It creeps in as alert rules pile up, thresholds drift out of date, and teams tune out. The irony is that teams add more alerts to catch regressions, which only makes the problem worse. The answer isn't simply fewer alerts; it's smarter ones. By tuning thresholds, relying on error grouping and fingerprinting, and routing alerts to the right people, you can build an alert system your team actually trusts.
Why alert fatigue kills incident response
Every ignored alert trains your brain to ignore the next one. If a team receives 100 alerts a day and only 2 point to real issues, 98% of its pages are noise. Human attention is finite, and noise consumes it first. By the time a critical error spikes, your on-call engineer is already numb to notifications.
The business cost is invisible until it isn't. A server outage that goes unnoticed for 20 minutes because alerts were muted or ignored can cost thousands in lost revenue or customer churn. Worse, when your team stops trusting alerts, incident response slows. People take longer to believe a page is real and wait for confirmation before acting, so mean time to resolution (MTTR) climbs.
Alert fatigue is self-reinforcing. The more noise, the more your team tunes out. The more they tune out, the slower they respond to real incidents. Address it early, before it becomes a culture problem.
Common sources of noisy alerts
Most alert noise comes from a few predictable places:
- new-issue alerts that fire on every developer error during a deploy;
- error-rate thresholds set too low;
- alerts on errors so common they're harmless, like the infamous "ResizeObserver loop" browser error;
- a single alert delivered to several channels at the same moment;
- duplicate alerts from three different systems watching the same metric.
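Two of these sources can be filtered mechanically before anything pages. The Python sketch below shows one way to do it, under stated assumptions. `NOISE_PATTERNS` is a hypothetical ignore list of regexes for messages the team has agreed are harmless. It is seeded with the ResizeObserver messages browsers commonly report. `Deduplicator` is a hypothetical helper that sends a given alert key at most once per time window, so three systems watching the same metric produce one notification. The five-minute window is an arbitrary illustration, not a recommended value.

```python
import re
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical ignore list: errors the team has agreed are harmless.
# No "^" anchor, so prefixes like "Uncaught Error:" still match.
NOISE_PATTERNS = [
    re.compile(r"ResizeObserver loop (limit exceeded|completed with undelivered notifications)"),
]


def is_known_noise(message: str) -> bool:
    """Return True if the error message matches an entry on the ignore list."""
    return any(pattern.search(message) for pattern in NOISE_PATTERNS)


@dataclass
class Deduplicator:
    """Send each alert key at most once per window (hypothetical helper)."""

    window_seconds: float = 300.0
    _last_sent: Dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def should_send(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # the same alert already went out within the window
        self._last_sent[key] = now
        return True


print(is_known_noise("ResizeObserver loop limit exceeded"))  # True
dedup = Deduplicator()
print(dedup.should_send("db-latency-high", now=0.0))   # True
print(dedup.should_send("db-latency-high", now=60.0))  # False
```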
One especially pernicious source is environment bleed: an alert fires in a staging or canary environment and sends the same pager notifications as production. Cascading alerts also multiply noise, because a single database outage triggers one alert per service that depends on it, turning one incident into ten concurrent pages. The fix for each is surgical: tighten thresholds, filter by environment, and deduplicate.
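Environment filtering and cascade suppression can both be expressed as a small pre-paging step. This is a minimal sketch with a few assumptions. Each alert is assumed to carry `service` and `environment` fields. `DEPENDS_ON` is a hypothetical dependency map that you would maintain yourself. The sketch pages only for production alerts whose upstream dependencies are not also alerting, so a database outage pages once for the database instead of once per dependent service.

```python
from typing import Dict, List, Set

# Hypothetical: only these environments may page anyone.
PAGING_ENVIRONMENTS = {"production"}

# Hypothetical dependency map: service -> services it calls directly.
# Assumed acyclic; in a cycle, every member would be suppressed.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-api": ["orders-db"],
    "search-api": ["orders-db"],
    "orders-db": [],
}


def upstream_of(service: str, deps: Dict[str, List[str]]) -> Set[str]:
    """All direct and transitive dependencies of a service."""
    seen: Set[str] = set()
    stack = list(deps.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps.get(dep, []))
    return seen


def services_to_page(alerts: List[dict], deps: Dict[str, List[str]]) -> List[str]:
    """Page only production alerts whose upstream dependencies are healthy."""
    prod = [a for a in alerts if a["environment"] in PAGING_ENVIRONMENTS]
    failing = {a["service"] for a in prod}
    # A service whose dependency is also failing is treated as a symptom.
    roots = {s for s in failing if not (upstream_of(s, deps) & failing)}
    return sorted(roots)


alerts = [
    {"service": "orders-db", "environment": "production"},
    {"service": "checkout-api", "environment": "production"},
    {"service": "search-api", "environment": "production"},
    {"service": "checkout-api", "environment": "staging"},
]
print(services_to_page(alerts, DEPENDS_ON))  # ['orders-db']
```

Treating every downstream alert as a symptom is a simplification. A service can fail for its own reasons during a database outage, so you may prefer to hold suppressed alerts in a digest rather than drop them.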
Threshold-based alerting beats the noisy alternative
The tempting path is to alert on everything new. A fresh error code? Alert. Anything that hasn't been seen before? Page the team. This sounds safe but builds alert fatigue fast. You end up with spurious alerts on every developer typo, every A/B test variant, every one-off flake.
Instead, use thresholds. Alert on a new production issue after 5 or more occurrences in an hour, not on the first one. Alert on an increase over baseline rather than on an absolute rate; for example, 5 errors per minute when your baseline is 2. Set a minimum-impact threshold and ignore errors that affect zero or one user. These small constraints can cut most of the noise while still catching real problems. You're not alerting on presence; you're alerting on regression.
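Here is a minimal Python sketch of those three thresholds. `IssueStats`, `should_alert`, and the constants are hypothetical names, and the stats are assumed to be computed for production only. Two choices are assumptions rather than rules from this article. First, "5 per minute against a baseline of 2" is read as "at least double the baseline". Second, an absolute floor of one error per minute is added so that a zero baseline can't fire on zero errors.

```python
from dataclasses import dataclass

# Hypothetical thresholds based on the rules of thumb in the text.
NEW_ISSUE_MIN_EVENTS = 5    # per hour, for issues not seen before
RATE_MULTIPLIER = 2.0       # fire at >= 2x the baseline rate (assumption)
MIN_RATE_PER_MINUTE = 1.0   # absolute floor so a zero baseline can't fire on zero errors
MIN_AFFECTED_USERS = 2      # ignore errors affecting zero or one user


@dataclass
class IssueStats:
    """Production-only stats for one issue (hypothetical shape)."""
    is_new: bool
    events_last_hour: int
    errors_per_minute: float
    baseline_per_minute: float
    affected_users: int


def should_alert(s: IssueStats) -> bool:
    if s.affected_users < MIN_AFFECTED_USERS:
        return False
    if s.is_new:
        return s.events_last_hour >= NEW_ISSUE_MIN_EVENTS
    threshold = max(MIN_RATE_PER_MINUTE, RATE_MULTIPLIER * s.baseline_per_minute)
    return s.errors_per_minute >= threshold


# 5 errors/minute against a baseline of 2 crosses the 4/minute threshold.
print(should_alert(IssueStats(is_new=False, events_last_hour=300,
                              errors_per_minute=5.0, baseline_per_minute=2.0,
                              affected_users=40)))  # True
```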
Test your thresholds against a week of real data before deploying them. Does a "high error rate" alert fire only during known incidents? If it fires 50 times a week under normal production traffic, it's too sensitive.
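A backtest can be as simple as replaying historical per-minute error counts through a rule and checking how many firings overlap known incidents. The sketch assumes you can export those counts as a list. It also assumes incident windows are available as (start, end) minute offsets. `backtest` is a hypothetical helper. It counts a firing only on the transition from quiet to firing, so a sustained breach counts once. For instance, suppose you replay `[2, 2, 9, 9, 2, 6, 2, 2, 8, 2]` with a threshold of 5 and one incident covering minutes 2 and 3. The result is three firings, only one of them during the incident.

```python
from typing import Callable, List, Tuple


def backtest(
    counts: List[int],
    fires: Callable[[int], bool],
    incidents: List[Tuple[int, int]],
) -> Tuple[int, int]:
    """Replay per-minute error counts through an alert rule.

    Returns (total_firings, firings_during_incidents). Incident windows are
    half-open (start_minute, end_minute) ranges.
    """
    total = during = 0
    firing = False
    for minute, count in enumerate(counts):
        now_firing = fires(count)
        if now_firing and not firing:  # count only quiet -> firing transitions
            total += 1
            if any(start <= minute < end for start, end in incidents):
                during += 1
        firing = now_firing
    return total, during


history = [2, 2, 9, 9, 2, 6, 2, 2, 8, 2]
print(backtest(history, lambda c: c >= 5, incidents=[(2, 4)]))  # (3, 1)
```

If most firings fall outside incident windows, raise the threshold and replay again.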
Grouping is your force multiplier against alert fatigue
A crash that occurs a thousand times looks like a thousand separate incidents unless your error tracker groups those events correctly. With good grouping, even a million events collapse into one issue, which means one page instead of a million. Few other single levers cut alert noise as much.
LightTrace uses fingerprinting to group identical errors into issues automatically, so a single broken line of code produces one issue and one alert, not a firehose. If you see multiple alerts for what looks like the same bug, your grouping is probably too fine. The same error is producing different fingerprints, for example because its stack traces differ in ways that don't matter. The opposite failure, over-aggressive grouping, merges distinct errors into one issue and can hide a new bug behind an old one. Review your error tracking best practices and adjust the grouping rules in whichever direction the evidence points.
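To make the idea concrete, here is a simplified fingerprinting sketch in Python. It is not LightTrace's grouping algorithm. It hashes the exception type plus the module and function of each in-app stack frame, deliberately leaving out line numbers and message text. `fingerprint` and `IssueGrouper` are hypothetical names. The frame dictionaries are assumed to have `module` and `function` keys and an optional `in_app` key. The grouper reports an alert only for the first event of each new issue.

```python
import hashlib
from typing import Dict, List


def fingerprint(exc_type: str, frames: List[dict]) -> str:
    """Hash the exception type plus (module, function) of in-app frames.

    Line numbers and message text are deliberately excluded so that shifted
    lines or variable data (IDs, timestamps) don't split one bug into many issues.
    """
    parts = [exc_type] + [
        f"{f['module']}:{f['function']}" for f in frames if f.get("in_app", True)
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]


class IssueGrouper:
    """Counts events per fingerprint; alerts only on the first event of an issue."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def ingest(self, exc_type: str, frames: List[dict]) -> bool:
        fp = fingerprint(exc_type, frames)
        self.counts[fp] = self.counts.get(fp, 0) + 1
        return self.counts[fp] == 1  # True means "new issue: send one alert"


grouper = IssueGrouper()
frames = [{"module": "checkout.cart", "function": "total", "lineno": 42}]
alerts = sum(grouper.ingest("KeyError", frames) for _ in range(1000))
print(alerts, len(grouper.counts))  # 1 1
```

Leaving line numbers out keeps a bug grouped when unrelated edits shift its line after a deploy. The trade-off is that two different bugs with the same exception type in the same function will merge. That is the over-aggressive grouping failure: distinct errors end up in one issue.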
Route alerts to the person who can fix them
A critical database error doesn't need to page the frontend team. A JavaScript parsing error doesn't need the database on-call. Yet many teams send everything to one global alert channel. The result: most pages are irrelevant to the person being paged, so they learn to ignore all of them.
If you have the infrastructure for it, route by component: database alerts to the database team, API errors to the backend team, and client-side errors to the frontend team. If that's not possible, label every alert clearly so context-switching is fast. "[Payment API] Error rate high" tells the on-call engineer exactly whom to bring in. For guidance on setting this up, read how to set up error alerts and on-call for developers.
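Routing by component can start as a lookup table. In this sketch, the component names, the on-call rotations in `ROUTES`, and the display labels are all hypothetical. Unknown components fall back to a default rotation. Every title gets a bracketed component label in the style of the "[Payment API]" example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing table: component tag -> on-call rotation.
ROUTES: Dict[str, str] = {
    "database": "db-oncall",
    "payment-api": "backend-oncall",
    "web-client": "frontend-oncall",
}
DISPLAY_NAMES: Dict[str, str] = {
    "database": "Database",
    "payment-api": "Payment API",
    "web-client": "Web Client",
}
DEFAULT_ROUTE = "platform-oncall"  # fallback when the component is unknown


def route_alert(component: Optional[str], summary: str) -> Tuple[str, str]:
    """Return (rotation, title) with the component label prefixed to the title."""
    key = component or ""
    rotation = ROUTES.get(key, DEFAULT_ROUTE)
    label = DISPLAY_NAMES.get(key, component or "Unknown component")
    return rotation, f"[{label}] {summary}"


print(route_alert("payment-api", "Error rate high"))
# ('backend-oncall', '[Payment API] Error rate high')
```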
Keep alert rules fresh and aligned with reality
Alert rules decay. A threshold that was right six months ago isn't right today. Your baseline error rate has changed; the service's scale has grown; you've fixed the top five crashes. Old rules sit in the config and fire predictably, adding to the noise.
Schedule a quarterly audit. Collect the last three months of alert-firing data and ask three questions of each rule. Did it fire? Was it actionable? Did it catch a real incident, or was it noise? Turn off alerts that haven't fired in 90 days or that fire more than five times a week without leading to an incident. Each disabled alert makes the remaining ones more credible, and your team won't ignore an alert that actually means something.
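The audit itself is easy to script once firing history is exportable. This sketch assumes a hypothetical `Firing` record for each alert firing, with a flag saying whether it led to an incident. Rules with no firings in the last 90 days are marked `stale`. Rules averaging more than five non-incident firings per week are marked `noisy`. Averaging over the full 90-day window is only one interpretation of "five times a week"; checking each week separately would be stricter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

AUDIT_WINDOW = timedelta(days=90)
MAX_NOISY_FIRINGS_PER_WEEK = 5


@dataclass
class Firing:
    """One alert firing from exported history (hypothetical shape)."""
    rule: str
    at: datetime
    led_to_incident: bool


def audit(rules: List[str], firings: List[Firing], now: datetime) -> Dict[str, str]:
    """Label each rule 'stale', 'noisy', or 'keep' based on the audit window."""
    start = now - AUDIT_WINDOW
    weeks = AUDIT_WINDOW.days / 7
    verdicts: Dict[str, str] = {}
    for rule in rules:
        recent = [f for f in firings if f.rule == rule and f.at >= start]
        if not recent:
            verdicts[rule] = "stale"  # hasn't fired in 90 days
            continue
        noise = sum(1 for f in recent if not f.led_to_incident)
        verdicts[rule] = "noisy" if noise / weeks > MAX_NOISY_FIRINGS_PER_WEEK else "keep"
    return verdicts


now = datetime(2024, 6, 30)
history = [Firing("disk-usage-80pct", now - timedelta(hours=h), False) for h in range(1, 100)]
history.append(Firing("db-primary-down", now - timedelta(days=10), True))
print(audit(["disk-usage-80pct", "db-primary-down", "legacy-cpu-alert"], history, now))
# {'disk-usage-80pct': 'noisy', 'db-primary-down': 'keep', 'legacy-cpu-alert': 'stale'}
```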
Building trust takes discipline
A team that trusts its alerts responds quickly and achieves a lower MTTR. A team drowning in notifications responds slowly and misses real problems. The path from fatigue to trust is unglamorous: tighter thresholds, smarter routing, periodic cleanup, and the discipline to act on your on-call rotation's feedback about which alerts are noise.
The payoff is worth it. When your team knows that a page means something broke, they answer their phone. Incidents get diagnosed in minutes instead of hours. And the next time a real issue arrives, it won't get lost in the crowd.
Start tracking errors in minutes
Point your Sentry SDK at LightTrace and set up alert rules that your team will actually trust — free up to 5,000 events a month.
You can also track crash-free rate as a complementary measure of your app's stability over time, and revisit what error tracking is to make sure your foundation is solid.