Slow APIs don't fail loudly; they fail silently. A 50ms endpoint becomes 500ms, your frontend hangs, your users bounce, and you don't know why until someone complains. The trick to reducing API latency isn't guessing. It's measuring every millisecond end-to-end, finding where the time actually goes, and fixing the real bottleneck. This guide walks you through the exact playbook.
Latency is invisible without the right tools. You can see error rates spike in dashboards, but latency hides inside averages. A slow code path that affects 10% of requests can leave the median untouched and nudge the mean only slightly, so the chart still looks normal while one user in ten waits. To find and fix the real slowdowns, you need a systematic approach: measure the whole path, isolate the culprit, and validate the fix.
Measure every service in the request path
The first step to reducing latency is understanding distributed tracing. When a user clicks a button in your frontend, the request doesn't just hit one service — it flows through your API, probably a database layer, maybe a cache, possibly a third-party service. Traditional APM shows you the API's total time, but not where it's spent.
Distributed tracing breaks every request into spans — small timers on each operation. A span for the HTTP request, a span for the database query, a span for the auth check. When you trace the whole path, you see where the time accumulates.
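// Node.js / Express with the Sentry SDK.
// Note: Sentry.startTransaction is the v7-style API. SDK v8 and later replace it with Sentry.startSpan.
const express = require("express");
const Sentry = require("@sentry/node");

Sentry.init({
  dsn: "https://<key>@your-lighttrace-host/1",
  tracesSampleRate: 1.0, // trace every request; lower this under heavy traffic
  environment: "production",
});

const app = express();

// `db` and `cache` stand in for your own database and cache clients.
app.get("/api/user/:id", async (req, res) => {
  const transaction = Sentry.startTransaction({
    name: "GET /api/user/:id",
    op: "http.server",
  });
  try {
    // Child spans time each operation separately so each one shows up in the waterfall.
    const dbSpan = transaction.startChild({ op: "db.query", description: "SELECT user by id" });
    const user = await db.query("SELECT * FROM users WHERE id = ?", [req.params.id]);
    dbSpan.finish();

    const cacheSpan = transaction.startChild({ op: "cache.get", description: "user settings" });
    const settings = await cache.get(`user:${req.params.id}:settings`);
    cacheSpan.finish();

    res.json({ user, settings });
  } finally {
    transaction.finish();
  }
});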
This captures the whole flow. LightTrace groups these traces automatically, so you can see latency percentiles across all requests — not just averages.
Look for the tail, not the mean
Most teams watch response time as an average. Your API averages 100ms, so you think it's fast. But P95 and P99 matter because the tail is where real users notice. If more than 1% of requests take 5 seconds, your P99 is 5000ms, and those requests are where the frustration lives.
Look at latency percentiles, not mean time:
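- P50 (median): The typical request. A useful baseline, but blind to tail problems.
- P95: The value 95% of requests come in under. The slowest 5% sit at or above it, which is what a real user in a bad moment experiences.
- P99: The value 99% of requests come in under. A rising P99 is often an early warning of a bigger problem.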
When you see P99 climbing but P50 stays flat, something is spiking rarely but repeatedly. That's the pattern that kills retention.
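To make the gap between mean and tail concrete, here is a minimal sketch that computes percentiles with the nearest-rank method. The sample durations are invented for illustration, and tracing backends may use a different percentile estimator.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(durationsMs, p) {
  if (durationsMs.length === 0) throw new Error("no samples");
  const sorted = [...durationsMs].sort((a, b) => a - b);
  // Integer arithmetic avoids floating-point surprises in the rank.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function mean(durationsMs) {
  return durationsMs.reduce((sum, d) => sum + d, 0) / durationsMs.length;
}

// Hypothetical sample: 98 requests at 50ms and 2 requests at 5000ms.
const samples = [...Array(98).fill(50), 5000, 5000];

const summary = {
  mean: mean(samples),
  p50: percentile(samples, 50),
  p95: percentile(samples, 95),
  p99: percentile(samples, 99),
};
console.log(summary);
```

In this sample the mean is 149ms and both P50 and P95 are 50ms, yet P99 is 5000ms. Only the tail percentile exposes the two slow requests.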
Read a span waterfall to find the bottleneck
Once you're collecting traces, the next skill is reading them. Span waterfalls show every operation in the request as a horizontal bar, stacked in time order. A waterfall tells you immediately whether the slow part is a database call, a cache miss, or waiting for another service.
Look for:
- Long bars: operations that take a long time.
- Wide gaps: idle time waiting for something else to finish.
- Overlapping bars: operations running in parallel. A staircase of bars that start one after another means the work is serialized, which costs latency whenever the steps are independent.
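The same reading can be done programmatically on exported span data. The sketch below assumes a hypothetical span shape of `{ name, startMs, endMs }` for the child spans of one request (the root span is left out, since it covers everything), and reports the longest span plus any stretch of time not covered by a child span.

```javascript
// Hypothetical span shape: { name, startMs, endMs }, offsets relative to the request start.
function longestSpan(spans) {
  return spans.reduce((worst, s) =>
    s.endMs - s.startMs > worst.endMs - worst.startMs ? s : worst
  );
}

// A gap is a stretch of time that no child span covers.
function findGaps(spans, minGapMs = 1) {
  const sorted = [...spans].sort((a, b) => a.startMs - b.startMs);
  const gaps = [];
  let coveredUntil = sorted.length ? sorted[0].endMs : 0;
  for (const span of sorted.slice(1)) {
    if (span.startMs - coveredUntil >= minGapMs) {
      gaps.push({ fromMs: coveredUntil, toMs: span.startMs, before: span.name });
    }
    coveredUntil = Math.max(coveredUntil, span.endMs);
  }
  return gaps;
}

// Invented example data for one request.
const childSpans = [
  { name: "auth.check", startMs: 0, endMs: 12 },
  { name: "db.query users", startMs: 12, endMs: 180 },
  { name: "cache.get settings", startMs: 12, endMs: 20 },
  { name: "http.client billing", startMs: 230, endMs: 260 },
];

const slowest = longestSpan(childSpans);
const gaps = findGaps(childSpans);
console.log(slowest.name, gaps);
```

Here the database span dominates at 168ms, and there is a 50ms uninstrumented gap before the billing call that would be worth a span of its own.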
A common pattern: you query a user, then query their posts, then query the author of each post. Each is fast alone, but if they run one after another, you hit the N+1 query problem.
Catch the N+1 query trap
The N+1 query problem is latency's silent killer. You loop over N rows and query the database for each one — turning one request into N+1 queries. With ten posts, it's harmless. With a thousand, it's a timeout.
Span waterfalls show it plainly: thirty identical DB query spans back to back, each 10ms, totaling 300ms. The fix is eager loading: fetch all the related data in one query instead of N+1:
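// Bad: N+1 queries (one extra query per post)
const posts = await db.query("SELECT * FROM posts WHERE user_id = ?", [userId]);
for (const post of posts) {
  const [author] = await db.query("SELECT * FROM users WHERE id = ?", [post.author_id]);
  post.author = author;
}

// Good: one query with a join. Alias the author columns so they do not
// collide with post columns of the same name (users.name is an assumed column).
const postsWithAuthors = await db.query(`
  SELECT posts.*, users.id AS author_user_id, users.name AS author_name
  FROM posts
  JOIN users ON posts.author_id = users.id
  WHERE posts.user_id = ?
`, [userId]);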
Traces make this visible instantly. See it, fix it, watch latency drop.
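If you export spans, a rough N+1 detector is easy to sketch: normalize each query so parameter values do not matter, then count repeats within a single trace. The span shape (`op`, `description`, `durationMs`), the normalization rules, and the threshold of five repeats are all assumptions for illustration, not a LightTrace or Sentry API.

```javascript
// Collapse literal values so queries that differ only by parameter group together.
function normalizeQuery(sql) {
  return sql
    .replace(/'[^']*'/g, "?") // quoted string literals
    .replace(/\b\d+\b/g, "?") // standalone numeric literals
    .replace(/\s+/g, " ")
    .trim();
}

// Return query shapes that repeat at least minRepeats times in one trace.
function findRepeatedQueries(spans, minRepeats = 5) {
  const groups = new Map();
  for (const span of spans) {
    if (span.op !== "db.query") continue;
    const key = normalizeQuery(span.description);
    const group = groups.get(key) ?? { query: key, count: 0, totalMs: 0 };
    group.count += 1;
    group.totalMs += span.durationMs;
    groups.set(key, group);
  }
  return [...groups.values()]
    .filter((g) => g.count >= minRepeats)
    .sort((a, b) => b.totalMs - a.totalMs);
}

// Invented trace: one posts query followed by thirty per-post author lookups.
const traceSpans = [
  { op: "db.query", description: "SELECT * FROM posts WHERE user_id = 7", durationMs: 15 },
  ...Array.from({ length: 30 }, (_, i) => ({
    op: "db.query",
    description: `SELECT * FROM users WHERE id = ${i + 1}`,
    durationMs: 10,
  })),
];

const suspects = findRepeatedQueries(traceSpans);
console.log(suspects);
```

For this trace the only suspect is the per-post author lookup: 30 executions of the same query shape, 300ms in total.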
Check for slow database queries
If your database spans take most of the request time, the problem is usually in your queries rather than your application code. Find the slow ones by sorting database spans by duration.
Common culprits:
- Missing indexes: a query that scans millions of rows when a single index would hit the right ones fast.
- Too many columns: fetching all 50 columns when you only need 3.
- Correlated subqueries: a subquery that references the outer query, so the database may re-run it for every outer row.
Database-heavy endpoints often have a single query that accounts for most of the time. Traces pinpoint it; query optimization fixes it.
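One way to find that dominant query is to rank query shapes by their share of total database time. This is a sketch over a hypothetical span shape (`op`, `description`, `durationMs`), with descriptions assumed to be parameterized already. The numbers are invented.

```javascript
// Rank query shapes by how much of the total database time they consume.
function rankQueriesByTime(spans) {
  const dbSpans = spans.filter((s) => s.op === "db.query");
  const totalMs = dbSpans.reduce((sum, s) => sum + s.durationMs, 0);
  const byQuery = new Map();
  for (const s of dbSpans) {
    byQuery.set(s.description, (byQuery.get(s.description) ?? 0) + s.durationMs);
  }
  return [...byQuery.entries()]
    .map(([query, ms]) => ({ query, ms, share: totalMs ? ms / totalMs : 0 }))
    .sort((a, b) => b.ms - a.ms);
}

// Invented spans from one endpoint.
const endpointSpans = [
  { op: "db.query", description: "SELECT * FROM users WHERE id = ?", durationMs: 60 },
  { op: "db.query", description: "SELECT * FROM orders WHERE status = ?", durationMs: 400 },
  { op: "cache.get", description: "user settings", durationMs: 5 },
  { op: "db.query", description: "SELECT * FROM settings WHERE user_id = ?", durationMs: 40 },
];

const ranking = rankQueriesByTime(endpointSpans);
console.log(ranking[0]);
```

The top entry here is the orders query at 400ms of the 500ms database total, so that is the one to run through your database's query planner first.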
Monitor transactions, not just errors
Unlike error tracking, which only fires when something breaks, performance monitoring is about watching the slow path before it becomes a crash.
Set up alert rules on latency thresholds, not just error rates. Watch for:
- P95 crossing 500ms when it usually sits at 100ms.
- A specific endpoint's median time drifting up over days.
- A spike at a specific time of day (suggesting a batch job or cache expiry pattern).
These are the patterns that precede customer complaints by hours.
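A latency alert rule boils down to comparing a recent window's P95 against a baseline. The sketch below is not LightTrace's alerting API; the function name, the 3x ratio, and the 500ms hard limit are assumptions chosen to mirror the first pattern above.

```javascript
// Nearest-rank percentile over a window of request durations.
function percentile(durationsMs, p) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fire when P95 exceeds a hard limit or drifts far above its usual baseline.
function checkLatencyAlert(windowMs, { baselineP95Ms, maxRatio = 3, hardLimitMs = 500 }) {
  const p95 = percentile(windowMs, 95);
  const reasons = [];
  if (p95 > hardLimitMs) {
    reasons.push(`P95 ${p95}ms is above the ${hardLimitMs}ms limit`);
  }
  if (p95 > baselineP95Ms * maxRatio) {
    reasons.push(`P95 ${p95}ms is more than ${maxRatio}x the ${baselineP95Ms}ms baseline`);
  }
  return { p95, firing: reasons.length > 0, reasons };
}

// Invented windows: a healthy one, and one where 10% of requests slowed to 650ms.
const healthy = checkLatencyAlert(Array(100).fill(90), { baselineP95Ms: 100 });
const degraded = checkLatencyAlert(
  [...Array(90).fill(80), ...Array(10).fill(650)],
  { baselineP95Ms: 100 }
);
console.log(healthy.firing, degraded.firing, degraded.reasons);
```

Checking both an absolute limit and a ratio to baseline catches sudden spikes as well as endpoints that were never fast enough to trip a fixed threshold.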
Trace across projects for the full picture
If your architecture spans multiple services — frontend talking to API talking to a worker talking to a payment processor — a single-service trace isn't enough. Cross-project tracing stitches all the spans together into one waterfall, so you see the full latency picture across every service that touches a request.
Without it, each service looks fast locally, but the user experiences slow end-to-end. With it, you see exactly where the handoff is slow.
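Conceptually, stitching means collecting spans from every service that share a trace ID and ordering them on one timeline. The sketch below shows only that idea, with a hypothetical span shape and invented data. Real tracing systems propagate trace and parent span IDs through request headers and have to cope with clock skew between hosts, which this ignores.

```javascript
// Hypothetical span shape: { traceId, service, name, startMs, endMs }.
// Assumes all timestamps are on a comparable clock.
function buildWaterfall(spansFromAllServices, traceId) {
  return spansFromAllServices
    .filter((s) => s.traceId === traceId)
    // Earlier spans first; on a tie, the longer (enclosing) span first.
    .sort((a, b) => a.startMs - b.startMs || b.endMs - a.endMs)
    .map((s) => ({
      service: s.service,
      name: s.name,
      startMs: s.startMs,
      durationMs: s.endMs - s.startMs,
    }));
}

// Invented spans reported separately by three services.
const collected = [
  { traceId: "t1", service: "worker", name: "charge card", startMs: 100, endMs: 800 },
  { traceId: "t1", service: "frontend", name: "click checkout", startMs: 0, endMs: 900 },
  { traceId: "t2", service: "api", name: "GET /health", startMs: 5, endMs: 9 },
  { traceId: "t1", service: "api", name: "POST /checkout", startMs: 40, endMs: 860 },
  { traceId: "t1", service: "api", name: "db.query orders", startMs: 50, endMs: 90 },
];

const waterfall = buildWaterfall(collected, "t1");
for (const row of waterfall) {
  console.log(`${row.service.padEnd(8)} ${row.name.padEnd(16)} +${row.startMs}ms ${row.durationMs}ms`);
}
```

Seen per service, the API's own database query takes 40ms and looks fine. On the stitched timeline, the worker's 700ms card charge accounts for most of the 900ms the user waits.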
Tag every trace with the user ID and release version. When you see latency spike after a deploy, you can drill into traces from that release and see what changed.
Apply the fix and validate
Once you've found the bottleneck and applied a fix — whether it's an index, eager loading, or removing an unnecessary service call — the hardest part starts: proving it worked.
Compare latency percentiles from before and after the fix. Look at P50, P95, and P99. If you only fixed the outliers, P95 and P99 will drop but P50 might not budge. If you fixed the common case, everything moves.
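A before-and-after comparison can be as simple as the sketch below. The samples are invented: the fix in this example only helps the slowest 10% of requests, so the tail moves while the median does not.

```javascript
// Nearest-rank percentile over a set of request durations.
function percentile(durationsMs, p) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Compare P50, P95, and P99 between two non-empty samples of durations.
function compareLatency(beforeMs, afterMs) {
  const result = {};
  for (const p of [50, 95, 99]) {
    const before = percentile(beforeMs, p);
    const after = percentile(afterMs, p);
    result[`p${p}`] = {
      beforeMs: before,
      afterMs: after,
      changePct: Math.round(((after - before) / before) * 100),
    };
  }
  return result;
}

// Invented samples: 10% of requests dropped from 2000ms to 300ms after the fix.
const beforeFix = [...Array(90).fill(100), ...Array(10).fill(2000)];
const afterFix = [...Array(90).fill(100), ...Array(10).fill(300)];

const report = compareLatency(beforeFix, afterFix);
console.log(report);
```

Here P95 and P99 both fall by 85% while P50 stays at 100ms, which is the signature of an outlier fix rather than a common-case fix.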
New traces should show that the span that was slow is now fast, and that throughput (requests per second) stays flat or goes up. If throughput drops, you may have moved the bottleneck, not eliminated it.
Avoid the trap of fixing latency on one code path only to have traffic shift to a slower one. Use alerts to catch when another endpoint suddenly climbs.
The playbook
- Set up tracing for your services using the Sentry SDK pointed at LightTrace.
- Watch latency percentiles, not means. Set alerts on P95 and P99.
- Read waterfalls to see where the time goes — database, cache, other services, or network.
- Find patterns — N+1 queries, missing indexes, unnecessary calls.
- Fix the top culprit first: the one span eating most of the latency.
- Measure the impact with traces before and after. Watch percentiles move.
- Set up alerts on any regression. Latency creep is silent.
Slow APIs don't fail fast. They fail slowly, frustrating users and tanking retention. But with traces, the path from "something's slow" to "it's fixed" goes from weeks to hours.
Start capturing distributed traces in minutes — point any Sentry SDK at LightTrace, and watch latency bottlenecks become visible and fixable.
Once you have traces, latency stops being a mystery. You see the slow path, fix the real cause, and watch your percentiles improve. The discipline of tracing turns guessing into data.