Where Cloud Latency Actually Comes From (And What You Can Fix)
A practical breakdown of the five places cloud latency hides — and which ones you can actually do something about without rewriting your application.

"The cloud is slow" is almost never a true statement. What's usually true is that one specific segment of the path is slow, and nobody has measured it carefully enough to say which one. After 23 years of building infrastructure — colo, private cloud, and every hyperscaler — we've learned that cloud latency is a stack of five distinct problems, and most teams try to fix the wrong one first.
Here is what actually contributes to your round-trip time, ranked by how often it's the real culprit versus how often people blame it.
1. Physical Distance (Everyone Blames This, It's Rarely the Whole Story)
Light in fiber travels at roughly two-thirds the speed of light in vacuum. Sandy, Utah to us-east-1 in Northern Virginia is about 2,000 miles, which puts a hard floor of roughly 32 ms on the round-trip just for the photons. Add routing and you're realistically at 55-70 ms. To Frankfurt you're looking at 140-160 ms. These are physics numbers — no amount of optimization fixes them.
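If you want to sanity-check your own region pairs, the floor is just distance divided by the speed of light in fiber. A quick back-of-the-envelope sketch; the distances below are rough great-circle estimates, not actual fiber routes:

```python
# Back-of-the-envelope round-trip floor for a fiber path.
# Distances are rough great-circle estimates, not real fiber routes.
C_FIBER_KM_PER_MS = 200  # ~2/3 the speed of light in vacuum

def rtt_floor_ms(distance_km: float) -> float:
    """Minimum round-trip time imposed by physics, ignoring routing and queuing."""
    return 2 * distance_km / C_FIBER_KM_PER_MS

for name, km in [("Salt Lake City -> us-east-1 (N. Virginia)", 3200),
                 ("Salt Lake City -> Frankfurt", 8300)]:
    print(f"{name}: ~{rtt_floor_ms(km):.0f} ms floor")
```

Compare the floor to what you actually measure; the gap between them is the part you can work on.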
The mistake teams make: they assume "pick a closer region" is the answer, then pick a region that's closer for them but farther from their database, their users, or their third-party APIs. We've seen a healthcare customer move their app tier to us-west-2 to be closer to their Utah office, only to discover their Epic integration was pinned to a Chicago data center. They added 40 ms to every transaction trying to save 15.
Measure the full path before you move anything. mtr is still the right tool.
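mtr shows the hop-by-hop picture where ICMP is allowed. For the segments that drop it (a managed database, a third-party API behind a load balancer), a plain TCP connect timer is a reasonable stand-in for network RTT. A minimal sketch, with placeholder hostnames:

```python
import socket
import statistics
import time

def tcp_connect_ms(host: str, port: int, samples: int = 10) -> dict:
    """Time the TCP handshake to host:port; a cheap proxy for network RTT
    when ICMP is filtered. Excludes TLS and application time."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass
        times.append((time.perf_counter() - start) * 1000)
    times.sort()
    return {"p50": statistics.median(times), "max": times[-1]}

# Placeholder endpoints: swap in your real DB, API, and third-party hosts.
for host, port in [("db.internal.example.com", 5432),
                   ("api.thirdparty.example.com", 443)]:
    print(host, tcp_connect_ms(host, port))
```

Run it from the same place your application runs, not from your laptop, or you're measuring the wrong path.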
2. Intra-Region Network Hops
This is the layer most people don't even know exists. Inside a cloud region, traffic between your VM and a managed service (RDS, Cosmos DB, Cloud SQL) crosses a non-trivial number of switches, load balancers, and sometimes a "service endpoint" abstraction that adds 1-3 ms per call. For a web request that makes 20 database calls serially, that's 20-60 ms of cumulative latency, and it hides in plain sight because each individual call still looks fast in an APM tool that only measures the app-to-DB round-trip.
What actually helps: place your app and its dependencies in the same availability zone when the workload tolerates it, use private link or VPC endpoints instead of public ones, and collapse chatty ORM calls into fewer, larger queries. A single SELECT ... JOIN will always beat five separate lookups, and the cloud makes that gap bigger than on-prem.
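To make the round-trip math concrete, here is a minimal sketch of the chatty pattern versus the batched one, assuming a psycopg2-style DB-API connection and a made-up orders/customers schema:

```python
# Hypothetical schema: orders(id, customer_id, total), customers(id, name).

def fetch_orders_chatty(conn, limit=20):
    # Anti-pattern: one query for the orders, then one lookup per order.
    # At 1-3 ms of in-region overhead per call, 1 + N calls adds up fast.
    cur = conn.cursor()
    cur.execute("SELECT id, customer_id, total FROM orders LIMIT %s", (limit,))
    result = []
    for order_id, customer_id, total in cur.fetchall():
        cur.execute("SELECT name FROM customers WHERE id = %s", (customer_id,))
        (name,) = cur.fetchone()
        result.append((order_id, name, total))  # 1 + N round-trips
    return result

def fetch_orders_batched(conn, limit=20):
    # Same data, one round-trip.
    cur = conn.cursor()
    cur.execute(
        """
        SELECT o.id, c.name, o.total
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        LIMIT %s
        """,
        (limit,),
    )
    return cur.fetchall()
```

The ORM version of this is usually an eager-load or prefetch option that already exists; it just isn't turned on.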
3. The TLS Handshake You Don't Think About
Every fresh HTTPS connection costs you a TCP handshake (1 RTT) plus a TLS handshake (1-2 RTTs depending on TLS version). On a 60 ms path that's 120-180 ms before any useful bytes move. Most engineers know this in theory and ignore it in practice because their local dev environment has a 0.5 ms RTT.
The fix is connection reuse. Make sure your HTTP client is actually pooling connections. Make sure your Lambda or Cloud Function isn't spinning up a fresh TLS session per invocation — this is one of the biggest silent latency costs in serverless architectures. Enable TLS 1.3 end-to-end; it shaves a full round trip off the handshake.
We've cut p99 latency in half for customers just by fixing connection pooling in an SDK client library. No infrastructure change.
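What that fix usually looks like in practice, sketched here with Python's requests library (your SDK's pooling knobs will differ): hold one long-lived session instead of building a fresh client per call.

```python
import requests
from requests.adapters import HTTPAdapter

# Anti-pattern: every call pays TCP + TLS handshakes on a fresh connection.
def fetch_no_reuse(url):
    return requests.get(url, timeout=10).json()

# Fix: a module-level Session keeps connections (and TLS sessions) alive,
# so repeat calls to the same host skip the handshakes entirely.
_session = requests.Session()
_session.mount("https://", HTTPAdapter(pool_connections=10, pool_maxsize=10))

def fetch_with_reuse(url):
    return _session.get(url, timeout=10).json()
```

In a Lambda or Cloud Function, the same idea means constructing the client outside the handler so it survives warm invocations.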
4. Egress Through NAT Gateways and Firewalls
Cloud NAT gateways, managed firewalls, and security appliances are not free. A typical AWS NAT Gateway adds 0.5-2 ms per flow in steady state, but under load it can spike much higher, and it's metered per gigabyte of traffic on top of that. Azure Firewall and GCP Cloud NAT have similar behavior. If your architecture has a VM calling an external API through NAT, through a transit gateway, through a firewall virtual appliance, you've stacked several separate queuing points in series.
Each one is fine on its own. Together they produce the weird "it's fast most of the time and then suddenly not" pattern that teams spend weeks chasing.
What works: put outbound traffic through the minimum number of hops. Use VPC endpoints / private endpoints for cloud-provider APIs instead of routing through NAT. Use Direct Connect or ExpressRoute for your on-prem-to-cloud path instead of VPN-over-internet — the latency difference is 10-30 ms consistently, and the jitter difference is much larger.
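One concrete example on the AWS side: a gateway VPC endpoint keeps S3-bound traffic on the provider backbone instead of hairpinning through the NAT Gateway. A sketch with boto3; the VPC and route-table IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs: substitute your own VPC and route tables.
resp = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(resp["VpcEndpoint"]["VpcEndpointId"])
```

Azure Private Endpoints and GCP Private Service Connect play the same role on their platforms: traffic to the provider's own services shouldn't have to transit your NAT path at all.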
5. The Application Itself
This one hurts to admit but it's the most common root cause. "Cloud is slow" turns into "our N+1 query pattern was fine on the old server because the database was on localhost, and now it's 60 ms away and we make 300 of those queries per request." The cloud didn't make the application slower. The cloud made an existing problem visible.
Before you blame the network:
- Turn on your framework's query logging and count round-trips per request
- Look for serialized HTTP calls that could be parallelized
- Check whether your app is using connection pools or opening fresh sockets
- Verify that cache keys actually match (a broken cache is worse than no cache because it still costs you the lookup)
We have fixed more "cloud latency" problems by deleting code than by changing network topology. It's not glamorous work, but it's where the wins are.
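To make the serialized-calls item above concrete: three independent 60 ms lookups cost 180 ms in sequence and roughly 60 ms in parallel. A minimal sketch with a thread pool; the endpoints are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

# Placeholder endpoints for three independent lookups a request handler makes.
URLS = [
    "https://api.example.com/profile",
    "https://api.example.com/permissions",
    "https://api.example.com/preferences",
]

def fetch(url):
    return requests.get(url, timeout=10).json()

# Serialized: total latency is the sum of all three round-trips.
serial_results = [fetch(u) for u in URLS]

# Parallelized: total latency is roughly the slowest single round-trip.
with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
    parallel_results = list(pool.map(fetch, URLS))
```

This only works when the calls really are independent; if call two needs the result of call one, the fix is collapsing them, not parallelizing them.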
What a Real Latency Audit Looks Like
When we do a latency audit for a customer, here's the order we check things in, because it's the order that surfaces the biggest wins fastest:
- Measure from the user, not from the cloud. Real User Monitoring (RUM) or synthetic checks from the customer's actual geography. Anything measured from inside the VPC is a lie.
- Break the path into segments. Client-to-edge, edge-to-app, app-to-DB, app-to-third-party. Get a p50 and p99 for each segment.
- Fix the p99, not the p50. Averages hide the problem. The p99 is where your users actually feel the slowness.
- Look for serialized calls first. One slow call times 20 is the pattern we see most often.
- Check connection reuse. Fresh TLS per call is the second most common pattern.
- Then talk about regions and CDNs. By the time you get here you've usually already fixed the problem.
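For the segmentation and percentile steps, the aggregation itself is simple once you have per-request timings. A small sketch, with made-up segment names and numbers:

```python
import statistics

# Hypothetical per-request timings in ms, one dict per sampled request,
# collected from RUM or synthetic checks.
samples = [
    {"client_to_edge": 38, "edge_to_app": 4, "app_to_db": 61, "app_to_3p": 112},
    {"client_to_edge": 41, "edge_to_app": 5, "app_to_db": 58, "app_to_3p": 430},
]

def percentile(values, pct):
    """Nearest-rank percentile; good enough for an audit."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1)))
    return ordered[index]

for segment in samples[0]:
    values = [s[segment] for s in samples]
    print(f"{segment}: p50={statistics.median(values):.0f} ms, "
          f"p99={percentile(values, 99):.0f} ms")
```

The segment whose p99 diverges hardest from its p50 is almost always where the investigation should start.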
Three Takeaways
- Distance is the floor, not the ceiling. Physics gives you a minimum latency for any region pair. If your actual number is more than 2x that floor, you have a fixable problem somewhere else in the stack.
- The cheapest latency fix is deleting round-trips. Connection pooling, query batching, and parallelization beat any infrastructure change for most workloads.
- Measure the segments, not the total. A single end-to-end number tells you nothing useful. Break the path apart and the culprit is almost always obvious.
Talk with us about your infrastructure
Schedule a consultation with a solutions architect.