Cloud Performance Optimization: Past the Obvious Stuff
Once you've right-sized your instances and turned on autoscaling, here's where the real performance wins actually live.

Every cloud performance article starts in the same place: right-size your instances, enable autoscaling, use the CDN. This is fine advice if you haven't done it yet. If you have, you've already captured the easy wins and you're wondering why the app is still slow and the bill is still high.
This post is about what comes next. It's the stuff we actually spend our time on when a customer calls us because their cloud migration "didn't deliver the promised performance." In almost every case, the real problems are not at the compute layer. They're in storage, in networking, in the application's assumptions about the environment it's running in, and in the gap between what metrics the customer is watching and what's actually going wrong.
The Obvious Stuff, Very Briefly
Before going anywhere interesting, make sure you've done the basics. Right-size compute to actual utilization (not peak, not nameplate). Turn on autoscaling with sensible floors. Use a CDN for anything static. Use reserved instances or savings plans for steady-state load. Turn on cloud-native monitoring. If any of that is missing, do it first — the rest of this post will not help you.
Done? Good. Here's where the real wins live.
Insight One: Storage Is Almost Always the Bottleneck
Most "my app is slow in the cloud" complaints are storage complaints in disguise. Compute in the cloud is fast, cheap, and plentiful. Storage performance, on the other hand, is a minefield of tiers, throughput limits, burst credits, and IOPS caps that have almost nothing in common with the spinning disks or SAN LUNs your application was designed against.
The gp3 and gp2 trap
AWS's gp2 volumes earn baseline IOPS in proportion to volume size (3 IOPS per GB) and rely on burst credits that deplete under sustained load. We routinely see customers running production databases on 100 GB gp2 volumes (a 300 IOPS baseline) that burst fine during testing and then fall off a cliff in production because the burst balance drained. Move to gp3 with explicitly provisioned IOPS and throughput, or io2 for anything latency-sensitive, and the problem usually disappears. This is a line-item change that takes an hour and delivers a 5-10x latency improvement for the affected workload.
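The mechanics really are that small. Here's a sketch with boto3, assuming AWS credentials are configured; the volume ID is a placeholder, and the IOPS and throughput figures should come from your own observed peaks, not ours:
```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder volume ID -- substitute the gp2 volume backing your database.
VOLUME_ID = "vol-0123456789abcdef0"

# Migrate in place: gp3 lets you provision IOPS and throughput explicitly
# instead of inheriting them from volume size the way gp2 does. The volume
# stays attached and online while the modification runs.
resp = ec2.modify_volume(
    VolumeId=VOLUME_ID,
    VolumeType="gp3",
    Iops=6000,        # provision for observed peak, not nameplate
    Throughput=500,   # MB/s; gp3 decouples this from IOPS
)
print(resp["VolumeModification"]["ModificationState"])
```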
Azure Premium SSD tiers and IOPS caps
The same pattern exists in Azure. Premium SSD performance is capped per-disk based on disk size, and the caps are non-obvious. A P30 (1 TB) caps at 5000 IOPS and 200 MB/s. If you need more, you provision a larger disk or stripe multiple disks together. Premium SSD v2 decouples size from performance and is the right default for most new workloads, but most teams haven't migrated yet.
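To make the cliff visible, here's a toy sizing helper. The per-tier caps are representative values for a handful of common tiers, pulled from memory of the Azure docs; verify them against the current documentation before acting on them:
```python
# Representative Premium SSD (v1) per-disk caps -- verify against current
# Azure documentation before relying on these numbers.
PREMIUM_SSD_TIERS = {
    "P10": {"size_gb": 128,  "iops": 500,  "mbps": 100},
    "P20": {"size_gb": 512,  "iops": 2300, "mbps": 150},
    "P30": {"size_gb": 1024, "iops": 5000, "mbps": 200},
    "P40": {"size_gb": 2048, "iops": 7500, "mbps": 250},
    "P50": {"size_gb": 4096, "iops": 7500, "mbps": 250},
}

def smallest_tier(need_iops: int, need_mbps: int) -> str | None:
    """Return the smallest tier meeting both caps, or None if you need
    to stripe disks, jump tiers, or move to Premium SSD v2."""
    for name, caps in PREMIUM_SSD_TIERS.items():
        if caps["iops"] >= need_iops and caps["mbps"] >= need_mbps:
            return name
    return None

# An 8000 IOPS workload exceeds every tier in this table, no matter how
# small the data is -- on v2 you would simply provision the IOPS you need.
print(smallest_tier(8000, 180))  # -> None
```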
The "ephemeral is fine" assumption
Instance storage (ephemeral NVMe) is dramatically faster than any network-attached option, and almost nobody uses it. The reason is that it goes away when the instance stops. For database buffer pools, temp tablespaces, scratch space, and caches, ephemeral storage is exactly the right choice — if you engineer your application to treat it as cache rather than persistence. This is one of the biggest single-workload wins we've found, and it's free.
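Here's a minimal sketch of that cache-not-persistence discipline, assuming instance storage mounted at /mnt/nvme and S3 as the durable source of truth; the bucket name and paths are hypothetical:
```python
import os
import boto3

# Hypothetical names: local NVMe instance storage mounted at /mnt/nvme,
# durable copy in S3. The ephemeral disk is treated strictly as a cache --
# losing it costs a re-fetch, never data.
CACHE_DIR = "/mnt/nvme/cache"
BUCKET = "example-model-artifacts"

s3 = boto3.client("s3")

def cached_fetch(key: str) -> str:
    """Return a local path for `key`, using ephemeral NVMe as cache."""
    local_path = os.path.join(CACHE_DIR, key.replace("/", "_"))
    if not os.path.exists(local_path):
        # Cache miss: the instance is new, restarted, or replaced.
        # Rebuild from the durable store; this is normal, not data loss.
        os.makedirs(CACHE_DIR, exist_ok=True)
        s3.download_file(BUCKET, key, local_path)
    return local_path
```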
Object storage request limits are per-prefix
S3 and Azure Blob Storage throttle requests per key prefix, and most developers don't know the limits exist. S3, for instance, supports roughly 3,500 writes and 5,500 reads per second per prefix. If your application writes 10,000 objects to the same "folder," you will hit throttling that looks exactly like a slow disk. Randomize prefixes by putting a short hash at the front of the key, and the problem goes away. This matters enormously for log ingestion, ML training pipelines, and any batch workload that writes a lot of small objects.
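The fix can be this small. A sketch of key spreading, assuming you can tolerate hashed prefixes when you list objects later:
```python
import hashlib

def spread_key(natural_key: str, fanout_chars: int = 2) -> str:
    """Prepend a short hash so writes fan out across object-store prefixes.

    A hot natural prefix like 'logs/2024-06-01/...' funnels every write
    into one partition's request budget. Two hex characters spread the
    load across up to 256 prefixes.
    """
    digest = hashlib.md5(natural_key.encode()).hexdigest()
    return f"{digest[:fanout_chars]}/{natural_key}"

# 'logs/2024-06-01/host42.json' -> e.g. '3f/logs/2024-06-01/host42.json'
print(spread_key("logs/2024-06-01/host42.json"))
```
The trade-off is that you can no longer list a day's objects with a single prefix scan, so keep an index (or enumerate the hash buckets) if your consumers rely on ordered listing.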
Insight Two: Network Topology Is a Performance Feature
The second place real performance gets lost is in the network path between components. The cloud lets you put things anywhere, which means you can put them in the wrong place.
Cross-AZ traffic is neither free nor fast
AWS and Azure both charge for cross-AZ traffic, and it's also slower than same-AZ traffic. A web server in us-east-1a talking to a database in us-east-1b adds roughly a millisecond per round trip compared to same-AZ, and for chatty applications every extra millisecond compounds. For a legacy app doing 500 database calls per page load, that's the difference between a 150 ms page and a 600 ms page.
The fix is not "put everything in one AZ" — that breaks your HA story. The fix is to use read replicas in each AZ, or application-level caching, or a locality-aware load balancer that keeps a user's session pinned to the AZ where their data lives. This is engineering work, not a checkbox.
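One way to do the locality part: let each instance discover its own AZ from instance metadata and prefer the read replica that shares it. A sketch assuming EC2 with IMDSv2; the per-AZ replica map and hostnames are hypothetical:
```python
import requests

# Hypothetical endpoint map: one read replica per AZ. The writer still
# lives in exactly one AZ; reads are what you can keep local.
REPLICAS_BY_AZ = {
    "us-east-1a": "mydb-replica-a.example.internal",
    "us-east-1b": "mydb-replica-b.example.internal",
}
DEFAULT_ENDPOINT = "mydb-primary.example.internal"

def local_read_endpoint() -> str:
    """Pick the read replica in this instance's own AZ (IMDSv2)."""
    # IMDSv2: fetch a session token, then query instance metadata with it.
    token = requests.put(
        "http://169.254.169.254/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
        timeout=1,
    ).text
    az = requests.get(
        "http://169.254.169.254/latest/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token},
        timeout=1,
    ).text
    return REPLICAS_BY_AZ.get(az, DEFAULT_ENDPOINT)
```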
NAT gateway throughput limits
NAT gateways have throughput limits that will bite you in outbound-heavy workloads (backups, data pipelines, container registries during a mass deploy). They also cost real money per GB processed. For high-volume egress, the right answer is usually a VPC endpoint (PrivateLink) for the destination service, which bypasses the NAT gateway entirely. S3, ECR, DynamoDB, and most AWS services have endpoints. Use them.
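Creating a gateway endpoint for S3 is a few lines with boto3; the VPC and route table IDs below are placeholders:
```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder IDs. A gateway endpoint for S3 routes traffic over the AWS
# backbone -- no NAT gateway hop, no per-GB NAT processing fee.
resp = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(resp["VpcEndpoint"]["VpcEndpointId"])
```
Gateway endpoints (S3, DynamoDB) cost nothing; interface endpoints for services like ECR bill hourly plus per-GB, which still usually undercuts NAT gateway processing charges at volume.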
Availability zone placement of managed services
This one surprises people. Managed services (RDS, Aurora, ElastiCache, Azure SQL) don't always put the primary where you expect, so check where yours actually lives. If your web tier is in AZ-a and the database primary failed over to AZ-c during maintenance last month and nobody noticed, you're eating cross-AZ latency on every query and paying for the privilege.
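The check takes a minute. A sketch that lists where each RDS primary currently lives, compared against a hypothetical set of app-tier AZs:
```python
import boto3

rds = boto3.client("rds")

# Where does each database primary actually live right now? Compare it
# against the AZs your application tier runs in (hypothetical set below).
APP_TIER_AZS = {"us-east-1a"}

for db in rds.describe_db_instances()["DBInstances"]:
    az = db["AvailabilityZone"]  # the primary's AZ, even for Multi-AZ
    flag = "" if az in APP_TIER_AZS else "  <-- cross-AZ from app tier"
    print(f'{db["DBInstanceIdentifier"]}: primary in {az}{flag}')
```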
Insight Three: Your Monitoring Is Probably Measuring the Wrong Thing
The last place performance wins live is in the gap between what your dashboards show and what's actually slow. Most cloud monitoring is CPU, memory, and disk-level metrics. That's the wrong layer to debug application performance.
Percentiles, not averages
The average response time of your API is not interesting. The 95th and 99th percentiles are. If your average is 100 ms and your p99 is 8 seconds, 1 in 100 requests gets an awful experience and your dashboard says everything is fine. Instrument percentiles everywhere and alert on them.
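Computing this needs nothing exotic. A self-contained sketch using only the standard library, with simulated latencies standing in for real measurements:
```python
import random
import statistics

# Simulated response times: mostly fast, plus a slow tail the average hides.
latencies_ms = (
    [random.gauss(90, 15) for _ in range(990)]
    + [random.gauss(8000, 500) for _ in range(10)]
)

# quantiles(n=100) yields the 1st..99th percentile cut points.
pct = statistics.quantiles(latencies_ms, n=100)
p95, p99 = pct[94], pct[98]

print(f"avg: {statistics.mean(latencies_ms):7.1f} ms")  # looks tolerable
print(f"p95: {p95:7.1f} ms")
print(f"p99: {p99:7.1f} ms")                            # tells the truth

# Alert on the tail, not the middle.
if p99 > 1000:
    print("ALERT: p99 over budget")
```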
RUM beats synthetic
Real user monitoring — actual browser timings from actual users — is vastly more valuable than synthetic probes from a datacenter somewhere. Synthetic gives you uptime; RUM gives you truth. The gap between what your synthetic probes say and what your users experience is where the problems you haven't fixed yet live.
Distributed tracing or go home
If your application is anything more than a monolith talking to one database, you need distributed tracing. OpenTelemetry plus any backend (Tempo, Jaeger, X-Ray, Application Insights) will show you the actual call graph, the actual latency budget per hop, and the actual bottleneck. Without it, you're guessing. Most performance mysteries get solved the day somebody sets up tracing — not because tracing fixes anything, but because it shows you what to fix.
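Getting a first trace is a smaller job than people expect. A minimal OpenTelemetry setup in Python; the span names are illustrative, and you'd swap the console exporter for an OTLP exporter pointed at whichever backend you run:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Minimal setup: spans print to the console. Swap ConsoleSpanExporter for
# an OTLP exporter to ship to Tempo, Jaeger, X-Ray, or Application Insights.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer(__name__)

def handle_request(user_id: str) -> None:
    # Each span records its own latency; nesting them reveals the
    # per-hop budget of the whole call graph.
    with tracer.start_as_current_span("handle_request"):
        with tracer.start_as_current_span("db.query"):
            ...  # database call goes here
        with tracer.start_as_current_span("cache.lookup"):
            ...  # cache call goes here
```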
Watch the cold path, not the hot path
Cloud instances can be preempted, restarted, or replaced at any time. The warm-path performance of your application is usually fine; the cold-path performance — first request after a scale-out, after a deployment, after a container restart — is where users experience real latency spikes. Measure it explicitly. Fix your app startup time. Pre-warm connection pools. Eagerly load caches at boot.
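What pre-warming looks like in practice, sketched with hypothetical pool and cache objects; the names are stand-ins, the sequencing is the point:
```python
import concurrent.futures

def prewarm(db_pool, cache, hot_keys):
    """Do the slow work at boot, before the load balancer sends traffic.

    `db_pool`, `cache`, and `hot_keys` are hypothetical stand-ins for
    whatever pool, cache client, and known-hot key list your app uses.
    """
    # Open the full pool now so the first N real requests don't each
    # pay a TCP + TLS + auth handshake.
    conns = [db_pool.connect() for _ in range(db_pool.max_size)]
    for conn in conns:
        conn.close()  # returned to the warm pool; sockets stay open

    # Eagerly load the cache entries you already know will be asked for.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as ex:
        list(ex.map(cache.load, hot_keys))

    # Only after this returns should the instance pass its readiness check.
```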
What We Actually Do for Customers
When a customer calls us with a cloud performance complaint, our first week looks like this: turn on distributed tracing, move any gp2 volumes to gp3 with provisioned IOPS, audit cross-AZ traffic flows, check for NAT gateway throttling, verify the right managed service tiers are in use, and instrument p95/p99 response times on anything user-facing. This is a one-week engagement that has fixed more performance problems than any six-month refactor we've done.
Three Takeaways
- Storage is the bottleneck, almost always. The first place to look for cloud performance problems is the IOPS and throughput caps on whatever volume tier you're using, not the CPU.
- Network topology is application architecture. Cross-AZ chat, NAT gateway throttling, and misplaced managed service primaries cost more latency than most app-level optimizations will save.
- You can't fix what you can't see. Distributed tracing and p99 metrics are the tools that turn performance work from guessing into engineering. Until you have them, you're optimizing in the dark.
Talk with us about your infrastructure
Schedule a consultation with a solutions architect.