
Cloud Workload Balancing: Four Takeaways from Production Migrations

Load balancing is the easy part. Deciding which workload runs where — and when to move it — is the hard part. Four lessons from the field.

John Lane 2024-02-28 5 min read

"Cloud workload balancing" is one of those phrases that means two different things depending on who says it. To a networking engineer it means load balancers: round-robin, least-connections, weighted routing, maybe a sticky session. To an operations leader it means something much bigger: deciding which workloads live on which infrastructure, at which times, at which cost, and moving them when the situation changes. The first is a well-solved problem. The second is the interesting one, and it's where we spend most of our time with customers.

Here are four lessons from the production side of workload balancing. None of them involve tuning a load balancer.

1. Not every workload belongs in the cloud at all

The first real workload-balancing decision is whether a given application belongs in a hyperscaler, a private cloud, or a physical box in a rack. The marketing message is that everything should be in public cloud. The honest answer is that it depends on the workload's characteristics.

Workloads that benefit from public cloud: highly elastic, bursty traffic patterns; new development where time-to-first-deploy matters; applications that need managed services (databases, queues, ML APIs) the team can't operate themselves; globally distributed applications where regional presence matters.

Workloads that don't: steady-state compute at 80 to 90 percent utilization 24/7; latency-sensitive traffic between tightly coupled services (chatty microservices pay cross-AZ tolls); GPU-heavy workloads when GPUs are cheaper at a specialty provider; regulated workloads that have to run on specific physical infrastructure; storage-heavy archival where egress fees would eat the savings.
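
To make that sorting concrete, here is roughly how we frame the first-pass conversation. The field names and thresholds below are illustrative assumptions, not a formula we ship; the point is that placement falls out of the workload's characteristics, not out of a platform preference.

```python
# Illustrative only: a rough first-pass placement hint based on the
# characteristics above. Field names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    avg_utilization: float        # steady-state utilization, 0.0 to 1.0
    bursty: bool                  # traffic spikes well above baseline?
    needs_managed_services: bool  # databases, queues, ML APIs the team won't operate
    gpu_heavy: bool
    regulated_to_specific_hw: bool

def placement_hint(w: Workload) -> str:
    """A starting point for the placement conversation, not a verdict."""
    if w.regulated_to_specific_hw:
        return "dedicated hardware"            # compliance outranks everything else
    if w.gpu_heavy:
        return "specialty GPU provider"
    if w.avg_utilization >= 0.8 and not w.bursty:
        return "private cloud / dedicated"     # steady-state pays the elasticity premium for nothing
    if w.bursty or w.needs_managed_services:
        return "public cloud"
    return "needs a closer look"

print(placement_hint(Workload("reporting-db", 0.85, False, False, False, False)))
# -> private cloud / dedicated
```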

We routinely help customers who have migrated everything to Azure or AWS move 30 to 50 percent of their workloads back to private cloud or dedicated hosting. The total cost of ownership usually improves by 40 percent or more for the workloads that got moved back, with no loss of user experience. The ones that stay in public cloud are the ones that benefit from public cloud characteristics.

This isn't cloud repatriation as a religion. It's arithmetic.
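
Here is what the arithmetic looks like for a single steady-state workload. The dollar figures are placeholders you'd swap for your own quotes, but the shape of the comparison is the whole exercise: compute plus storage plus egress on one side, amortized hardware plus the ops effort you now own on the other.

```python
# Back-of-the-envelope TCO for one steady-state workload, per month.
# Every price below is a hypothetical placeholder, not a quote.
monthly_public = {
    "compute (on-demand, 24/7)": 4_200,
    "block storage":             600,
    "egress":                    900,
}
monthly_dedicated = {
    "dedicated hosts (amortized)": 2_400,
    "storage":                     400,
    "flat-rate bandwidth":         150,
    "added ops effort":            450,   # the line item people forget to count
}

public = sum(monthly_public.values())
dedicated = sum(monthly_dedicated.values())
print(f"public cloud: ${public:,}/mo")                  # $5,700/mo
print(f"dedicated:    ${dedicated:,}/mo")               # $3,400/mo
print(f"difference:   {1 - dedicated / public:.0%}")    # ~40% with these placeholders
```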

2. Autoscaling is only useful if the workload can actually scale

The second lesson is about autoscaling, which is one of the most oversold features in cloud computing. Autoscaling works beautifully when three conditions are met: the workload is horizontally scalable (adding more instances actually helps), the application starts up quickly (new instances are useful within a minute), and the load is predictable enough that the scaling signals are timely.

When those three conditions are met, autoscaling saves real money. A web tier that scales from 4 instances at night to 40 during peak is a classic win. A batch job that spins up 200 instances for an hour and then vanishes is another classic win.
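
The win is easy to see in instance-hours. A quick sketch with an assumed peak/off-peak split:

```python
# Instance-hour arithmetic for the web-tier example: 4 instances overnight,
# 40 at peak, versus provisioning 40 around the clock.
# The peak/off-peak split is assumed, not measured.
peak_hours, offpeak_hours = 8, 16
peak_instances, offpeak_instances = 40, 4

scaled = peak_hours * peak_instances + offpeak_hours * offpeak_instances
fixed = 24 * peak_instances
print(f"autoscaled: {scaled} instance-hours/day")   # 384
print(f"fixed:      {fixed} instance-hours/day")    # 960
print(f"saved:      {1 - scaled / fixed:.0%}")      # 60%
```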

When they aren't met, autoscaling is either theater or a liability. A Java application with a 3-minute JVM warmup does not benefit from autoscaling, because by the time the new instance is serving real traffic the spike is over. A stateful database can't be autoscaled in any meaningful sense; adding read replicas is a different operation with different trade-offs. A workload with unpredictable bursts triggers autoscaling reactively, which means users see slow responses during the ramp-up window.

The move is to know which of your workloads can actually autoscale, configure those to do so, and leave the rest on fixed capacity. Mixing the two in the same environment is fine. Pretending everything is elastic when it isn't is how you end up with a cloud bill that has the worst of both worlds: reserved capacity you don't need and on-demand bills from things that should be reserved.
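
If it helps to make that triage explicit, the three conditions from above reduce to a short checklist. This is a sketch, not tooling we're recommending, and the one-minute threshold is a rule of thumb rather than a hard limit:

```python
# The three conditions as a checklist. Thresholds are our assumptions;
# plug in your own startup measurements and traffic data.
def can_actually_autoscale(horizontally_scalable: bool,
                           startup_seconds: float,
                           load_is_predictable: bool) -> bool:
    return (horizontally_scalable
            and startup_seconds <= 60      # a new instance is useful within a minute
            and load_is_predictable)

# The 3-minute-JVM example from above fails the startup test:
print(can_actually_autoscale(True, 180, True))   # False -> leave it on fixed capacity
```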

3. Cross-region is a placement problem, not a balancing problem

Teams hear "workload balancing" and assume they should spread their traffic across multiple regions for resilience. Sometimes this is the right call. Often it isn't. Multi-region architectures are expensive, complicated, and introduce whole new failure modes — inconsistent data, cross-region replication lag, DNS failover timing, increased egress costs.

The question is not "should I use more regions" but "what does an hour of downtime for this workload cost, and how many of those hours can we tolerate per year?" For most business applications, the honest answer is that a few hours of downtime a year is acceptable, and a single region with proper HA inside it is the right architecture. For mission-critical systems where downtime costs real money, multi-region is worth the complexity.
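
That question fits on a napkin. A sketch, with every figure a placeholder to be replaced with your own numbers:

```python
# Is multi-region worth it for this workload? Compare the expected cost of
# downtime to the multi-region premium. Every figure is a placeholder.
outage_hours_per_year   = 3        # what well-built single-region HA realistically gives up
cost_per_downtime_hour  = 5_000    # revenue plus productivity impact, your estimate
single_region_monthly   = 20_000
multi_region_multiplier = 2.0      # infra roughly doubles, before the added ops work

expected_downtime_cost = outage_hours_per_year * cost_per_downtime_hour
multi_region_premium = single_region_monthly * (multi_region_multiplier - 1) * 12

print(f"expected downtime cost/yr: ${expected_downtime_cost:,}")    # $15,000
print(f"multi-region premium/yr:   ${multi_region_premium:,.0f}")   # $240,000
# With these placeholders, the well-tested single region wins by a wide margin.
```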

The trap is adopting multi-region as a checkbox, paying 2x the infrastructure cost, and then discovering during the first actual outage that the failover doesn't work because nobody tested it. We have watched this happen multiple times. A single-region architecture that works beats a multi-region one that doesn't.

If you do go multi-region, the ratios matter. The secondary region should be kept warm enough to actually accept traffic without an hour of warmup, and the failover process should be rehearsed at least quarterly. Anything less is expensive theater.

4. Workload balancing is an ongoing rebalancing problem, not a one-time design

The last takeaway is the one that catches most architects by surprise. A workload placement decision made in year one will probably not be the right decision in year two, and almost certainly not in year three. Usage patterns change, cloud pricing changes, new instance families come out, managed services you needed in year one become cheaper to run yourself, and workloads you thought would be seasonal turn out to be steady-state (or vice versa).

Treating workload placement as a one-time architectural decision is how you end up paying 2022 prices for 2024 capacity. The teams that get this right run a quarterly review: for each major workload, look at its current footprint, its actual usage, the current cost, and what the alternatives look like today. Sometimes the answer is "nothing to do." Sometimes it's "move this from x86 to ARM instances and save 30 percent." Sometimes it's "this steady-state workload should be repatriated to a dedicated host."
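
A review like that doesn't need tooling to get started; it needs a list and a threshold. A minimal sketch, with made-up workloads and costs:

```python
# The quarterly review reduced to a loop: current cost versus today's best
# alternative, flag anything worth a conversation. Workloads, costs, and the
# 10 percent threshold are all illustrative.
portfolio = [
    {"name": "web-tier",     "monthly": 9_000, "alternative": ("ARM instances", 6_300)},
    {"name": "reporting-db", "monthly": 7_500, "alternative": ("dedicated host", 4_800)},
    {"name": "ml-api",       "monthly": 3_200, "alternative": ("stay put", 3_200)},
]

for w in portfolio:
    alt_name, alt_cost = w["alternative"]
    saving = 1 - alt_cost / w["monthly"]
    if saving >= 0.10:   # only surface changes worth the migration effort
        print(f"{w['name']}: consider {alt_name}, ~{saving:.0%} saving")
    else:
        print(f"{w['name']}: nothing to do this quarter")
```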

None of these are heroic changes. They're small, regular adjustments that keep the portfolio matched to reality. Without the review, entropy wins — workloads accumulate, costs drift up, and nobody notices until the annual budget meeting.

The portfolio view

The thing that makes all four of these takeaways work is treating your infrastructure as a portfolio of workloads with different characteristics, not as a single bucket labeled "cloud." Each workload has a natural home, and the home can change over time. Your job as an operations leader is to keep matching workloads to homes, not to defend a platform choice you made three years ago.

Customers who adopt this mindset end up with environments that cost less, run more reliably, and scale more gracefully than customers who commit to a single platform and force everything into it. The work is more ongoing, but the results are worth it.

Four Takeaways

  1. Not every workload belongs in public cloud. Run the arithmetic per workload, not as a religion.
  2. Autoscaling helps only when the workload can actually scale. Don't pretend otherwise.
  3. Multi-region is a placement and cost decision, not a resilience checkbox. Price the downtime before paying double.
  4. Workload placement is a portfolio you rebalance, not a design you finalize.

Talk with us about your infrastructure

Schedule a consultation with a solutions architect.
