Cloud Computing Challenges: The Four That Keep Coming Back
After two decades of infrastructure work, the same four cloud challenges show up on every project. Here is what they actually look like and how we deal with them.

I have been helping customers adopt, operate, and sometimes walk back cloud deployments for most of the twenty-three years we have been in business. The shape of the challenges has changed over time, but the underlying four have not. Every engagement has some version of them. Every postmortem touches at least one. The tools change. The patterns do not. Here are the four, what they actually look like in practice, and the approach I keep falling back on.
Challenge one: cost that does not behave
The first challenge everyone runs into is that cloud bills do not behave the way you expect. Not because the providers are dishonest — the line items are mostly accurate — but because the mental model of "I pay for what I use" does not map cleanly to how modern cloud services meter.
You pay for data transfer between availability zones even inside your own VPC. You pay for NAT gateway bandwidth separately from the instances behind it. You pay for API calls on some storage classes and not others. You pay for reserved capacity you are not using because someone forgot to reassign it. You pay for load balancer hours even when the backend pool is empty. You pay for managed database storage that cannot be shrunk without a migration.
The result is bills that have no obvious relationship to your application's behavior, and every month the finance team asks why. The honest answer is that the cloud economic model rewards close attention and punishes benign neglect. A workload that nobody is watching will drift upward in cost more or less forever.
What actually works: tagging enforcement, a real showback or chargeback process, and at least one person whose job is to look at the bill. Not "nobody's job but everyone's responsibility." A named person. Weekly or monthly. This single change produces more savings than any of the third-party FinOps tools I have seen, because the problem is attention, not data.
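If it helps to make the showback idea concrete, here is a minimal sketch of the kind of report I mean, using the AWS Cost Explorer API through boto3. It assumes a cost-allocation tag named "team" has been activated in the billing console; the tag key and the time window are illustrative choices, not a prescription.

```python
# Minimal showback sketch: month-to-date spend grouped by a cost-allocation tag,
# so the named person has one number per team to review on a schedule.
# Assumes the "team" tag is activated as a cost-allocation tag and that
# credentials with Cost Explorer access are already configured.
from datetime import date

import boto3

ce = boto3.client("ce")

today = date.today()
start = today.replace(day=1).isoformat()
end = today.isoformat()  # End is exclusive; run this after the first of the month

response = ce.get_cost_and_usage(
    TimePeriod={"Start": start, "End": end},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    # Swap this for {"Type": "DIMENSION", "Key": "USAGE_TYPE"} to surface
    # line items like cross-AZ data transfer and NAT gateway processing.
    GroupBy=[{"Type": "TAG", "Key": "team"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    tag_value = group["Keys"][0]  # e.g. "team$payments"; "team$" means untagged
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{tag_value:30s} ${amount:,.2f}")
```

The value is not the script; it is that a named person runs something like it every week and asks why the untagged bucket keeps growing.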
Challenge two: security that is easy to get wrong and hard to audit
The shared responsibility model is the right framework and a source of endless confusion. The provider secures the infrastructure. You secure what you put on top of it. Most breaches are on the customer side of that line, and most of those breaches are caused by the same handful of mistakes.
Public S3 buckets with sensitive data. IAM roles with wildcard permissions handed out during a long night and never tightened. Security groups with 0.0.0.0/0 on ports that should never have been open. Keys committed to Git repos. Secrets sitting in environment variables in plaintext. I could go on — every breach report reads like a greatest-hits album of the same five mistakes.
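To make just one of those concrete, here is a small detective-style sketch that lists security group ingress rules open to 0.0.0.0/0, using boto3. It is an illustration of what finding the mistake looks like, not a recommendation to rely on detection alone; it deliberately checks a single region and only IPv4 ranges.

```python
# Sketch: flag security group ingress rules open to the entire internet.
# Illustrative only: one region, IPv4 ranges, no allow-list for ports that
# are intentionally public.
import boto3

ec2 = boto3.client("ec2")

paginator = ec2.get_paginator("describe_security_groups")
for page in paginator.paginate():
    for sg in page["SecurityGroups"]:
        for rule in sg["IpPermissions"]:
            for ip_range in rule.get("IpRanges", []):
                if ip_range.get("CidrIp") == "0.0.0.0/0":
                    # FromPort is absent when the rule covers all ports
                    port = rule.get("FromPort", "all")
                    print(f"{sg['GroupId']} ({sg['GroupName']}): port {port} open to the world")
```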
The cloud providers have built good tools for preventing each of these. AWS has Config Rules, GuardDuty, Security Hub, IAM Access Analyzer. Azure has Defender for Cloud and Policy. GCP has Security Command Center. The tools exist. Most customers do not turn them on. The ones who do turn them on often do not act on the findings. The ones who do act discover that the backlog is longer than the team can work through, and that it grows faster than it shrinks.
What actually works: start with guardrails, not audits. Preventative controls — SCPs, Azure Policy deny rules, organizational policies — stop the bad thing from happening in the first place. Detective controls find the bad thing after it has already happened. Detective controls are necessary. They are not sufficient. If you rely only on detection, you are in a race against attackers, and the attackers are more motivated.
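As one concrete example of the preventative kind, here is a sketch of a service control policy that denies turning off CloudTrail anywhere in an AWS organization, attached through boto3. The policy name and target ID are placeholders, and whether this particular control is your first guardrail depends on your environment; the point is that the deny happens before the mistake, not after it.

```python
# Sketch: a preventative guardrail as an AWS SCP that denies disabling
# CloudTrail across the organization. Policy name and TargetId are placeholders.
import json

import boto3

org = boto3.client("organizations")

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyCloudTrailTampering",
            "Effect": "Deny",
            "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
            "Resource": "*",
        }
    ],
}

policy = org.create_policy(
    Name="deny-cloudtrail-tampering",  # placeholder name
    Description="Prevent anyone from turning off audit logging",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)

org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="r-examplerootid",  # placeholder: your organization root or OU ID
)
```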
Challenge three: skills gaps that the industry has not closed
Cloud skills are still scarce in a way that surprises me. You would think after fifteen years of AWS and ten years of Azure, every infrastructure engineer would be fluent in at least one hyperscaler. They are not. We still meet senior engineers who have never written a Terraform module, never configured an IAM role with trust relationships, never debugged a VPC peering problem. This is not a critique — it is a reflection of how fragmented the discipline has become.
The people who are deeply cloud-fluent are expensive. Not slightly expensive — meaningfully expensive. A senior cloud engineer in a mid-sized US city costs more than most mid-market companies can easily afford. The gap between what you need and what you can hire is the real cause of most "our cloud migration went badly" stories. The migration did not go badly because the cloud is hard. It went badly because the team learning the cloud was also trying to run production at the same time.
What actually works: be honest about your team's skills before you decide on the architecture. If you have a two-person ops team and a legacy .NET estate, an all-in Kubernetes migration is a fantasy. Pick boring technology the team can run. Train deliberately, not accidentally. And consider that "hire a partner for the hard parts" is often cheaper than "hire the full skillset internally," because the partner amortizes the skill across multiple customers. This is how we justify our own existence, yes, but it is also how the math genuinely works out for most mid-market shops.
Challenge four: lock-in that sneaks up on you
Lock-in is the challenge that nobody takes seriously until it is too late. The early versions of cloud adoption were supposed to be portable. Write your app against standard interfaces, keep your data in standard formats, be ready to move. Then the managed services got really good, and the economic pressure to use them became irresistible. DynamoDB is amazing. So is Cosmos DB. So is BigQuery. So is every proprietary service, because that is the whole reason the providers build them.
Five years later, your application is written against DynamoDB's specific consistency model and its specific query patterns. Moving it is not a migration — it is a rewrite. And every time you adopt a new managed service, the escape cost goes up. This is not an accident. The providers are not stupid. They are building moats, and the moats work.
What actually works: be deliberate about which managed services you take hard dependencies on and which ones you abstract behind an interface. The rule I use: if the service saves you real engineering effort and the lock-in is acceptable given your risk tolerance, take the dependency and move on. If it saves you a little effort and the lock-in would be fatal if the provider changed policy or prices, keep the abstraction. Do not try to abstract everything — that creates its own maintenance burden and prevents you from using the features you are paying for.
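For the cases where you do keep the abstraction, here is a minimal sketch of what that can look like, assuming a simple key-value access pattern and a Python codebase. The interface name, the table schema, and the DynamoDB-backed class are illustrative, not a prescription; the point is that only one class knows which provider sits underneath.

```python
# Sketch: a thin key-value interface so only one class depends on DynamoDB.
# Names and the "pk" partition key are illustrative assumptions.
from typing import Optional, Protocol

import boto3


class KeyValueStore(Protocol):
    def get(self, key: str) -> Optional[dict]: ...
    def put(self, key: str, value: dict) -> None: ...


class DynamoDBStore:
    """The one class that touches the provider; assumes a table keyed on 'pk'."""

    def __init__(self, table_name: str) -> None:
        self._table = boto3.resource("dynamodb").Table(table_name)

    def get(self, key: str) -> Optional[dict]:
        return self._table.get_item(Key={"pk": key}).get("Item")

    def put(self, key: str, value: dict) -> None:
        self._table.put_item(Item={"pk": key, **value})


# Application code depends on the interface, not the provider.
def load_profile(store: KeyValueStore, user_id: str) -> Optional[dict]:
    return store.get(f"user#{user_id}")
```

The cost of this pattern is exactly the trade described above: the interface hides the provider-specific features you chose DynamoDB for in the first place, so reserve it for the dependencies where the lock-in would genuinely hurt.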
The other honest piece is that lock-in cuts both ways. Sometimes the right answer is to lean in, commit, and stop pretending you will ever leave. If you are using Azure because you are a Microsoft shop and you are never leaving the Microsoft ecosystem, the portability tax is not worth paying.
What unifies all four
These four challenges have something in common: they are all problems of discipline, not problems of technology. Cost requires discipline in tagging and attention. Security requires discipline in applying guardrails. Skills require discipline in hiring and training honestly. Lock-in requires discipline in architectural decisions.
Every one of them gets worse when the organization treats cloud as a procurement decision rather than an operational practice. The companies that run cloud well do not have better tools than the companies that run it badly. They have better habits. That is the uncomfortable lesson from twenty-three years of watching these projects, and it is the one that vendors will never tell you, because discipline is not something you can buy as a SKU.
Talk with us about your infrastructure
Schedule a consultation with a solutions architect.