Cloud for Startups: How to Pick the Bets That Don't Trap You

Startups get seduced by cloud credits and crushed by cloud bills — here's the playbook for launching cheap without locking yourself into decisions you will regret.

John Lane 2022-12-27

Startups have a particular relationship with cloud infrastructure. For the first year they are drowning in free credits and everything is wonderful. Somewhere between month 14 and month 20, the credits run out, revenue has not caught up, and somebody finally looks at the AWS bill. The resulting panic has generated more of our emergency consulting engagements than any other single pattern.

Here are the seven bets we would make if we were starting a company tomorrow, focused on keeping optionality high and burn rate low.

1. Use the Credits, Don't Build Around Them

AWS Activate, Azure for Startups, Google for Startups — every hyperscaler hands out $5,000 to $100,000 in credits to early-stage companies. Take them. Use them. But do not architect your product around proprietary services whose only redeeming quality is that they are free during the credit period.

The trap is building on DynamoDB, Cosmos DB, or Firestore because it is free and convenient, then discovering that your data model has become deeply coupled to its quirks by the time you start paying for it. PostgreSQL on RDS or Cloud SQL costs in the low tens of dollars a month at startup scale and ports cleanly to any environment that can run Postgres, which is nearly all of them. Start there unless you have a specific reason not to.

2. Pick One Cloud and Go Deep

Multi-cloud at the startup stage is a distraction. Running a redundant, portable deployment across AWS and GCP doubles your operational surface area in exchange for a theoretical benefit you will not use. Pick one cloud, learn it well, and use its managed services aggressively. You can split later if you have to — the productivity cost of fighting abstraction layers early is much higher than the cost of a future migration.

As for which cloud to pick: go where the team is fluent. If your founding engineers already know one cloud cold, that settles it. A team that has shipped on AWS for five years will move three times faster on AWS than on GCP, even if GCP is technically a better fit for the workload.

3. Managed Services Until They Hurt

At five engineers, your goal is to ship product, not to operate Postgres. That means managed Postgres, managed Redis, managed Kubernetes, managed object storage, and managed auth. Yes, they cost more per hour than running the same thing yourself on a VM. The comparison that matters is "engineer hours freed up" versus "dollars spent," and at early stage an engineer hour is worth $150 to $300 all-in. A $200 per month managed Postgres that saves two engineer hours a month frees up $300 to $600 of engineering time, so it has already paid for itself.
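The break-even arithmetic is simple enough to write down. Here is a minimal sketch using the figures from the paragraph above; the function name and the self-hosted cost parameter are our own illustration, not part of any billing tool.

```python
def managed_service_net_savings(
    managed_cost_per_month: float,
    self_hosted_cost_per_month: float,
    engineer_hours_saved_per_month: float,
    engineer_hourly_cost: float,
) -> float:
    """Monthly net savings; a positive result means the managed service pays off."""
    # Extra dollars paid for the managed option over running it yourself.
    premium = managed_cost_per_month - self_hosted_cost_per_month
    # Dollar value of the engineering time you get back.
    labor_saved = engineer_hours_saved_per_month * engineer_hourly_cost
    return labor_saved - premium


# $200/month managed Postgres, two hours saved, $150/hour (low end of the range).
# Self-hosted cost is set to 0, the worst case for the managed option.
print(managed_service_net_savings(200, 0, 2, 150))  # 100
```

Even under the least favorable assumptions the managed option comes out ahead. If the hours saved drop below about 1.3 a month at $150 an hour, the sign flips.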

Things we would run ourselves, or skip entirely, from day one:

  • Background job workers. Managed queue services (SQS, Pub/Sub, Service Bus) are fine but the workers themselves are usually a container on a VM. Running them yourself is trivial.
  • Static site hosting. S3 plus CloudFront, or a CDN provider. This is not a managed-service decision; it is a cache configuration problem.
  • Analytics data warehouse. Hold off entirely until you have something to analyze. Snowflake and BigQuery look cheap until they aren't.

4. Small Right-Sized Instances, Not T-Shirt Sizes

The single most common waste pattern we see in startup bills is oversized instances. An engineer provisions an m5.xlarge because the tutorial said to, and it sits at 4 percent CPU for nine months. At roughly $140 a month, versus about $17 a month for a t3.small that would handle the workload fine, that is close to $1,500 a year per instance wasted on overprovisioning.
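To put a number on your own bill, the same arithmetic can be sketched in a few lines. The monthly prices are the approximate figures quoted above, and the 10 percent CPU threshold is an assumption for illustration; pick one that suits your workloads.

```python
def annual_waste(current_monthly: float, right_sized_monthly: float,
                 instances: int = 1) -> float:
    """Yearly cost of running a larger instance than the workload needs."""
    return (current_monthly - right_sized_monthly) * 12 * instances


def looks_oversized(avg_cpu_percent: float, threshold_percent: float = 10.0) -> bool:
    """Flag an instance whose average CPU sits below an assumed threshold."""
    return avg_cpu_percent < threshold_percent


print(annual_waste(140, 17))   # 1476
print(looks_oversized(4.0))    # True
```

Multiply by the number of instances that match the pattern and the total usually justifies an afternoon of resizing.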

Start small. Watch the metrics. Size up when something actually saturates. The hyperscalers make this easy: moving from a t3.small to a t3.medium is a stop, a type change, and a restart, typically a minute or so of downtime. The penalty for starting conservative is small.

Burstable is your friend

AWS t3 and t4g instances and Azure B-series VMs are cheap because they trade sustained performance for bursts. For web tiers, APIs, admin panels, and most startup workloads, burstable is exactly right. Once the CPU credits start running out consistently, you can upgrade, and the metrics make that signal loud and clear.
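What "running out consistently" means is a judgment call. One way to make it concrete is sketched below, assuming you have already exported a series of credit balance samples (on AWS, the CloudWatch metric is CPUCreditBalance). The floor and the 20 percent fraction are illustrative assumptions, not provider guidance.

```python
from typing import Sequence


def credits_consistently_depleted(
    balances: Sequence[float],
    floor: float = 1.0,          # assumed: at or below this, the instance is throttled
    min_fraction: float = 0.2,   # assumed: share of samples that counts as "consistent"
) -> bool:
    """Return True when enough samples show an exhausted CPU credit balance."""
    if not balances:
        return False
    depleted = sum(1 for b in balances if b <= floor)
    return depleted / len(balances) >= min_fraction


# Two of five samples are at zero, so this instance is a resize candidate.
print(credits_consistently_depleted([0, 0, 50, 60, 70]))  # True
```

A single dip to zero during a deploy is not a reason to resize; a balance that sits at the floor for a fifth of the day probably is.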

5. Reserve Nothing in Year One

Reserved Instances and Savings Plans are compelling when you know your workload. In year one you don't. We have watched founders lock in three-year reservations for workloads that were rearchitected six months later, leaving them paying for capacity they no longer used. Stay on-demand until your usage pattern stabilizes — usually around month 15 to month 18. Then reserve the steady-state baseline, not the peak.
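"Baseline, not peak" can be estimated from usage history. The sketch below assumes you have hourly instance counts (or vCPU counts) for a representative period and takes a low percentile as the baseline; the percentile choice and the function name are ours, for illustration only.

```python
from typing import Sequence


def steady_state_baseline(hourly_usage: Sequence[float], percentile: int = 10) -> float:
    """Usage level that is met or exceeded in most hours (a low percentile)."""
    if not hourly_usage:
        raise ValueError("hourly_usage must not be empty")
    ordered = sorted(hourly_usage)
    # Nearest-rank style index, rounded down so the estimate stays conservative.
    index = (len(ordered) - 1) * percentile // 100
    return ordered[index]


usage = [4, 4, 5, 4, 6, 12, 4, 5, 4, 4]   # hypothetical instance counts per hour
print(steady_state_baseline(usage))        # 4
print(max(usage))                          # 12
```

Reserving 4 here covers nearly every hour. Reserving 12 would mean paying for 8 idle instances most of the time.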

Spot instances are different. If you are running batch jobs, ML training, or anything interruptible, use spot immediately. The 60 to 90 percent discount is real and the tradeoff is acceptable for any workload with a retry.
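"A workload with a retry" can be as small as a wrapper like the one below. This is a sketch only: the Interrupted exception is a hypothetical stand-in for however your job learns that its instance is being reclaimed, and it assumes the job is safe to rerun from the start or from a checkpoint.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Interrupted(Exception):
    """Hypothetical signal that the spot instance was reclaimed mid-job."""


def run_with_retry(job: Callable[[], T], max_attempts: int = 5,
                   backoff_seconds: float = 0.0) -> T:
    """Run an idempotent job, retrying when it is interrupted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Interrupted:
            if attempt == max_attempts:
                raise  # out of attempts; surface the failure
            time.sleep(backoff_seconds * attempt)  # linear backoff between tries
    raise ValueError("max_attempts must be at least 1")
```

In practice the retry often lives in the queue or the batch scheduler rather than in application code, but the requirement is the same: the job has to tolerate being killed and restarted.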

6. Monitor Cost Like You Monitor Errors

Set up cost alerts on day one. AWS Budgets, Azure Cost Management, GCP Billing alerts — pick the one matching your cloud and set a weekly alarm at 120 percent of your expected spend. When it fires, somebody has to log in and figure out why within 24 hours. Every startup we have helped through a cost crisis ignored the alerts for six months before the bill became unignorable.
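The alert rule itself is one comparison. A minimal sketch, assuming you already pull actual spend from your provider's billing export; the function and parameter names are hypothetical, and the native budget tools listed above do the same thing without any code.

```python
def spend_alert(actual_spend: float, expected_spend: float,
                threshold: float = 1.20) -> bool:
    """Return True when spend reaches the threshold multiple of the expected amount."""
    if expected_spend <= 0:
        raise ValueError("expected_spend must be positive")
    return actual_spend / expected_spend >= threshold


print(spend_alert(1250, 1000))  # True
print(spend_alert(1100, 1000))  # False
```

The hard part is not the check. It is making sure a named person owns the alert and acts on it within a day.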

Tag everything. Even a minimal tag scheme (environment, service, owner) lets you break down the bill by what matters. The tagging effort in week one pays off every week after that.
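As a sketch of what that minimal scheme buys you, the snippet below checks resources for the three tags and groups billing line items by any one of them. The line-item shape (a dict with "cost" and "tags") is an assumption for illustration; real billing exports differ by provider.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

REQUIRED_TAGS = ("environment", "service", "owner")


def missing_tags(resource_tags: Dict[str, str]) -> List[str]:
    """List the required tags that are absent or empty on a resource."""
    return [tag for tag in REQUIRED_TAGS if not resource_tags.get(tag)]


def cost_by_tag(line_items: Iterable[dict], tag_key: str) -> Dict[str, float]:
    """Sum cost per tag value; untagged spend gets its own bucket."""
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        totals[item["tags"].get(tag_key, "untagged")] += item["cost"]
    return dict(totals)


items = [  # hypothetical billing line items
    {"cost": 100.0, "tags": {"environment": "prod", "service": "api", "owner": "ana"}},
    {"cost": 50.0, "tags": {"environment": "prod", "service": "web", "owner": "raj"}},
    {"cost": 25.0, "tags": {}},
]
print(cost_by_tag(items, "environment"))  # {'prod': 150.0, 'untagged': 25.0}
print(missing_tags({"service": "api"}))   # ['environment', 'owner']
```

The "untagged" bucket is the one to watch. If it grows, the tagging discipline is slipping.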

7. Exit Strategy Baked In

Finally, think about portability. Not because you are going to move, but because the option to move gives you leverage. Containerize everything you can. Keep your data in formats you can export (Postgres dumps, not proprietary NoSQL). Use Terraform or Pulumi for infrastructure, not clickops. Avoid services that have no analog elsewhere — Step Functions, Durable Functions, Cloud Workflows, and similar are convenient but you are paying for them in lock-in.

If you later decide to move 80 percent of your workload to a colocated private cloud to cut 60 percent of your bill — and this is a path we have walked with several growing companies — the prep work above is what makes it doable in weeks instead of quarters.

The Pattern We Recommend

For a pre-Series A startup with 3 to 10 engineers, here is the stack we would bless:

  • Compute: managed Kubernetes (EKS, AKS, or GKE), 2 to 4 small worker nodes, horizontal pod autoscaling. Or if Kubernetes is overkill, container-on-VM with a managed load balancer.
  • Database: managed Postgres, db.t3.small or equivalent, single AZ for dev, multi-AZ for prod.
  • Object storage: S3 or Blob, lifecycle policies from day one to move old data to cheaper tiers.
  • Auth: managed identity provider (Cognito, Entra ID, or Auth0). Do not build this yourself.
  • CDN and edge: Cloudflare, always. The free tier is generous and the enterprise version is still cheaper than hyperscaler equivalents.
  • Observability: one hosted tool for logs and metrics (Grafana Cloud, Datadog, or similar). A self-hosted Prometheus-Grafana stack is fine too if you have an ops-fluent engineer.

Total monthly cost for that stack at pre-Series A traffic levels, after credits burn off: $800 to $2,500. That is a number you can defend to a board.

The main skill in startup infrastructure is saying no to things you do not need yet. The cheapest service is the one you didn't deploy.
