Cloud-Based Applications: Four Practices That Outlive Migrations
Cloud-native is a moving target. These four engineering practices survive the fashion cycles and keep applications healthy over the long haul.

The state of the art in cloud-based applications changes roughly every 18 months. Lambda was the future, then Fargate was the future, then Kubernetes was the future, then serverless-on-Kubernetes was the future, and now everything old is new again and people are writing monoliths on purpose because the microservices tax got embarrassing. I am not going to tell you which of these fashions to bet on. I am going to tell you which engineering practices have survived all of them.
I have worked on cloud-based applications continuously since 2010, and these are the four that matter.
1. Configuration Is Not Code, and It's Not Secrets Either
A surprising number of otherwise competent teams still put configuration values inside the container image. This is wrong for a reason that has nothing to do with security. It is wrong because the whole point of a cloud-based application is that the same artifact should run in dev, staging, and production without being rebuilt. If the image knows which database it points at, the image is environment-specific, and the deployment pipeline has to rebuild it for each environment, and you lose the ability to promote a known-good build through the pipeline.
The three layers of "config"
Build-time constants. Compiler flags, feature toggles known at build time, version strings. Bake these into the image. They are part of the artifact.
Runtime configuration. Database URLs, queue names, bucket names, connection pool sizes, logging levels. Inject these at runtime through environment variables or a mounted config file. They are part of the environment, not the artifact.
Secrets. Credentials, API keys, certificates. These go in a secrets manager — AWS Secrets Manager, Azure Key Vault, HashiCorp Vault — and are fetched at pod start or injected through a sidecar. They are never in environment variables that show up in process listings, and they are never in config files that end up in source control.
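To make the split concrete, here is a minimal Python sketch, assuming AWS Secrets Manager via boto3; the environment variable names and the secret ID are illustrative, not a prescription.

```python
import os

import boto3  # assumes AWS; any secrets manager follows the same pattern


def load_config() -> dict:
    # Runtime configuration: injected by the environment, never baked into the image.
    config = {
        "database_url": os.environ["DATABASE_URL"],
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
        "pool_size": int(os.environ.get("POOL_SIZE", "10")),
    }
    # Secrets: fetched from the secrets manager at startup, never committed to
    # source control and never left sitting in environment variables.
    secrets = boto3.client("secretsmanager")
    config["db_password"] = secrets.get_secret_value(
        SecretId="app/db-password"  # illustrative secret name
    )["SecretString"]
    return config
```

The same image runs in every environment; only the injected values change.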
Get this wrong and you will have a very bad time the first time you need to rotate a database password across 40 services. Get it right and the rotation is one command.
2. Statelessness Is a Discipline, Not a Buzzword
"Stateless" is one of those words that gets thrown around until it means nothing. A stateless application is one where you can kill any instance at any time and the next request the user makes succeeds on a different instance with no visible interruption. That is it. That is the whole definition.
What statelessness actually requires
- Session state lives outside the process. Redis, DynamoDB, a database — anywhere but in memory. If you are using sticky sessions at the load balancer to work around this, you have not actually made the application stateless; you have just made the problem less visible. (A minimal sketch of externalized state follows this list.)
- Uploaded files go to object storage immediately. Not the local disk. Not a shared volume. Object storage. The moment an instance writes a file to its local disk and expects that file to be there on the next request, the instance is stateful and the design is broken.
- Background jobs are enqueued, not owned. If the process crashes while running a background job, the job should be picked up by another process automatically. This means the job queue is the source of truth, not the process that happens to be running the job.
- Startup does not require state from a previous run. A fresh instance with no local cache should be able to serve traffic within a few seconds of starting. If it takes two minutes to warm up a local cache before the instance is useful, you have reintroduced state through the back door.
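For the first two items, here is a rough sketch of what externalized state looks like, assuming Redis for sessions and S3 for uploads; the hostname, bucket name, and TTL are illustrative.

```python
import json
import uuid

import boto3
import redis

r = redis.Redis(host="session-cache", port=6379)  # illustrative hostname
s3 = boto3.client("s3")

SESSION_TTL_SECONDS = 30 * 60


def save_session(session_id: str, data: dict) -> None:
    # Session lives in Redis, not in process memory: any instance can serve
    # the next request, and killing this instance loses nothing.
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))


def load_session(session_id: str) -> dict:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}


def handle_upload(file_bytes: bytes) -> str:
    # Uploads go straight to object storage; the local disk is never the
    # system of record.
    key = f"uploads/{uuid.uuid4()}"
    s3.put_object(Bucket="my-app-uploads", Key=key, Body=file_bytes)
    return key
```

Kill the instance mid-session and nothing is lost; the next request hits Redis and S3, not a dead process's memory.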
A well-designed stateless application scales horizontally, handles instance churn without user impact, survives a full region failure by restarting in another region, and is indifferent to autoscaling adding or removing instances. An application that fails any of those tests has state it is not willing to admit to.
3. Observability Is Not Logging
There are still engineering teams operating cloud applications in 2024 with nothing but log files for visibility. If that is you, I am genuinely sorry, and I want to be clear that what you have is not monitoring — it is archaeology. Modern observability rests on three pillars, and they are complementary, not alternatives.
Metrics
Numeric time series. Request rate, error rate, latency histograms, queue depth, connection pool utilization. Aggregated, cheap to store, fast to query. Metrics tell you whether something is wrong.
Traces
Request-level detail that follows a single user request across every service it touches. Traces tell you where the problem is. If a request takes 3 seconds and the trace shows 2.8 seconds in a call to the recommendation service, you have your answer without grepping logs.
Logs
Unstructured or semi-structured events from the application. Logs tell you what the problem actually is. Use them for forensic detail, not for alerting.
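Here is a rough sketch of the three pillars on a single request path, assuming the OpenTelemetry Python API; the service name, metric names, attributes, and the stubbed downstream call are illustrative.

```python
import logging
import time

from opentelemetry import metrics, trace

logger = logging.getLogger("checkout")
tracer = trace.get_tracer("checkout-service")
meter = metrics.get_meter("checkout-service")

request_latency = meter.create_histogram("http.request.duration_ms")
request_errors = meter.create_counter("http.request.errors")


def charge_card(order_id: str) -> None:
    pass  # stand-in for a real downstream call


def handle_checkout(order_id: str) -> None:
    start = time.monotonic()
    # Trace: a span per request (and per downstream call) shows WHERE the time went.
    with tracer.start_as_current_span("checkout") as span:
        span.set_attribute("order.id", order_id)
        try:
            charge_card(order_id)
        except Exception:
            request_errors.add(1, {"route": "/checkout"})
            # Log: forensic detail about WHAT went wrong; not used for alerting.
            logger.exception("checkout failed for order %s", order_id)
            raise
        finally:
            # Metric: a cheap aggregate that tells you WHETHER something is wrong.
            request_latency.record((time.monotonic() - start) * 1000,
                                   {"route": "/checkout"})
```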
The teams that get this right spend roughly the same money on observability as they do on the compute running the application. That sounds like a lot until you run an incident with partial visibility and realize what the outage cost instead.
4. Idempotency Is Non-Negotiable at the Edges
This is the one that bites new teams hardest. In a distributed system, every network call can fail after the work has been done but before the response arrives. Every message in a queue can be delivered more than once. Every webhook can be retried by a caller who never saw your 200. If the application is not idempotent at its edges, every one of these events produces a bug.
What idempotency looks like in practice
- Every write endpoint accepts an idempotency key. The client generates a unique ID for the operation. The server stores the ID and the result. If the client retries with the same ID, the server returns the cached result instead of performing the operation twice. Stripe has been doing this since forever. Copy them.
- Message handlers use a deduplication table. When a message arrives, the handler checks whether a message with the same ID has already been processed. If yes, acknowledge and move on. If no, process it, write the ID to the dedup table, and then acknowledge. Do these in the right order or you will create a new class of bug.
- Database writes use conditional updates. UPDATE ... WHERE version = N instead of a blind UPDATE. Optimistic concurrency is your friend. It is also the only thing standing between you and lost updates.
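Here is a compact sketch of the first and third items together, assuming Python and sqlite3 so it runs standalone; the table names and the debit operation are illustrative.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency (key TEXT PRIMARY KEY, response TEXT)")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER, version INTEGER)")
db.execute("INSERT INTO accounts VALUES ('acct-1', 100, 1)")


def debit(idempotency_key: str, account_id: str, amount: int, expected_version: int) -> dict:
    # A retry with a key we have already seen returns the stored result
    # instead of performing the operation twice.
    row = db.execute("SELECT response FROM idempotency WHERE key = ?",
                     (idempotency_key,)).fetchone()
    if row:
        return json.loads(row[0])

    # Conditional update: the WHERE clause on version turns a lost update
    # into a visible conflict instead of silent data corruption.
    cur = db.execute(
        "UPDATE accounts SET balance = balance - ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (amount, account_id, expected_version),
    )
    if cur.rowcount == 0:
        result = {"status": "conflict"}
    else:
        result = {"status": "ok", "new_version": expected_version + 1}

    # Do the work, record the result under the key, then commit both together,
    # so a crash cannot record the work without the result or vice versa.
    db.execute("INSERT INTO idempotency VALUES (?, ?)",
               (idempotency_key, json.dumps(result)))
    db.commit()
    return result


# The retry with the same key is a no-op that returns the original response.
print(debit("op-123", "acct-1", 25, 1))
print(debit("op-123", "acct-1", 25, 1))
```

The ordering at the end is the same discipline the message-handler bullet warns about: do the work, record that you did it, and only then acknowledge or commit.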
The Meta-Point
These four practices have nothing to do with any specific cloud provider, any specific language, or any specific architecture style. They applied to ten-year-old applications on EC2 classic, they apply to Lambda functions today, and they will still apply to whatever the platform of the week looks like in 2030. Bet on fundamentals. The platform churn will keep happening. If your application gets these four things right, the next migration is an afternoon of YAML. If it gets them wrong, the next migration is a rewrite.