
Cloud Security: The Eight Practices That Move the Needle

Cloud security checklists run to hundreds of items. Here are the eight practices that actually change your risk posture, ranked by how much they matter.

John Lane 2023-12-21 7 min read

Cloud security best practices lists typically run to 150 or 200 items, which is why most teams never finish them. That is the wrong framing. Not every practice is equally important, and spending the same effort on every item means you will get tired before you do the things that actually matter. Here are the eight that change your risk posture in ways that show up in real incident reports and insurance questionnaires, in roughly the order I would execute them.

One: identity is the perimeter, so start there

The single most important thing you can do for cloud security is stop thinking about network perimeters and start thinking about identity. Every major cloud breach I have studied in the past five years had an identity failure at the center. Stolen credentials. Overprivileged roles. Service accounts with keys committed to Git. Admin consoles exposed to the internet without MFA.

Concretely: enforce MFA on every human account, no exceptions. Use conditional access policies to require compliant devices for admin consoles. Kill long-lived access keys where you can and rotate them ruthlessly where you cannot. Use short-lived, role-assumed credentials for CI/CD pipelines. And — this is the one nobody wants to do — review your admin membership every quarter and kick out anyone who does not actively need it.
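
The rotation half of this is easy to automate. A minimal sketch: given access-key metadata shaped like the entries IAM's ListAccessKeys returns, flag anything older than a threshold (the 90-day cutoff here is an assumed policy, not a standard).

```python
from datetime import datetime, timezone, timedelta

MAX_KEY_AGE = timedelta(days=90)  # assumed rotation policy, tune to taste

def stale_access_keys(keys, now=None):
    """Return the IDs of access keys older than the rotation threshold.

    `keys` is a list of dicts shaped like IAM ListAccessKeys entries:
    {"AccessKeyId": str, "CreateDate": timezone-aware datetime}.
    """
    now = now or datetime.now(timezone.utc)
    return [k["AccessKeyId"] for k in keys if now - k["CreateDate"] > MAX_KEY_AGE]
```

Run this from a scheduled job and page the key's owner, not a shared channel, so the finding lands with someone who can act on it.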

If you only do one thing on this list, do this one.

Two: guardrails before audits

Detective controls are necessary. Preventive controls are better. If you can use a Service Control Policy, an Azure Policy deny rule, or a GCP organizational policy to make the bad thing impossible in the first place, do that instead of hoping an alert will fire after the bad thing happens.

Concrete examples. Deny creation of public S3 buckets organization-wide. Deny security groups with 0.0.0.0/0 on sensitive ports. Deny resource creation outside approved regions. Deny IAM roles with * actions. Deny unencrypted volumes. Every one of these prevents a whole class of breaches and does it with a single policy document.
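
To make one of these concrete, here is a sketch of the region guardrail as a Service Control Policy, expressed as a Python dict (SCPs use standard IAM policy JSON). The approved regions and the NotAction carve-out for global services are illustrative; adjust both before attaching anything at your organization root.

```python
import json

# Deny all actions outside approved regions, with a carve-out for global
# services (IAM, Organizations, STS) that are not region-scoped.
# Region list is a placeholder for this sketch.
region_guardrail = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideApprovedRegions",
        "Effect": "Deny",
        "NotAction": ["iam:*", "organizations:*", "sts:*"],
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {
                "aws:RequestedRegion": ["us-east-1", "eu-west-1"]
            }
        },
    }],
}

print(json.dumps(region_guardrail, indent=2))
```

The same shape works for the other denies on the list; only the Action and Condition blocks change.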

The thing nobody says out loud: guardrails annoy your engineers, and they will push back. Do it anyway. The pushback is usually about specific edge cases that can be handled with exception processes. The alternative — relying on detective controls alone — means your security posture is a race between alerts and attackers, and the attackers are working nights and weekends.

Three: encrypt everything, but know what that actually means

"Encrypt everything at rest and in transit" is good advice that is easy to misimplement. The important question is who holds the keys. If the cloud provider manages the keys and you use the default encryption settings, you are protected against a physical disk being stolen from a datacenter, which is not a realistic threat. You are not protected against a compromised admin account, because that account can decrypt everything.

If you care about key management — and regulated workloads usually do — you need customer-managed keys in KMS (or Key Vault, or Cloud KMS). You need key rotation policies. You need access controls on the keys themselves. For the highest assurance, you bring your own keys (BYOK) or use a hardware security module (HSM). Most workloads do not need HSMs. Most regulated workloads benefit from customer-managed keys with proper IAM separation.
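
"Proper IAM separation" on a customer-managed key looks like this sketch of a KMS key policy: one statement for administrators who manage the key but cannot use it, one for the workload that can encrypt and decrypt but cannot change or delete the key. The account ID and role names are placeholders.

```python
# Sketch of a customer-managed KMS key policy separating administration
# from use. Account and role names are illustrative placeholders.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KeyAdministration",  # manage the key, never decrypt with it
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/KeyAdmin"},
            "Action": ["kms:Create*", "kms:Describe*", "kms:Enable*",
                       "kms:Put*", "kms:Update*", "kms:Revoke*",
                       "kms:Disable*", "kms:ScheduleKeyDeletion"],
            "Resource": "*",
        },
        {
            "Sid": "KeyUse",  # use the key, never reconfigure it
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/AppWorkload"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
            "Resource": "*",
        },
    ],
}
```

The point of the split is that compromising either role alone is not enough to both read the data and cover the tracks.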

In transit is simpler. TLS 1.2 minimum, 1.3 preferred, and do not let anyone ship an application with certificate validation disabled "for testing" because it always gets left in.
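
In Python, for example, enforcing that floor on the client side is two lines, and validation stays on by default:

```python
import ssl

# Client-side TLS context with the floor described above: TLS 1.2 minimum,
# certificate and hostname verification left enabled (the default for
# create_default_context).
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
```

The equivalent knob exists in every mainstream TLS stack; the failure mode is always someone turning verification off, not the stack lacking the option.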

Four: logging that someone actually reads

Every cloud has good logging. AWS CloudTrail, Azure Activity Log, GCP Cloud Audit Logs. Turn them all on, for all regions, all services, with delivery to an immutable bucket in a separate account. Do not skip the separate-account part, because if an attacker compromises your primary account, you do not want them able to delete their own tracks.

Logging is only useful if someone reads it. SIEMs are one answer, and an expensive one. A cheaper version: write a handful of high-value detection rules against the logs. Root account usage. IAM policy changes. Security group modifications on sensitive networks. Console logins from unexpected countries. Deletion of logging infrastructure. Each of these is a small number of events per week in a healthy environment, and an alert is actionable when it fires.
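
A handful of these rules fit in a page of code. A sketch over CloudTrail-style event records (field names follow the CloudTrail record format; the specific rules and prefixes here are illustrative, not exhaustive):

```python
# High-signal detection rules over CloudTrail-style events. Each rule is a
# predicate on a single event record; rule names and match logic here are
# illustrative starting points.
HIGH_SIGNAL = {
    "root_usage": lambda e: e.get("userIdentity", {}).get("type") == "Root",
    "iam_change": lambda e: (
        e.get("eventSource") == "iam.amazonaws.com"
        and e.get("eventName", "").startswith(("Put", "Attach", "Create", "Delete"))
    ),
    "trail_tamper": lambda e: e.get("eventName") in {
        "StopLogging", "DeleteTrail", "UpdateTrail"
    },
}

def alerts(event):
    """Return the names of every rule this event trips."""
    return [name for name, rule in HIGH_SIGNAL.items() if rule(event)]
```

Wire the output to a pager, not a dashboard. These events are rare enough in a healthy environment that every firing deserves a human.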

The trap is trying to detect everything. Detect a small number of high-signal things well, and then graduate to more detection as the program matures.

Five: network segmentation that matches your actual risk model

The classic VPC-with-public-and-private-subnets pattern is fine as a starting point and inadequate as a finishing point. Real segmentation means thinking about which workloads should be able to talk to which other workloads and enforcing that at the network layer, not just hoping your application firewall is configured correctly.

For AWS, that means security groups that reference other security groups, not IP ranges. For Azure, NSGs plus application security groups. For both, private endpoints for managed services so your data plane does not traverse the public internet. For Kubernetes, NetworkPolicies that default-deny and explicitly permit. For multi-account architectures, Transit Gateway or vNet peering with deliberate routing, not flat connectivity.
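
The Kubernetes default-deny is worth spelling out because it is the piece teams most often skip. Here is the manifest as a Python dict (the shape you would hand to a client library or `yaml.dump`); the namespace name is a placeholder:

```python
# Default-deny NetworkPolicy for one namespace. The empty podSelector
# selects every pod; listing both policyTypes with no allow rules blocks
# all ingress and egress until other policies explicitly permit traffic.
default_deny = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny-all", "namespace": "prod"},  # placeholder
    "spec": {
        "podSelector": {},
        "policyTypes": ["Ingress", "Egress"],
    },
}
```

Apply this first, then add narrowly scoped allow policies per workload. The order matters: allow-first leaves you flat while you catch up.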

The goal is that if one workload is compromised, the blast radius is bounded. "Flat network" is a phrase that should make you uncomfortable. If an attacker landing on your marketing website can reach your database, your network is flat, regardless of what the diagram looks like.

Six: backups that actually survive an attack

Ransomware is now the threat model that drives backup design, not hardware failure. Hardware failure rarely matters in the cloud because the provider handles it. Ransomware matters enormously because a sufficiently determined attacker can reach your backups through the same IAM permissions that created them.

The fix is immutable backups in an account the production environment cannot touch. AWS Backup with Vault Lock. Azure Backup with immutability enabled. Veeam with hardened repositories. The details matter, but the principle is simple: the production workload should be able to write to backups and should not be able to delete them. Deletion should require a separate workflow, a separate approval, and a separate account.
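
On the IAM side, the "write but never delete" principle looks like an explicit deny attached to production principals. A sketch (action list is illustrative; Vault Lock in the backup account is what makes the immutability stick even if this policy is stripped):

```python
# Explicit deny for production roles: they may create recovery points,
# but destroying backups or weakening the vault is blocked. Pair with
# Vault Lock in a separate account; this policy alone is not immutability.
backup_guard = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyBackupDestruction",
        "Effect": "Deny",
        "Action": [
            "backup:DeleteRecoveryPoint",
            "backup:DeleteBackupVault",
            "backup:PutBackupVaultAccessPolicy",
            "backup:UpdateRecoveryPointLifecycle",
        ],
        "Resource": "*",
    }],
}
```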

Test your restores. Not "verify the backup job finished." Actually restore to a new environment and confirm the data is usable. Quarterly at least. More often if the data is critical. An untested backup is a hope, not a plan.

Seven: patching and image hygiene

Unpatched systems remain a primary attack vector. In cloud environments the discipline looks different from on-prem. Instead of patching running instances, you rebuild them from updated base images. Immutable infrastructure. Golden images. Every new deployment runs on a freshly built base with the latest patches.

Tools matter here. AWS Image Builder, Azure VM Image Builder, Packer for portability. A CI pipeline that rebuilds base images weekly and triggers redeployment of workloads that use them. Container scanning in the build pipeline so vulnerable images never reach production. Dependency scanning on application code because your base OS is not the only place vulnerabilities live.
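
The scanning gate in that pipeline reduces to a small policy function. A sketch, assuming findings shaped like `{"id": "CVE-...", "severity": "HIGH"}` (adapt the shape to whatever your scanner emits; the zero-tolerance threshold is an assumed policy):

```python
# CI gate: block a deploy when the image scan reports critical or high
# findings above the threshold. Findings shape and threshold are
# illustrative; map them onto your scanner's actual output.
MAX_HIGH = 0  # assumed policy: no high/critical CVEs reach production

def should_block(findings):
    """findings: list of dicts like {"id": "CVE-2024-...", "severity": "HIGH"}."""
    bad = [f for f in findings if f.get("severity") in {"CRITICAL", "HIGH"}]
    return len(bad) > MAX_HIGH
```

Exit nonzero from the pipeline step when this returns True, and route exceptions through the same process you use for guardrail exceptions, so waivers are recorded rather than improvised.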

The trap is treating this as a one-time effort. Image hygiene is a running process, and it degrades every week you ignore it.

Eight: tabletop exercises and an actual incident plan

The last practice is the least technical and the most consistently skipped. You need an incident response plan that exists on paper, that names people with specific roles, and that has been practiced. Not a document that a consultant wrote three years ago and nobody has read since. A living document that the on-call engineer can find at 3am when the GuardDuty alert fires.

Practice it. Tabletop exercises are not security theater. They surface real gaps — who has the authority to pull a production system offline, who calls legal, who calls the insurance provider, who drafts customer communications. The first time you answer those questions should not be during a real incident.

What I left off

I left off antivirus, WAFs, and third-party posture management tools. Not because they are worthless — they all have roles — but because they are far down the priority list. If you do the eight above, a WAF becomes incrementally useful. If you do none of them, a WAF is lipstick on a system that is going to get compromised regardless.

Security is not a product you buy. It is a set of habits you build. These eight are the habits that, in our experience, actually move the needle. Do them in order, do them consistently, and you will be in better shape than most of the organizations you read about in breach reports.
