
Cloud Storage Services: Six Methods Compared in the Real World

Block, file, object, archive, hybrid, and the one nobody talks about — how cloud storage methods actually perform under production load.

John Lane 2024-04-01 7 min read

Cloud storage is one of those areas where the vendor marketing and the engineering reality have almost no overlap. Every provider sells "storage" as if it were one thing. It is not one thing. It is at least six things, with dramatically different performance characteristics, dramatically different cost structures, and dramatically different failure modes. Picking the wrong one for a workload is one of the most common and most expensive mistakes I see.

Here are the six methods we actually use, with honest notes on where each one earns its keep and where it falls over.

1. Block Storage: When the Workload Expects a Disk

Block storage — EBS on AWS, Managed Disks on Azure, Persistent Disks on GCP — is what you get when you attach a virtual hard drive to a virtual machine. The storage presents as a block device, the operating system puts a filesystem on it, and applications read and write as if it were a local disk.

Where it wins

Databases. Full stop. Any application that expects a POSIX filesystem with local-disk performance characteristics and strong consistency. Block storage is the only method that gives you consistent sub-millisecond latency and the transactional guarantees that relational databases depend on. If you are running PostgreSQL, SQL Server, MySQL, or anything with a write-ahead log, block storage is the right answer.

Where it fails

Sharing. Block storage is almost always attached to a single VM at a time. The exceptions — multi-attach, shared block — have enough caveats that I would not recommend them for most workloads. And block storage is expensive per gigabyte compared to object storage, and it is billed for provisioned capacity rather than used capacity. Allocate 500GB, use 50GB, pay for 500GB.
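That provisioned-capacity billing model is worth making concrete. A minimal cost sketch (the per-GB price here is an illustrative placeholder, not any provider's actual rate):

```python
# Hypothetical per-GB-month price for illustration only; real block storage
# prices vary by provider, region, and volume type.
GP_SSD_PRICE_PER_GB_MONTH = 0.08

def block_storage_monthly_cost(provisioned_gb: float, used_gb: float) -> float:
    """Block storage bills on PROVISIONED capacity; used_gb is irrelevant."""
    assert used_gb <= provisioned_gb
    return provisioned_gb * GP_SSD_PRICE_PER_GB_MONTH

# Allocate 500 GB, use 50 GB: you still pay for all 500.
cost = block_storage_monthly_cost(provisioned_gb=500, used_gb=50)
print(f"${cost:.2f}/month")  # $40.00/month
```

The `used_gb` parameter exists only to make the point: it never appears in the bill.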

2. File Storage: The Lift-and-Shift Answer

File storage — EFS on AWS, Azure Files, Filestore on GCP — gives you an NFS or SMB share you can mount on many VMs simultaneously. The semantics are what you expect: POSIX permissions on Linux, NTFS-ish permissions on Windows, multiple clients reading and writing the same files.

Where it wins

Any application that was written to use a shared file system and that you do not want to rewrite. This includes a lot of legacy Windows applications, media pipelines, home directories for Linux users in VDI environments, and applications that share state through a filesystem (which is not the best pattern, but exists in production everywhere). File storage lets you move these workloads to the cloud without touching the code.

Where it fails

Performance scaling and cost. A managed cloud file service charges more per gigabyte than block storage and much more than object storage. Performance under heavy concurrent load is unpredictable — you can hit bandwidth caps on the share without realizing they exist. And latency is measurably higher than block storage because every operation goes through a network file protocol, which matters for workloads that do a lot of small reads and writes.

3. Object Storage: Cheap, Scalable, and Weirder Than You Think

Object storage — S3, Azure Blob, GCS — is the workhorse of the cloud era. It is cheap, effectively unlimited in scale, and durable enough that the provider will sell you 11 nines of durability and almost certainly deliver it. It is also not a filesystem, and treating it like one will hurt you.

Where it wins

Almost everything that doesn't need a filesystem. Static assets, backups, log archives, data lake storage, machine learning training data, content distribution origins, CI/CD artifacts. The pricing is brutal in a good way — pennies per gigabyte per month for the cheapest tier — and the durability story is better than anything you can build yourself.

Where it gets weird

Object storage semantics will bite you if you assume they match a filesystem. Consistency has improved — S3, Azure Blob, and GCS all now offer strong read-after-write consistency for individual objects — but there are still no transactions across objects and no atomic rename. "Directories" are a lie — they are just a convention about slashes in key names. Applications written to assume filesystem semantics will misbehave on object storage, and the bugs will be subtle and hard to reproduce.
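The "directories are a lie" point is easy to demonstrate. Keys are flat strings; a delimiter-aware list operation just groups shared prefixes, which is what S3-style `ListObjects` does under the hood. A minimal sketch:

```python
# Keys in an object store are flat strings. "Directories" appear only when
# a list operation groups keys by a delimiter, S3 ListObjects style.
def list_objects(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) for keys matching the prefix."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter looks like a "subdirectory".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

keys = ["logs/2024/01/app.log", "logs/2024/02/app.log", "logs/readme.txt"]
print(list_objects(keys, prefix="logs/"))
# (['logs/readme.txt'], ['logs/2024/'])
```

Note that "renaming a directory" in this model means copying and deleting every key under the prefix — there is no single atomic operation to do it.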

The other trap is request pricing. Object storage charges per operation. A workload that reads a million tiny files per minute will spend more on request charges than on storage, and will be slower than the same workload on file storage. Size your files accordingly. For analytics, aim for objects in the 64MB to 1GB range.
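A back-of-the-envelope model makes the request-pricing trap vivid. The rates below are illustrative assumptions in the right ballpark for a standard tier, not any provider's quoted prices:

```python
# Illustrative placeholder rates, NOT quoted provider prices.
STORAGE_PER_GB_MONTH = 0.023   # standard-tier storage, per GB-month
GET_PER_1000 = 0.0004          # per 1,000 GET requests

def monthly_request_cost(reads_per_minute: int) -> float:
    """Request charges for a steady read rate over a 30-day month."""
    reads_per_month = reads_per_minute * 60 * 24 * 30
    return reads_per_month / 1000 * GET_PER_1000

def monthly_storage_cost(gb: float) -> float:
    return gb * STORAGE_PER_GB_MONTH

# A million tiny reads per minute vs. simply storing 1 TB:
requests = monthly_request_cost(1_000_000)   # 43.2 billion GETs/month
storage = monthly_storage_cost(1024)
print(f"requests ${requests:,.0f}/mo vs storage ${storage:,.2f}/mo")
# requests $17,280/mo vs storage $23.55/mo
```

Under these assumptions the request bill is roughly three orders of magnitude larger than the storage bill, which is why packing tiny records into larger objects pays for itself.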

4. Archive Storage: The Trick Is the Retrieval Cost

Archive storage — S3 Glacier Deep Archive, Azure Archive, GCS Archive — is object storage at prices that seem fake. Less than a dollar per terabyte per month. If you have 100 terabytes of compliance data that you are legally required to keep for seven years and almost certainly will never read again, archive storage is the right answer.

The catch nobody tells you

Retrieval from archive tiers is slow and expensive. Slow is measured in hours to days. Expensive is measured in dollars per gigabyte retrieved. If you need to pull your entire archive back for an audit, the retrieval bill can dwarf a year of storage cost. This is fine if you genuinely never need the data — but make sure you genuinely never need the data before you move it to archive, because the "mostly never" case is where the math gets painful.
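Here is that math sketched out, with illustrative placeholder rates (retrieval and egress prices vary by provider, tier, and retrieval speed):

```python
# Illustrative placeholder rates, NOT quoted provider prices.
ARCHIVE_PER_TB_MONTH = 1.0   # deep-archive storage, per TB-month
RETRIEVAL_PER_GB = 0.02      # bulk retrieval fee, per GB
EGRESS_PER_GB = 0.09         # data transfer out to the internet, per GB

def annual_storage_cost(tb: float) -> float:
    return tb * ARCHIVE_PER_TB_MONTH * 12

def full_retrieval_cost(tb: float) -> float:
    """One-time cost to pull the whole archive back out."""
    gb = tb * 1024
    return gb * (RETRIEVAL_PER_GB + EGRESS_PER_GB)

tb = 100
print(f"store for a year: ${annual_storage_cost(tb):,.0f}")  # $1,200
print(f"retrieve once:    ${full_retrieval_cost(tb):,.0f}")  # $11,264
```

Under these assumptions, one full retrieval of the 100 TB archive costs roughly nine years of storage — which is the entire point of the "make sure you genuinely never need the data" warning.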

How we actually use it

For material that has a regulatory retention requirement but no operational value. For backup copies that are the third or fourth copy of something, already also held in cheaper-to-retrieve tiers. For "cold" historical data that is only accessed by legal discovery and a scheduled batch job once a year. Not for anything users will want to read on demand.

5. Hybrid / Cached File Gateways: The Boring Answer That Works

A cached file gateway — Storage Gateway on AWS, Azure File Sync on Azure, on-prem caches from NetApp and others — presents a local file share backed by cloud object storage. Frequently-accessed files live on the local cache and serve at local-network speed. Less-frequently-accessed files live in the cloud and are fetched on demand. The cache size is a knob you can turn.

Where it wins

Branch offices, medical imaging archives, engineering design files, video editing workflows — anywhere the user experience needs local-disk performance but the working set is large and mostly cold. A 10TB cache in front of a 500TB cloud back end gives users the illusion of 500TB of local storage at a fraction of the cost.

Where it can bite you

The cache misses are slow. If a user asks for a file that isn't cached, it pulls from the cloud over the WAN, and the user waits. If the working set is genuinely random across the whole archive, the cache hit rate will be bad and the user experience will be bad. Measure the access pattern before you commit.
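You can estimate this before committing. A toy LRU simulation (modeling the 10TB-cache-over-500TB-archive example as 10,000 cached objects out of 500,000) shows how brutally the hit rate depends on whether access concentrates on a working set:

```python
import random
from collections import OrderedDict

# Toy LRU cache simulation. With uniformly random access across the whole
# archive, the steady-state hit rate collapses toward cache_size / dataset_size.
def simulate_hit_rate(dataset_size, cache_size, accesses,
                      hot_fraction=1.0, seed=0):
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    hot = int(dataset_size * hot_fraction)  # size of the working set
    for _ in range(accesses):
        key = rng.randrange(hot)
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return hits / accesses

# 10k-object cache over a 500k-object archive:
print(f"uniform random access: {simulate_hit_rate(500_000, 10_000, 50_000):.1%}")
print(f"hot 2% working set:    {simulate_hit_rate(500_000, 10_000, 50_000, hot_fraction=0.02):.1%}")
```

With uniform random access the hit rate lands near the 2% ratio of cache to archive; concentrate the accesses on a working set that fits in the cache and it climbs dramatically. Measuring which of those your users actually do is the whole game.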

6. The Method Nobody Talks About: Database-as-Storage

This one is unconventional and I am going to defend it anyway. For some workloads — small files, metadata-heavy, queried by attribute rather than by key — a database is a better storage engine than any of the above. Postgres with bytea columns, or a proper key-value store like DynamoDB or Cosmos DB, gives you transactions, queryability, and strong consistency, in exchange for limited scale and a higher price per gigabyte.

Where it's the right answer

User-uploaded avatars on a SaaS product with a few million users. Tenant configuration blobs. Generated PDFs that are keyed by a business identifier and need to be looked up by that identifier along with metadata. Anything where the access pattern is "find the file associated with this record" rather than "stream this file to a CDN."
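A minimal sketch of the pattern, using stdlib SQLite standing in for Postgres bytea or a managed key-value store. The table and column names are illustrative, not a prescribed schema — the point is that the blob, its metadata, and its business key live in one transactional row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tenant_blobs (
        tenant_id  TEXT NOT NULL,
        blob_key   TEXT NOT NULL,
        content    BLOB NOT NULL,
        mime_type  TEXT NOT NULL,
        updated_at TEXT NOT NULL DEFAULT (datetime('now')),
        PRIMARY KEY (tenant_id, blob_key)
    )
""")

def put_blob(tenant_id, blob_key, content, mime_type):
    # Upsert blob and metadata together, atomically.
    with conn:
        conn.execute(
            "INSERT INTO tenant_blobs (tenant_id, blob_key, content, mime_type) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(tenant_id, blob_key) DO UPDATE SET "
            "content=excluded.content, mime_type=excluded.mime_type, "
            "updated_at=datetime('now')",
            (tenant_id, blob_key, content, mime_type),
        )

def get_blob(tenant_id, blob_key):
    # Lookup by business key returns content and metadata in one query.
    return conn.execute(
        "SELECT content, mime_type FROM tenant_blobs "
        "WHERE tenant_id=? AND blob_key=?",
        (tenant_id, blob_key),
    ).fetchone()

put_blob("acme", "avatar/alice", b"\x89PNG...", "image/png")
print(get_blob("acme", "avatar/alice"))
```

Compare the failure modes: there is no way for the metadata row and the blob to drift apart, which is exactly the consistency problem you inherit when the blob lives in object storage and the metadata lives in a database.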

Where it's not

Large files. Hot static assets. Analytics data. Don't put a 100MB video in a database column. You will regret it.

A Decision Framework That Actually Works

Do not pick storage by vendor. Do not pick storage by buzzword. Pick storage by the access pattern the application actually has:

  • Transactional, low latency, single writer: block storage.
  • Shared filesystem, multiple clients, POSIX expected: file storage.
  • Large files, cheap, sequential or whole-file access: object storage.
  • Legally required to exist but never read: archive storage.
  • Branch office or large working set, needs local feel: hybrid gateway.
  • Small files looked up by business key with metadata: database.
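The framework above reduces to a lookup table. The access-pattern names here are my own shorthand for the bullets, not anyone's product taxonomy:

```python
# The decision framework as code: pick by access pattern, never by vendor.
# Pattern names are this article's shorthand, not provider terminology.
DECISION = {
    "transactional_low_latency_single_writer": "block storage",
    "shared_filesystem_posix_multi_client":    "file storage",
    "large_files_cheap_whole_file_access":     "object storage",
    "retention_only_never_read":               "archive storage",
    "large_cold_set_needs_local_feel":         "hybrid gateway",
    "small_files_by_business_key":             "database",
}

def choose_storage(access_pattern: str) -> str:
    try:
        return DECISION[access_pattern]
    except KeyError:
        # Forcing the caller to name a pattern IS the framework: if you
        # cannot classify the workload, you are not ready to pick storage.
        raise ValueError(f"unknown access pattern: {access_pattern!r}")

print(choose_storage("transactional_low_latency_single_writer"))  # block storage
```

The `ValueError` branch is the one-sentence-justification rule in code form: no classification, no storage decision.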

If you can match every workload in your environment to the right category with a one-sentence justification, you will pay less and get better performance than you would by putting everything on whatever your vendor happens to be promoting this quarter. That is the entire secret.

Talk with us about your infrastructure

Schedule a consultation with a solutions architect.
