3 Scalability Challenges Most Architects Miss
Database hot spots, cache invalidation, and distributed transactions — the three scalability problems that catch teams off-guard in production.

"We'll scale when we need to" is the most common scalability strategy, and it's fine for most applications. Most never need to scale beyond a single well-tuned database and a few application servers. But when scaling does matter, the problems that bite are rarely the ones the architecture diagrams predict. The throughput issue you expected to hit turns out to be fine. The one you didn't expect takes down production on Black Friday.
Three specific problems catch teams out often enough that they deserve attention. None of them are new. All of them are still killing production systems in 2025.
1. Database Hot Spots
The seductive promise of a distributed database is that you can just add nodes. Add a node, get more throughput. In practice, a single hot partition can make a 100-node database perform worse than a single-node deployment, because one node is melting while the rest are idle.
The Pattern
You shard your data by customer_id. You have a million customers. In theory, load is spread evenly. In practice, 5 percent of your customers generate 80 percent of the traffic — and those customers end up on 2 or 3 specific nodes. Those nodes saturate. The rest of the cluster is idle. Your app sees latency spikes and timeouts.
This happens with DynamoDB (partition throttling), Cassandra (hot partitions), Cosmos DB (hot logical partitions), Redis Cluster (hot keys), and any sharded SQL setup where the sharding key is not uniformly distributed.
Symptoms
- Latency spikes that track individual customers or accounts
- Monitoring shows some nodes at 95 percent CPU while others are at 10 percent
- Throttling errors from your database under load that's well below provisioned throughput
- Retry storms that make the problem worse
What to Do
- Pick a partition key that spreads load evenly. Customer ID alone is usually wrong for multi-tenant systems with power-law usage. A composite key (customer_id plus a random or hashed bucket) spreads a hot customer's traffic across more partitions; see the sketch below.
- Implement write-through caching for read-heavy hot keys so that only a fraction of reads hit the database.
- Monitor per-partition metrics, not just cluster averages. Cluster average CPU hides hot spot problems.
- Rate-limit per-customer at the application layer to prevent a single tenant from overwhelming shared resources.
The fundamental lesson: uniform distribution is an assumption, not a guarantee. Verify it at load with real data.
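The composite-key fix is mostly a key-construction detail. Below is a minimal sketch in Python, assuming a DynamoDB-style key-value store; the bucket count, the key format, and the function names are illustrative rather than any particular library's API.

```python
import hashlib

NUM_BUCKETS = 16  # illustrative; size it to the hottest tenant's throughput

def partition_key(customer_id: str, item_id: str) -> str:
    """Composite key: tenant id plus a deterministic bucket derived from the
    item id. A hot customer now spreads across NUM_BUCKETS partitions, while
    a point read for a known item still hits exactly one partition."""
    bucket = hashlib.sha256(item_id.encode()).digest()[0] % NUM_BUCKETS
    return f"{customer_id}#{bucket}"

def scatter_keys(customer_id: str) -> list[str]:
    """The cost of the spread: a 'give me everything for this customer' query
    must fan out across all buckets and merge the results client-side."""
    return [f"{customer_id}#{b}" for b in range(NUM_BUCKETS)]
```

The trade-off is explicit: writes and point reads stay cheap, but per-customer scans now touch NUM_BUCKETS partitions. That is usually a good trade when a handful of tenants dominate traffic.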
2. Cache Invalidation
Phil Karlton's quote — "There are only two hard things in computer science: cache invalidation and naming things" — is overused because it's true. Cache invalidation bugs are responsible for a disproportionate share of hard-to-debug production incidents.
The Patterns That Bite
Stale data after updates. User updates their profile. The read cache serves the old version for the next 5 minutes. User is confused and opens a ticket. Worse: user updates their payment method, charge fails because the cache returned the old method, user churns.
Thundering herd on expiration. A hot cache entry expires. A thousand concurrent requests all miss the cache simultaneously and all try to populate it from the database. The database gets hammered, latency spikes, requests time out, the cache fills but half the clients have already given up. This is the classic thundering herd.
Cache stampede on warm-up. You deploy. The cache is empty. Every request is a miss. The database is overwhelmed. You roll back the deploy, but now the cache is still empty on the old version and the database is still overwhelmed. Recovery takes much longer than the deploy.
Inconsistent cache across nodes. You have a Redis cluster. You invalidate a key. One node drops it. The other nodes still return it because the invalidation message was delayed. Users see different data depending on which node their request hit.
What to Do
- Write-through cache invalidation — when the application writes to the database, it writes to the cache in the same operation (or invalidates the cache entry).
- Short TTLs with background refresh for hot data. Background refresh means a separate job re-populates the cache before it expires, so clients never see a miss on hot keys.
- Cache stampede protection via lock-based population (one client populates, others wait) or stale-while-revalidate (serve stale for a few seconds while the refresh happens). A sketch of both follows this list.
- Cache warming on deploy — populate the cache with known-hot keys before accepting traffic. Don't rely on the cache filling from user traffic.
- Consistent hashing for distributed caches so that adding or removing a node only affects a small fraction of keys.
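Lock-based population and stale-while-revalidate are easier to see in code than in prose. Here is a minimal in-process sketch in Python; in a distributed cache the lock would live in Redis or behind a singleflight-style helper rather than in process memory, and the class and parameter names here are ours.

```python
import threading
import time
from typing import Callable

class StampedeGuard:
    """One caller refreshes an expired entry; everyone else keeps getting the
    stale value until the refresh lands."""

    def __init__(self, ttl: float, stale_grace: float):
        self.ttl = ttl                  # freshness window, seconds
        self.stale_grace = stale_grace  # how long stale data may still be served
        self._cache: dict[str, tuple[float, object]] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._meta = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str, load: Callable[[], object]) -> object:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]                      # fresh hit
        lock = self._lock_for(key)
        if lock.acquire(blocking=False):         # we won the right to refresh
            try:
                value = load()                   # exactly one call hits the database
                self._cache[key] = (time.monotonic(), value)
                return value
            finally:
                lock.release()
        if entry and now - entry[0] < self.ttl + self.stale_grace:
            return entry[1]                      # serve stale while the refresh runs
        with lock:                               # cold key, no stale copy: wait, re-check
            entry = self._cache.get(key)
            if entry and time.monotonic() - entry[0] < self.ttl:
                return entry[1]
            value = load()
            self._cache[key] = (time.monotonic(), value)
            return value
```

Usage is one line per read, e.g. guard.get(f"profile:{user_id}", lambda: load_profile_from_db(user_id)), where load_profile_from_db is whatever your data layer calls the real query.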
3. Distributed Transactions
You need to charge a customer, decrement inventory, and send a confirmation email. In a single database, this is a transaction. In a microservices architecture, charge is in one service, inventory is in another, email is in a third. You can't wrap them in a database transaction.
Why Naive Solutions Fail
- "Just call each service sequentially and roll back on failure" — The rollback steps can fail too. Now you have a charge without an inventory decrement, and you can't reverse the charge because the refund service is down.
- "Just use two-phase commit" — 2PC exists but has serious availability problems. Every participant has to be available during the transaction, and the coordinator is a single point of failure. Modern distributed systems almost universally avoid it.
- "Just hope failures are rare" — They aren't. At scale, rare is several times per hour.
What Actually Works
- Saga pattern. A workflow of steps, each of which has a compensating action. If step 3 fails, run the compensating actions for steps 1 and 2. Compensations are business logic, not technical rollback — "refund the charge" not "undo the write." A sketch follows this list.
- Outbox pattern. The service writes to its database and to an "outbox" table in the same transaction. A separate process reads the outbox and publishes events. Guarantees at-least-once delivery without distributed transactions.
- Idempotency everywhere. Every operation needs to be safe to retry. Use idempotency keys, natural keys, or deduplication.
- Eventual consistency with clear SLAs. Your UI acknowledges that the operation is in progress, shows status, and handles failures explicitly. "Your order has been placed and is being processed" is honest and avoids pretending you have synchronous transactions.
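To make the saga idea concrete, here is a minimal orchestration sketch in Python. It assumes an in-process orchestrator and made-up Step/run_saga names; a production saga persists its progress durably so a crash mid-workflow can resume or compensate, which this sketch does not show.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    action: Callable[[dict], None]      # forward call, e.g. charge the card
    compensate: Callable[[dict], None]  # business-level undo, e.g. refund the charge

def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run the steps in order; if one fails, run the compensations for every
    step that already succeeded, in reverse order."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception:
            for prev in reversed(completed):
                # Compensations can fail too; a real system retries them from
                # durable state instead of firing them once and hoping.
                prev.compensate(ctx)
            return False
    return True
```

The order example from above becomes a list of steps: the charge with a refund as its compensation, the inventory decrement with a restock, and the confirmation email with a no-op.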
This is harder to design than it looks. Budget accordingly.
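Some of the work is mechanical, though. The outbox pattern in particular is a small amount of code; below is a minimal sketch using SQLite as a stand-in for the service's own database, with an illustrative schema, topic name, and relay function.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect("orders.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE IF NOT EXISTS outbox (
        id TEXT PRIMARY KEY, topic TEXT, payload TEXT, published INTEGER DEFAULT 0
    );
""")

def place_order(order: dict) -> str:
    """The business row and its event land in ONE local transaction:
    either both exist or neither does, with no distributed commit."""
    order_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, json.dumps(order)))
        conn.execute(
            "INSERT INTO outbox (id, topic, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "order.placed", json.dumps({"order_id": order_id})),
        )
    return order_id

def relay_outbox(publish) -> None:
    """Run from a separate relay process: push unpublished events to the broker,
    then mark them. A crash between publish and mark re-sends the event, which
    is why consumers must be idempotent (at-least-once delivery)."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
```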
The Meta-Lesson
The common thread: scalability is not really about scaling up. It's about what happens when one of your assumptions breaks at scale. Your assumption of uniform load breaks when a few users dominate traffic. Your assumption of cache consistency breaks when the cache and database disagree. Your assumption of atomic operations breaks when operations span services.
Designing for scale means designing for what happens when assumptions break. That's mostly about isolation, idempotency, and graceful degradation, not throughput benchmarks.
What We'd Actually Do
For a team expecting meaningful scale:
- Load test with realistic distributions. Not uniform. Use production traces if you have them.
- Per-partition monitoring for every sharded system.
- Write-through cache invalidation as the default.
- Saga pattern for any multi-service workflow.
- Idempotency on every write endpoint. Natural keys or idempotency tokens (a sketch follows this list).
- Chaos engineering to confirm the degradation paths work when things fail.
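The idempotency item deserves its own sketch, because the core of the technique is small. Below, SQLite again stands in for the service's database and a stub stands in for the real payment-provider call; the table and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect("payments.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS processed (idempotency_key TEXT PRIMARY KEY, response TEXT)"
)

def call_payment_provider(amount_cents: int, card_token: str) -> str:
    return f"charged {amount_cents} cents on {card_token}"  # stand-in for the real call

def charge(idempotency_key: str, amount_cents: int, card_token: str) -> str:
    """Retry-safe write: the first call with a given key does the work;
    every retry with the same key replays the stored result."""
    row = conn.execute(
        "SELECT response FROM processed WHERE idempotency_key = ?", (idempotency_key,)
    ).fetchone()
    if row:
        return row[0]  # duplicate request: return the original outcome, charge nothing

    response = call_payment_provider(amount_cents, card_token)
    with conn:
        # Note: two truly concurrent duplicates can race past the SELECT above.
        # A production version reserves the key first (insert a 'pending' row,
        # do the work, then update it) so the PRIMARY KEY constraint, not the
        # SELECT, is what prevents double charging.
        conn.execute("INSERT INTO processed VALUES (?, ?)", (idempotency_key, response))
    return response
```

Clients generate the idempotency key (typically a UUID per logical operation) and send the same key on every retry of that operation.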
Three Takeaways
- Hot partitions are the most common scaling failure mode. Monitor them specifically, not just averages.
- Cache invalidation bugs are production incidents waiting to happen. Design for staleness and stampedes up front.
- Distributed transactions don't work. Design the system so you don't need them.
Talk with us about your infrastructure
Schedule a consultation with a solutions architect.