Capacity Planning When Your Data Center Is Already Out of Power
The hardest capacity planning conversations are the ones that start after the floor is already full. Here is how we think about getting another 20 percent out of a stranded facility before writing the check for a new one.

Most capacity planning articles assume you are planning years in advance, with a blank floor and a hiring plan. Real conversations rarely start there. The conversations we actually have with customers sound like this: "We told facilities we had room for another 40 servers. We do not. The PDUs on row C are at 85 percent. The chiller plant trips every time it hits 95 F outside. And marketing just signed a deal that adds another 300 VMs next quarter." That is a different problem than the textbooks describe, and it needs a different playbook.
The Three Constraints That Actually Bite
Every data center has three resources that can run out, and they almost never run out at the same time. In our experience on customer floors:
Power is the constraint that limits you first. Racks get denser every hardware cycle. A rack that held 32 1U servers at 4 kW in 2015 is now holding 16 servers drawing 9 kW, and nobody updated the circuit plan. Stranded space above stranded power is the single most common condition we walk into.
Cooling is the constraint that embarrasses you second. You might have enough total tonnage on paper. You do not have enough delivery to the hot spots. Air moves where the physics let it move, not where the spreadsheet says it should.
Physical space is almost never the actual bottleneck anymore. If someone tells you they are out of space, they are usually out of power in the space that matters. Empty racks without available circuits do not count as capacity.
Knowing which constraint is biting first changes the entire conversation. You cannot solve a power problem by ordering more racks. You cannot solve a cooling problem by adding more UPS capacity. Diagnose before you prescribe.
Step 1 — Measure What You Actually Have
The first thing we do when walking onto a stranded floor is measure. Not model. Measure. The spreadsheet the facilities team maintains is almost always wrong, usually in the optimistic direction. Every rack gets a live reading on draw versus circuit rating. Every row gets an inlet temperature walk with a handheld sensor or, better, permanent wireless sensors deployed before the analysis starts.
The numbers matter because designed capacity and operational capacity are not the same. Under the NEC, continuous load on a circuit is limited to 80 percent of the breaker rating, so a 60 amp circuit gives you 48 amps. That is your operational ceiling, not 60. By the same rule, a rack on a 10 kW circuit that routinely runs at 9.8 kW is not "at 98 percent utilization"; it is well over its 8 kW continuous limit and one transient spike away from tripping the breaker. Headroom is not waste. Headroom is insurance.
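The derating arithmetic is simple enough to script against live readings. The sketch below assumes the 80 percent continuous-load factor described above; the function names and the sample reading of 52 amps are illustrative, not taken from any particular DCIM tool.

```python
# Continuous loads are limited to 80% of the breaker rating (the rule cited above).
NEC_CONTINUOUS_FACTOR = 0.80


def continuous_ceiling_amps(breaker_amps: float) -> float:
    """Operational ceiling for a circuit under continuous load."""
    return breaker_amps * NEC_CONTINUOUS_FACTOR


def headroom_amps(breaker_amps: float, measured_amps: float) -> float:
    """Positive means room left; negative means the circuit is over its ceiling."""
    return continuous_ceiling_amps(breaker_amps) - measured_amps


if __name__ == "__main__":
    # Hypothetical live reading of 52 A on a 60 A circuit.
    ceiling = continuous_ceiling_amps(60)
    margin = headroom_amps(60, 52)
    print(f"ceiling: {ceiling:.1f} A, headroom: {margin:.1f} A")
```

A negative headroom value is the signal to act on, even when the reading is still below the nameplate rating.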
The same is true for cooling. The facility might be rated at 400 tons. At 95 F ambient with the economizer disabled and one CRAH down for maintenance, the delivered capacity is closer to 310 tons. Plan against delivered, not rated.
Step 2 — Find the Stranded Capacity
Almost every "full" data center we have walked into has 10 to 25 percent stranded capacity hiding somewhere. Finding it is the cheapest capacity you will ever buy. Here is the checklist we walk:
Dormant and zombie servers. Anywhere between 10 and 30 percent of physical servers on a mature floor are running but serving no useful workload. They are drawing power, generating heat, and occupying a circuit. A light-touch audit (pull process and network stats over 30 days, then cross-reference with the owning team) will typically identify 5 to 15 percent of the footprint as decommission candidates. This is the single highest-ROI activity available on a stranded floor, and nobody wants to do it because it means having conversations with application owners about systems they forgot they owned.
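A light-touch audit like this can start as a simple filter over 30 days of collected stats. The sketch below is one possible shape for it; the `ServerStats` fields, the hostnames, and both thresholds are hypothetical and would need tuning against your own telemetry.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    hostname: str
    avg_cpu_pct: float        # mean CPU utilization over the 30-day window
    avg_net_kbps: float       # mean network throughput over the same window
    owner_confirms_use: bool  # result of the cross-reference with the owning team


def decommission_candidates(servers, cpu_threshold=2.0, net_threshold=5.0):
    """Return hostnames that look idle AND that no owner has claimed.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    return [
        s.hostname
        for s in servers
        if s.avg_cpu_pct < cpu_threshold
        and s.avg_net_kbps < net_threshold
        and not s.owner_confirms_use
    ]


if __name__ == "__main__":
    fleet = [
        ServerStats("app-01", 35.0, 900.0, True),
        ServerStats("old-batch-07", 0.4, 1.2, False),
        ServerStats("quiet-but-owned", 0.9, 2.0, True),
    ]
    print(decommission_candidates(fleet))
```

Keeping the owner confirmation as a separate condition matters: low utilization alone only nominates a server, and the conversation with the application owner decides it.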
Power imbalances across phases. A three-phase PDU feeding single-phase loads tends to drift out of balance as equipment is added and removed. A PDU drawing 60 percent on phase A and 30 percent on phases B and C has stranded capacity on B and C that rebalancing will recover. This is a weekend of work for an electrician with the right logs.
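To see why rebalancing recovers capacity, note that the most loaded phase sets the limit for the whole PDU. The sketch below works the 60/30/30 example from the paragraph above. It assumes an 80 percent per-phase ceiling and a perfectly even split after rebalancing, which is an idealization; real loads rarely divide that cleanly.

```python
def phase_headroom(loads_pct, ceiling_pct=80.0):
    """Compare headroom before and after an idealized rebalance.

    loads_pct: per-phase load as a percentage of phase rating.
    ceiling_pct: assumed per-phase operating ceiling (80% here is an assumption).
    """
    worst = max(loads_pct)
    balanced = sum(loads_pct) / len(loads_pct)  # ideal even split
    return {
        "current_headroom_pct": ceiling_pct - worst,
        "balanced_headroom_pct": ceiling_pct - balanced,
        "recovered_pct": worst - balanced,
    }


if __name__ == "__main__":
    print(phase_headroom([60, 30, 30]))
```

In this example the PDU goes from 20 points of usable headroom to 40 with no new hardware, because the limiting phase drops from 60 percent to 40 percent.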
Hot aisles without containment. An uncontained hot aisle wastes 20 to 30 percent of your cooling capacity by letting hot return air mix with cold supply. Retrofitting containment on an existing row is not cheap, but it is cheaper than a chiller expansion and much cheaper than building a new floor. If your aisles are not contained, start there before you consider any capacity expansion.
Supply temperature too low. We still walk into facilities maintaining 68 F (20 C) supply air because that is what the commissioning guide said in 2009. ASHRAE's recommended envelope has been 18 to 27 C (about 64 to 81 F) at the inlet for more than a decade. Raising supply temperature toward the top of the recommended range recovers chiller efficiency and extends compressor life. It is not free: you need to validate your equipment warranties and monitor inlet temps continuously. But it is cheap compared to new capital.
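Because the floor readings are often in Fahrenheit while the envelope is stated in Celsius, a small helper avoids unit mistakes during the inlet walk. This is a minimal sketch that uses the 18 to 27 C range quoted above; the function names are illustrative.

```python
# Recommended inlet envelope as quoted in the text, in degrees Celsius.
RECOMMENDED_INLET_C = (18.0, 27.0)


def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inlet_status(temp_c: float) -> str:
    """Classify an inlet reading against the recommended envelope."""
    low, high = RECOMMENDED_INLET_C
    if temp_c < low:
        return "below"
    if temp_c > high:
        return "above"
    return "within"


def margin_to_top_c(temp_c: float) -> float:
    """Degrees C between the reading and the top of the envelope."""
    return RECOMMENDED_INLET_C[1] - temp_c


if __name__ == "__main__":
    reading_c = f_to_c(68.0)
    print(inlet_status(reading_c), round(margin_to_top_c(reading_c), 1))
```

Inlet temperature usually runs warmer than supply temperature because of mixing on the way to the rack, so the margin should be checked against measured inlet readings rather than the supply setpoint.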
Unused circuits on the PDU tail. In tightly provisioned floors we regularly find PDUs with one or two unused outlets and a few amps of headroom that never got assigned because the original deployment plan was conservative. Reconcile what is physically there against what the DCIM says is there.
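The reconciliation itself is a set comparison between what the DCIM records and what the floor walk finds. The sketch below assumes each outlet is identified by a (PDU name, outlet number) pair; the PDU names and data are hypothetical.

```python
def reconcile_outlets(dcim_in_use, observed_in_use):
    """Compare DCIM records with a physical walk.

    Each argument is a set of (pdu_name, outlet_number) pairs.
    Returns (recoverable, undocumented):
      recoverable  - DCIM says in use, but nothing is plugged in
      undocumented - something is plugged in that the DCIM does not know about
    """
    recoverable = sorted(dcim_in_use - observed_in_use)
    undocumented = sorted(observed_in_use - dcim_in_use)
    return recoverable, undocumented


if __name__ == "__main__":
    dcim = {("C-07-A", 1), ("C-07-A", 2), ("C-07-A", 3)}
    walk = {("C-07-A", 1), ("C-07-A", 3), ("C-07-A", 8)}
    recoverable, undocumented = reconcile_outlets(dcim, walk)
    print("recoverable:", recoverable)
    print("undocumented:", undocumented)
```

Both lists matter: the first is capacity you can hand back, and the second is load that is missing from every plan built on the DCIM data.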
Step 3 — Move Workload Before You Move Hardware
The cheapest capacity in your existing facility is often the workload you move out of it. Not to the cloud necessarily — we are not cloud zealots — but to the place that makes sense for each workload.
Dev and test to burstable cloud or a cheaper tier. Non-production workloads should almost never be consuming premium Tier 3 capacity during business hours only to sit idle overnight. Move them to spot instances, move them to a secondary facility, or power them down on a schedule. The marginal cost of dev and test on your production floor is the opportunity cost of the production workloads you cannot land there.
Backup and DR to cheaper cold storage. If a meaningful chunk of your floor is storage for backups and archives, you are paying Tier 3 rates for data that rarely gets read. Object storage in a hyperscaler or a purpose-built backup target eats a fraction of the power per TB and the RTO is usually acceptable for backup workloads.
Burstable batch to cloud. Month-end reporting, monthly ML retraining, analytics jobs that run twice a quarter — all great candidates to evict from the production floor. Pay for the hours you use, free the capacity the rest of the month.
Each of these moves is a project, and each comes with a real migration cost, but the capacity they free is almost always cheaper than building new.
Step 4 — Decide Honestly About the Next Floor
At some point the recovery exercises are exhausted and you genuinely need more capacity. Here is the decision framework we use with customers facing that wall:
- If you need capacity in 90 days: You are leasing colo space somewhere. Period. Construction lead times for anything meaningful are 12 to 24 months now, and the electrical switchgear alone is running 30- to 40-week lead times as of this writing. Do not pretend otherwise.
- If you need capacity in 12 to 18 months: Lease, with an eye toward negotiating expansion rights. Build is still too slow.
- If you need capacity in 2 to 4 years and the math on owning is better than leasing: Now build is on the table. Only now. And only if you have an operations team that can run it, which is a separate conversation.
- If your business case is under pressure on cost: Hybrid is usually the right answer. Keep the steady-state production workload on owned infrastructure because you are buying at amortized cost of capital. Push the bursty, seasonal, or growth-uncertain workloads to cloud where you only pay when you use them.
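The time-based branches of this framework can be written down as a small function, which makes the gaps explicit. This is a sketch under assumptions: the list above does not cover the windows between 3 and 12 months or between 18 and 24 months, so the code treats both as lease cases, and it treats anything past 24 months the same as the 2 to 4 year case. The cost-pressure (hybrid) branch is left out because it depends on workload mix rather than timing.

```python
def sourcing_recommendation(months_until_needed: float,
                            owning_beats_leasing: bool = False,
                            has_ops_team: bool = False) -> str:
    """Map a capacity deadline to a sourcing option.

    Thresholds follow the list above; uncovered windows default to leasing
    (an assumption of this sketch, not something the framework states).
    """
    if months_until_needed <= 3:
        return "lease colo now"
    if months_until_needed <= 18:
        return "lease, negotiate expansion rights"
    if months_until_needed >= 24 and owning_beats_leasing and has_ops_team:
        return "build is on the table"
    return "lease, build not yet justified"


if __name__ == "__main__":
    for months, own, ops in [(3, False, False), (15, True, True), (36, True, True), (36, True, False)]:
        print(months, "->", sourcing_recommendation(months, own, ops))
```

Note that build only appears when all three conditions hold: enough time, favorable ownership math, and a team that can operate the facility.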
Build-versus-lease math has shifted meaningfully in the last two years. Electrical supply chain delays, rising utility connection costs, and the impact of AI workloads on available capacity in major metros have all moved the honest numbers. Whatever model you ran in 2022 needs to be rerun today. Do not let last cycle's answer drive this cycle's decision.
What We'd Actually Do
The playbook we run when a customer calls with "we are out of capacity" looks like this, in order:
- Week 1: Measure. Live power, inlet temps, phase balance. Build the real picture.
- Weeks 2 to 4: Identify stranded capacity: zombies, phase imbalance, containment, supply temp. Execute the cheap wins.
- Weeks 4 to 8: Identify evictable workloads: dev, test, backup, batch. Plan the migrations.
- Week 8 onward: If the recovery work did not free enough runway, start the colo or build conversation with honest numbers and honest lead times.
Most of the time, Steps 1 through 3 recover enough capacity to push the "new facility" conversation out by 12 to 24 months. That is 12 to 24 months of not writing a nine-figure check, which tends to be a welcome outcome.
Three Takeaways
- Power is the constraint that bites first. Space is almost never the real problem. Measure before you model.
- Stranded capacity is the cheapest capacity. Zombies, phase imbalance, containment, and supply temp will typically recover 10 to 25 percent without new capital.
- Honest lead times beat optimistic plans. Switchgear lead times are brutal right now. If you need capacity inside 18 months, you are leasing, not building.