Cloud Networking: The Connectivity Choices That Decide the Architecture
Cloud networking is where most hybrid deployments quietly fail — here's how to pick the connectivity model before it picks you.

The network is the part of your cloud architecture that gets chosen last and regretted first. In 23 years of building infrastructure, we have watched more customers paint themselves into a corner on networking than on compute, storage, or identity combined. The reason is simple — networking decisions are sticky. A VM can be rebuilt in an afternoon. A VPC peering topology, a BGP AS design, or an ExpressRoute circuit takes months to change and touches everything.
Here are the six networking choices we pay the most attention to when we scope a hybrid or cloud build, in the order they usually matter.
1. Site-to-Cloud: VPN, Direct Connect, or Both
The first decision is how your on-prem network talks to the cloud. IPsec VPN over the public internet is the default because it is cheap and fast to stand up. It is fine for dev, fine for management traffic, and fine for replication that tolerates some latency jitter. It is not fine for production database traffic, or VoIP, or anything that measures its SLA in four nines.
For real production, you want a dedicated circuit — AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect. Expect to pay $500 to $2,500 per month for a 1 Gbps port depending on the carrier and colocation, plus egress. The payoff is predictable latency (sub-5 ms in-region), a real SLA, and private routing that never touches the public internet.
Our recommendation in almost every production hybrid case is both — a dedicated circuit as primary and an IPsec tunnel as warm standby. The IPsec backup costs $30 a month. The day your circuit provider has a fiber cut, you will be glad you had it.
Don't forget the second circuit
If the workload is tier-one, one ExpressRoute is not enough. You need two circuits terminating on two different routers in two different meet-me rooms, ideally two different carriers. Yes, that doubles the cost. The customers who skipped this step all came back to us after their first outage.
2. Hub-and-Spoke Versus Mesh
Inside the cloud itself, the second big decision is your VPC or VNet topology. Hub-and-spoke is the boring answer and almost always the right one. A central hub VNet hosts your firewalls, your VPN gateway, your shared DNS, and your ExpressRoute connection. Spoke VNets host workloads and peer to the hub.
The alternative — full mesh or transit-gateway flat — sounds appealing because it minimizes hops, but it explodes in complexity once you have more than six or seven VPCs. Every new VPC is N new peerings and N new route tables to audit. Hub-and-spoke scales to dozens of spokes with a single route table per spoke.
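The scaling difference is easy to quantify: a full mesh needs one peering per pair of VPCs, while hub-and-spoke needs one per spoke. A quick sketch of the growth:

```python
def mesh_peerings(n_vpcs: int) -> int:
    # full mesh: every VPC peers with every other VPC
    return n_vpcs * (n_vpcs - 1) // 2

def hub_spoke_peerings(n_spokes: int) -> int:
    # hub-and-spoke: each spoke peers only with the hub
    return n_spokes

for n in (5, 10, 20):
    print(f"{n:>2} VPCs: mesh={mesh_peerings(n):>3}, hub-and-spoke={hub_spoke_peerings(n)}")
```

At 20 VPCs that is 190 peerings to audit in a mesh versus 20 in a hub-and-spoke, which is the whole argument in one number.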
The modern variant on AWS is Transit Gateway, which is essentially a managed hub. Use it. The $36 per month per attachment is trivial compared to the engineering time you save.
3. Egress Is the Bill Nobody Forecasts
Egress traffic — data leaving the cloud — is where cloud economics get ugly. AWS charges about $0.09 per GB for standard egress in the US; Azure and GCP are similar. Moving 10 TB out costs about $900. Even traffic between availability zones within the same region incurs a per-GB charge. This is the line item that destroys naive lift-and-shift business cases.
Things we do to control egress:
- Keep chatty services co-located. If your web tier is in AWS us-east-1 and your database is in Azure East US, you are paying egress on every query. Don't do that unless you have a very specific reason.
- Use private endpoints for object storage. S3, Blob, and GCS all support private endpoints that bypass the public internet. This avoids NAT gateway charges and egress charges at the same time.
- Buy Direct Connect or ExpressRoute for heavy flows. Egress over a dedicated circuit is roughly a third of the public rate. If you move 20 TB per month off-cloud, the circuit pays for itself.
- Watch replication. Cross-region storage replication, cross-region database replicas, and backup-to-secondary-region all generate egress. Price them before you enable them.
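A back-of-the-envelope model makes the dedicated-circuit trade-off concrete. The rates below are the approximate figures from this article ($0.09/GB public egress, roughly a third of that over a dedicated circuit); substitute your own contracted rates:

```python
PUBLIC_RATE = 0.09   # $/GB, approximate US standard egress rate
CIRCUIT_RATE = 0.03  # $/GB, roughly a third of the public rate over a dedicated circuit

def monthly_egress_cost(tb_per_month: float, rate_per_gb: float) -> float:
    # 1 TB = 1024 GB for billing-style arithmetic
    return tb_per_month * 1024 * rate_per_gb

tb = 20  # the 20 TB/month scenario from the text
public = monthly_egress_cost(tb, PUBLIC_RATE)
circuit = monthly_egress_cost(tb, CIRCUIT_RATE)
print(f"public: ${public:,.0f}/mo  circuit: ${circuit:,.0f}/mo  saved: ${public - circuit:,.0f}/mo")
```

At 20 TB per month the savings land around $1,200 — enough to cover the lower end of the 1 Gbps port fees quoted above, which is why the circuit pays for itself.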
4. Segmentation and Zero Trust
Cloud networking lets you segment at a granularity that was impractical on-prem. A per-subnet Network Security Group that only allows port 443 from the load balancer subnet to the web tier is a one-line policy in ARM or Terraform. A per-workload Zero Trust perimeter is achievable — something that on physical switches would have required a week of VLAN and ACL engineering.
The temptation is to skip this because it feels like overkill. It is not overkill. Lateral movement after an initial compromise is how most breaches escalate from nuisance to catastrophe. Default-deny between tiers, explicit allows for each required flow, and logging on every boundary. If you are not doing flow logs into a SIEM, you are blind to half of what happens inside your cloud.
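Default-deny with explicit allows reduces to first-match rule evaluation with a deny fallthrough, which is essentially how NSGs and security groups process rules by priority. A minimal sketch — the tier names and policy here are hypothetical, not any cloud's API:

```python
# First-match rule evaluation with an implicit default-deny.
# Tiers, ports, and rules below are an illustrative example policy.
RULES = [
    # (source tier, destination tier, port, action)
    ("lb",  "web", 443,  "allow"),   # load balancer -> web tier, HTTPS only
    ("web", "db",  5432, "allow"),   # web tier -> database, Postgres only
]

def evaluate(src: str, dst: str, port: int) -> str:
    for rule_src, rule_dst, rule_port, action in RULES:
        if (src, dst, port) == (rule_src, rule_dst, rule_port):
            return action
    return "deny"  # nothing matched: default-deny between tiers

print(evaluate("lb", "web", 443))    # explicitly allowed flow
print(evaluate("web", "db", 3306))   # wrong port: denied by default
```

Everything not explicitly listed is denied, including lateral flows like web-to-web — which is exactly the property that contains a compromise.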
Service endpoints and private links
AWS PrivateLink, Azure Private Endpoints, and GCP Private Service Connect let you consume managed services (RDS, storage, Key Vault, etc.) without traversing the public internet at all. Use them by default. The cost is a few dollars per endpoint per month and the benefit is enormous.
5. DNS: The Unsung Dependency
DNS is the thing that breaks during every multi-cloud outage and the thing nobody documents. A hybrid environment needs a DNS design that answers these questions cleanly:
- How do on-prem resources resolve cloud resources?
- How do cloud resources resolve on-prem resources?
- What happens when your primary DNS resolver is unreachable?
Our default is a pair of conditional forwarders in the hub VNet that point to on-prem DNS for internal zones, and on-prem DNS conditional-forwarding to the cloud resolver for cloud-private zones. Azure Private DNS Resolver and AWS Route 53 Resolver both support this cleanly. Don't try to get clever with split-brain DNS if you can avoid it — debugging split-brain DNS at 2 a.m. is one of the worst experiences in operations.
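Mechanically, conditional forwarding is just longest-suffix matching on zone names: the most specific configured zone wins, and anything unmatched falls through to the default resolver. A toy sketch of that selection logic (the zone names and resolver addresses are made up):

```python
# Pick a forwarder by longest matching zone suffix — the decision logic
# behind conditional forwarders. Zones and IPs are illustrative only.
FORWARDERS = {
    "corp.example.com":        "10.0.0.53",   # on-prem DNS for internal zones
    "privatelink.example.com": "10.100.0.2",  # cloud private resolver
}
DEFAULT = "1.1.1.1"  # fall through to a public resolver

def pick_forwarder(name: str) -> str:
    matches = [z for z in FORWARDERS if name == z or name.endswith("." + z)]
    if not matches:
        return DEFAULT
    return FORWARDERS[max(matches, key=len)]  # most specific zone wins

print(pick_forwarder("db1.corp.example.com"))  # routed to on-prem DNS
print(pick_forwarder("www.example.org"))       # no zone match: default
```

The virtue of this model over split-brain DNS is that every name has exactly one authoritative answer; the forwarders only decide who you ask.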
6. Observability or You're Flying Blind
Finally, instrument everything. Cloud networks produce more telemetry than most physical networks — flow logs, VPC traffic mirrors, load balancer access logs, DNS query logs, NAT gateway metrics. Collect them. Send them to a centralized store. Build dashboards for packet drops, conntrack exhaustion, NAT port utilization, and top talkers by egress bytes.
We catch most impending outages 24 to 48 hours in advance with a NAT port utilization alert. It's not glamorous, but the NAT gateway quietly running out of ephemeral ports is one of the most common outage causes in AWS deployments, and there is no other signal until the application starts timing out.
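The alert itself is nothing more than a utilization ratio against the port pool with a threshold low enough to leave reaction time. A sketch — the 64,000-port pool size is an assumption for illustration, so check your gateway's documented connection limits before setting real thresholds:

```python
# Alert when NAT ephemeral-port utilization crosses a threshold.
# PORT_POOL is an illustrative assumption; use your gateway's real limit.
PORT_POOL = 64_000
WARN_AT = 0.80  # fire early enough to act a day or two ahead

def nat_port_alert(active_connections: int) -> bool:
    utilization = active_connections / PORT_POOL
    return utilization >= WARN_AT

print(nat_port_alert(40_000))  # ~62% used: no alert yet
print(nat_port_alert(56_000))  # ~87% used: page someone
```

The point of the 80% threshold is exactly the lead time described above: port exhaustion trends are gradual, so a utilization alert fires long before applications start timing out.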
What We Actually Build
For a mid-market hybrid customer with 50 to 200 workloads, our default networking stack is ExpressRoute or Direct Connect primary with IPsec backup, hub-and-spoke topology with Transit Gateway or Azure Virtual WAN as the hub, private endpoints for every managed service, per-subnet NSGs with default-deny, centralized DNS via conditional forwarders, and flow logs feeding a Sentinel or Security Hub workspace. It takes about three weeks to design and stand up, and it scales from where the customer starts to where they will be in three years without a rebuild.
The cloud network is the backbone the rest of the architecture bolts onto. Pick the bolts before you pick the rack.
Talk with us about your infrastructure
Schedule a consultation with a solutions architect.