
Cloud Virtual Desktops: Six Things We Wish We'd Known Sooner

We have deployed over a million VDI sessions in 23 years. Here are the six things that trip up almost every cloud VDI project, and what to do about each one before the first user logs in.

John Lane · 2023-06-02 · 7 min read

Cloud-hosted virtual desktop infrastructure has been sold as the future of the corporate endpoint for so long that most IT leaders have stopped asking whether it actually works. It does work, reliably and for real workloads, but not in the way the vendor slides promise. We have deployed over a million VDI sessions across Citrix, VMware Horizon, Microsoft AVD, and Windows 365 over the last 23 years, and the pattern of what goes wrong is consistent enough that we can list the six mistakes almost every project makes before it calls us.

If you are planning a cloud VDI deployment, these are the ones to get right first. The rest of the project is mostly execution.

One: Profile Size and Logon Time Will Kill Your Pilot

The fastest way to sour a VDI pilot is to have the first user click their session icon and wait 90 seconds for the desktop to appear. It happens constantly, and the root cause is almost always the same: roaming profiles with no size discipline.

A Windows profile that has been growing on a physical PC for five years contains hundreds of megabytes to several gigabytes of cached browser data, Teams message history, OneDrive sync state, and application detritus. On a physical PC, that data lives on the local SSD and nobody notices. On a non-persistent VDI session, it has to be loaded from a profile container (FSLogix or similar) at every logon. Load 4 GB over even a fast network and you have a 30-to-60 second logon.

The fix is to make profile hygiene a hard requirement before migration, not an afterthought. Redirect OneDrive and Teams cache outside the profile container, purge browser caches on logoff, cap profile size at 2 GB with alerts above 1.5 GB, and test logon time on the biggest profiles you can find before you roll out. We have seen this single set of steps drop average logon time from 70 seconds to under 20. Users notice the difference within the first day.
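The arithmetic behind those logon-time numbers is simple to sketch. A back-of-the-envelope model in Python; the 10-second fixed overhead for shell initialization and policy processing is an assumption for illustration, not a measured value:

```python
def estimated_logon_seconds(profile_gb, link_mbps, overhead_s=10):
    """Rough lower bound on logon time when a profile container must
    stream over the network. overhead_s is an assumed fixed cost for
    shell init, policy processing, etc."""
    transfer_s = (profile_gb * 8 * 1000) / link_mbps  # GB -> megabits
    return transfer_s + overhead_s

# A 4 GB profile over a 1 Gbps effective storage path:
print(round(estimated_logon_seconds(4, 1000)))  # -> 42 seconds, before any contention
# The same user after profile hygiene caps it at 2 GB:
print(round(estimated_logon_seconds(2, 1000)))  # -> 26 seconds
```

Real logons are worse than this lower bound during a morning storm, because the storage path is shared; the model only shows why profile size dominates.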

Two: GPU Is Not Optional for Modern Windows

Microsoft Teams, modern Office, Chrome, Edge, and even basic web apps now assume hardware graphics acceleration. Running Windows 11 on a VDI instance with no GPU in 2023 produces a desktop that feels broken even when CPU and memory are adequate. Mouse movement lags, video conferencing is unusable, and users will bypass the system to use their old laptop within a week.

The fix is to use GPU-enabled instance types as the default, not the exception. On Azure AVD that means the NVv4 or NVads A10 v5 series. On AWS, the closest equivalents are the WorkSpaces Graphics and GraphicsPro bundles and the AppStream 2.0 Graphics instance families. The cost delta is real — GPU instances are roughly 40 to 80 percent more expensive per hour — but it is the difference between a deployment users accept and one they complain about daily. The savings from a non-GPU instance disappear immediately in support tickets and shadow IT.

One nuance: the GPU does not need to be powerful, it just needs to be present. A modest shared GPU (NVIDIA A10 slice, or Intel Flex) is enough for the "modern Windows feels normal" baseline. You only need high-end GPUs for users doing actual graphics work — CAD, video editing, data science notebooks with accelerated libraries.

Three: The Network Matters More Than the Desktop

The VDI experience is bounded by the weakest link between the user's endpoint and the session host. A user with a 200 Mbps home connection but 80 ms of latency to the region will have a worse experience than a user with 25 Mbps and 20 ms of latency. Bandwidth helps; latency is what determines whether the mouse feels connected to the pointer.

Two implications. First, pick the region based on user latency, not on where the data center is cheapest. A 10 ms latency difference is perceptible in VDI even if it is imperceptible in web apps. Second, the VPN path matters. If your VDI traffic is backhauled through a VPN concentrator in a distant data center before reaching the session host, you have added latency to every keystroke. Modern display protocols (Citrix HDX/ICA, PCoIP Ultra, VMware Blast Extreme, the RDP stack behind AVD) all handle high-latency links gracefully, but gracefully is not the same as well.
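Region choice can be grounded in measurement rather than guesswork. A minimal Python sketch that samples TCP connect time, a reasonable first-order proxy for the latency a display protocol will see; the gateway hostnames in the comment are placeholders, not real endpoints:

```python
import socket
import statistics
import time

def tcp_rtt_ms(host, port=443, samples=5):
    """Median TCP connect time in milliseconds -- a first-order proxy
    for the latency a display protocol will experience."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=3):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)

# Run from a representative user location against each candidate
# region's gateway (placeholder hostnames):
#   for host in ("gateway-eastus.example.com", "gateway-westeurope.example.com"):
#       print(host, round(tcp_rtt_ms(host)), "ms")
```

Collect numbers from a handful of real user locations before committing to a region; the cheapest region is rarely the closest one.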

If you can get users onto a direct path — cloud-hosted session over the public internet with a properly configured gateway — do it. Many organizations already did this during the pandemic and never went back; the experience is measurably better.

Four: Storage IOPS Is the Hidden Bottleneck

Session hosts share storage, and the default storage tier on most cloud providers is not fast enough for a multi-user Windows session. The symptom is a system that feels fine at 10 a.m. when people are reading email and unusable at 10:30 a.m. when everyone logs on and Windows Update, antivirus scans, and profile loads all hit the same disk.

The fix is to pay for premium storage on session hosts and user profile containers, not standard. On Azure this means Premium SSD for disks and Premium Files for FSLogix profile shares. On AWS it means gp3 or io2 for EBS and FSx for profile storage. The cost delta is modest compared to the whole VDI bill, and the experience difference is large.

Also, spread session hosts across storage accounts or file shares if the user count is significant. Storage has throughput limits, and every provider enforces them in ways that are not obvious until you hit them at the worst possible moment.
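One way to make that spreading concrete is to budget a logon-storm IOPS figure per session against each share's quota. A sketch with assumed numbers; the per-user and per-share figures below are placeholders, so substitute your provider's published limits and your own measurements:

```python
import math

def shares_needed(users, storm_iops_per_user, share_iops_limit, headroom=0.7):
    """Number of profile shares needed so a logon storm stays within
    each share's IOPS quota, keeping 30% headroom by default."""
    usable = share_iops_limit * headroom
    return math.ceil(users * storm_iops_per_user / usable)

# 500 users at an assumed 50 IOPS each during the morning storm,
# spread across shares assumed to be capped at 10,000 IOPS:
print(shares_needed(500, 50, 10_000))  # -> 4 shares
```

The headroom matters: a share running at its quota during the storm is exactly the 10:30 a.m. failure mode described above.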

Five: The Image Management Story Has to Be Real

The hardest sustained problem in VDI is not the first deployment, it is the second Tuesday of every month when Microsoft ships patches and you have to update every session host image without breaking anything. Teams that treat image management as a manual process — log into a VM, apply updates, run Sysprep, capture — end up spending a week a month on image hygiene, and the quality drops over time because manual steps are manual.

The pattern that works is image management as code. Azure Image Builder or Packer builds the image from a script, the script installs applications and patches, tests run against the new image automatically, and the image is promoted to the session host pool only if the tests pass. Rolling update policies on the host pool let the new image reach users gradually, and rollback is one command if something breaks.
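As an illustration of what "image as code" looks like in practice, here is a minimal Packer sketch for an Azure session-host image. All names, SKUs, and script paths are placeholders under assumed conventions, not a drop-in template:

```hcl
source "azure-arm" "session_host" {
  managed_image_name                = "avd-win11-baseline"  # placeholder
  managed_image_resource_group_name = "rg-avd-images"       # placeholder
  os_type         = "Windows"
  image_publisher = "MicrosoftWindowsDesktop"
  image_offer     = "windows-11"
  image_sku       = "win11-22h2-avd"
  vm_size         = "Standard_D4s_v5"
}

build {
  sources = ["source.azure-arm.session_host"]

  # Application installs and patching live in versioned scripts,
  # so every change to the image is a reviewable commit.
  provisioner "powershell" {
    scripts = ["scripts/install-apps.ps1", "scripts/apply-updates.ps1"]
  }
}
```

A CI pipeline then boots a VM from the new image, runs automated tests against it, and promotes it to the host pool only on success; the pool's rolling-update settings deliver it to users gradually.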

This is real engineering work, and the vendor slides do not warn you about it. Budget it into the project from day one. If you cannot commit to image-as-code, consider Windows 365 Cloud PCs instead of AVD — Microsoft handles the image management for you, at a higher per-user cost but with less operational burden.

Six: Cost Control Requires Autoscaling That Actually Works

A VDI pool sized for peak usage and running 24/7 pays for its busiest hour around the clock, all 8,760 hours of the year. The economics only work if you scale down aggressively during off hours, and scaling down VDI is harder than it sounds.
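To see how fast the savings compound, a back-of-the-envelope comparison in Python; pool size, hourly rate, and the off-hours warm fraction are all assumptions for illustration:

```python
hosts_peak = 40     # assumed peak pool size
hourly_rate = 1.10  # assumed $/host-hour

# Always-on: the full peak pool runs all 8,760 hours of the year.
always_on = hosts_peak * hourly_rate * 24 * 365

# Scaled: full pool for 10 business hours on weekdays, a 20% warm
# pool the rest of the time.
business_hours = 10 * 5 * 52
off_hours = 24 * 365 - business_hours
scaled = hourly_rate * (hosts_peak * business_hours
                        + hosts_peak * 0.2 * off_hours)

print(f"always-on ${always_on:,.0f}/yr, scaled ${scaled:,.0f}/yr, "
      f"saving {1 - scaled / always_on:.0%}")
```

Under these assumptions the saving works out to roughly 56 percent, which is why the scale-down problem is worth solving. The catch is getting sessions to actually release their hosts.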

The reason is that users do not log off — they disconnect, intending to come back. A disconnected session still holds a host and prevents it from being deprovisioned. If you force-log-off disconnected sessions aggressively, users lose in-progress work and hate the system. If you leave them alone, you never get scale-down savings.

The pattern that works is tiered timeouts: disconnect after 15 minutes of idle, warn the user after an hour disconnected, log off after three to four hours disconnected (or after business hours end). Combined with a scheduled scale-down outside business hours and a scale-up warm pool to handle morning arrivals, you can cut runtime cost by 40 to 60 percent without users noticing.
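The tiered-timeout policy above can be sketched as a simple decision function. This is a sketch of the policy logic only, not any vendor's autoscaler API; the thresholds are the ones suggested in the text:

```python
IDLE_DISCONNECT_MIN = 15       # idle while connected -> disconnect
WARN_DISCONNECTED_MIN = 60     # disconnected this long -> warn the user
LOGOFF_DISCONNECTED_MIN = 240  # disconnected this long -> force log off

def session_action(idle_min, disconnected_min):
    """Decide what to do with a session. disconnected_min == 0 means
    the session is still connected."""
    if disconnected_min >= LOGOFF_DISCONNECTED_MIN:
        return "logoff"
    if disconnected_min >= WARN_DISCONNECTED_MIN:
        return "warn"
    if disconnected_min == 0 and idle_min >= IDLE_DISCONNECT_MIN:
        return "disconnect"
    return "none"

print(session_action(idle_min=20, disconnected_min=0))   # -> disconnect
print(session_action(idle_min=0, disconnected_min=90))   # -> warn
print(session_action(idle_min=0, disconnected_min=300))  # -> logoff
```

The ordering matters: the log-off check runs first so a long-disconnected session is reclaimed even if a warning was never delivered.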

The built-in autoscalers on AVD and WorkSpaces handle most of this if you configure them, but "if you configure them" is doing a lot of work in that sentence. Default settings are not what you want for cost control. Tune them deliberately and monitor the result.

The Short Version

VDI on cloud works, but the projects that succeed treat it as a full operations discipline, not as "deploy some VMs and call it a day." Get the six things above right before the first user logs in, and the rest of the deployment is mostly execution and training. Get them wrong and you will spend the first six months after launch in damage control mode, losing user trust with every incident.

The thing we wish every IT leader knew before starting: VDI is not a cost-saving move. It is a manageability and security move. Plan the business case around those, not around "desktop replacement savings," and the project stays on the rails.
