
Remote Desktop Under the Hood: Four Takeaways from a Decade of Tuning

A decade of tuning VDI and remote desktop deployments, distilled into four things that matter and a dozen that do not.

John Lane 2024-07-29 6 min read

We have been running VDI and remote desktop services for customers since the Windows Server 2003 Terminal Services era. Across more than 20 years and over a million deployed sessions, we have tuned enough environments to have strong opinions about what matters and what is a distraction. Here are four takeaways that contradict most of the vendor marketing.

Takeaway One: Protocol Choice Matters Less Than Network Quality

Every remote desktop vendor has a protocol. Citrix has HDX. VMware has Blast Extreme and PCoIP. Microsoft has RDP and now AVD's extended RDP. Parsec has its own. NICE DCV is AWS's answer. Each vendor will tell you theirs is the fastest, the most efficient, the most adaptive, the best under bad conditions.

They are all fine. On a well-provisioned network with under 20 ms of latency and under 0.5 percent packet loss, a user cannot tell the difference between them in a blind test. We know because we have done the tests.

What actually separates good remote desktop experiences from bad ones is network quality. In our experience the biggest wins in a troubled deployment come from, in rough order of impact:

  • Fixing packet loss. Packet loss over 1 percent will make any protocol feel terrible. Fix the loss and you fix the complaint.
  • Reducing jitter. A link with 20 ms of latency and 2 ms of jitter feels fine. A link with 20 ms of latency and 40 ms of jitter feels awful. QoS marking and priority queuing help.
  • Using a UDP-capable transport. TCP-based remote desktop over a lossy link is a pathological combination. Turn on UDP (EDT for Citrix, Blast Extreme UDP, RDP UDP) and measure.
  • Keeping sessions on a clean routing path. We have seen environments where sessions hairpinned through a VPN concentrator adding 80 ms for no reason. Direct internet or SD-WAN fixed it.
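The loss and jitter rules of thumb above are easy to automate. Here is a minimal sketch that scores a link from a list of round-trip-time samples (for example, parsed from `ping` output); the function name, thresholds, and input format are illustrative assumptions, not a product feature:

```python
from statistics import mean

def link_report(rtts_ms):
    """Score a link from RTT samples in ms; None marks a lost probe.

    Thresholds follow the rules of thumb above: under 0.5 percent loss
    feels fine, over 1 percent feels terrible, and jitter matters as
    much as average latency.
    """
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg = mean(received)
    # Jitter as the mean absolute difference between consecutive RTTs
    # (the same basic idea RFC 3550 uses for RTP interarrival jitter).
    jitter = (mean(abs(a - b) for a, b in zip(received, received[1:]))
              if len(received) > 1 else 0.0)
    verdict = "ok" if loss_pct < 0.5 and jitter < 10 else "investigate"
    return {"loss_pct": loss_pct, "avg_ms": avg,
            "jitter_ms": jitter, "verdict": verdict}
```

A link averaging 20 ms with 2 ms of jitter and no loss comes back "ok"; the same average latency with 2 percent loss comes back "investigate", which matches what users report.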

The protocol is the top of the stack. The network is the foundation. If the foundation is bad, no amount of protocol tuning will fix it.

Takeaway Two: Storage Is the Most Common Root Cause of "VDI Is Slow"

When a VDI deployment gets complaints, the default assumption is that the network is at fault. It is usually the storage.

A VDI host running 50 to 100 non-persistent desktops is doing a lot of random I/O. Logon generates a burst of reads as profiles load and policies apply. Application launches generate more reads. Logoff generates writes as profile changes flush. If your storage is spindle-based, or if your flash storage is over-committed, logon storms will destroy your user experience at 8 AM every day and you will blame the network because the network is easier to look at.
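The arithmetic behind the 8 AM logon storm is worth making explicit. A rough sketch, with all the input numbers as illustrative assumptions rather than measurements from any particular deployment:

```python
def peak_logon_iops(users, window_s, logon_s, iops_per_logon):
    """Back-of-envelope peak IOPS during a logon storm.

    If `users` log on uniformly over `window_s` seconds and each logon
    drives `iops_per_logon` of mostly random I/O for `logon_s` seconds,
    the expected number of concurrent logons is users * logon_s / window_s.
    """
    concurrent_logons = users * logon_s / window_s
    return concurrent_logons * iops_per_logon

# Illustrative: 100 desktops logging on over 15 minutes, each logon
# driving ~300 IOPS for ~30 seconds, adds ~1,000 IOPS of random I/O
# on top of steady-state load.
storm = peak_logon_iops(users=100, window_s=900, logon_s=30,
                        iops_per_logon=300)
```

A spindle-based array that comfortably serves steady-state load can fall over under exactly this kind of burst, which is why the complaints cluster at the start of the workday.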

We have fixed "VDI is slow" tickets by:

  • Moving from shared iSCSI storage to local NVMe on the VDI hosts, cutting logon time from 45 seconds to 7
  • Enabling write-back cache in the hypervisor storage layer
  • Moving non-persistent profile data off shared storage and onto host-local storage
  • Increasing the IOPS tier on cloud-managed storage when the cheap tier started throttling under load

Before blaming the network, measure storage latency at the host during the complaint window. If read latency is over 5 ms or write latency is over 10 ms during peak use, you have found your culprit.
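The thresholds above translate directly into a check you can run against latency samples collected during the complaint window. A minimal sketch; the function name and the use of a 95th-percentile summary (rather than a single peak reading) are our assumptions:

```python
def storage_culprits(read_lat_ms, write_lat_ms):
    """Apply the rules of thumb above to host-level latency samples.

    Returns the list of offending sides ("read", "write"); an empty
    list means storage latency looks healthy during the window.
    """
    def p95(samples):
        # 95th percentile so a single spike does not dominate.
        ordered = sorted(samples)
        idx = min(len(ordered) - 1, round(0.95 * (len(ordered) - 1)))
        return ordered[idx]

    culprits = []
    if p95(read_lat_ms) > 5:
        culprits.append("read")
    if p95(write_lat_ms) > 10:
        culprits.append("write")
    return culprits
```

Feed it a morning's worth of per-minute samples from the host: if it comes back non-empty, fix storage before touching the network.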

Takeaway Three: Profile Management Is Where Deployments Live or Die

On non-persistent VDI — the most common deployment model for K-12, healthcare, and anywhere you want reset-on-logoff — profile management is the single hardest engineering problem in the stack.

The ideal: a user logs in; their documents, browser bookmarks, app settings, and preferences appear instantly; the session feels identical to yesterday's; and on logoff everything is saved back to a central store. The reality is that profile loading is slow, profile corruption is common, profile size grows without bound, and every application decides for itself where to store user state.

Microsoft FSLogix has made this much better than it used to be. Containers hold the entire user profile including the Outlook OST, Teams cache, and OneDrive sync state, and they mount at logon as a virtual disk. This works well when the underlying storage is fast and the container size is managed. It works badly when either of those is wrong.

Our hard-won lessons on profile management:

  • Keep containers small. 5 GB hard cap, enforced by quota. Bigger containers mean slower logon, more storage cost, and more corruption risk.
  • Redirect Teams cache and OneDrive aggressively. Teams generates enormous cache files. OneDrive sync state can consume gigabytes. Both can be excluded from the profile and rehydrated on demand.
  • Monitor container size over time. Set an alert when containers grow unexpectedly. An application that starts writing logs into the profile can eat a terabyte before anyone notices.
  • Have a working profile reset process. When a user's container gets wedged — and it will — there needs to be a documented, tested, one-click way to reset it without losing real user data.
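The container-size monitoring in the list above can be as simple as a scheduled scan of the profile share. A sketch, assuming the common FSLogix layout of one VHD/VHDX container per user directory (the function name, layout, and cap are our assumptions):

```python
from pathlib import Path

def oversized_containers(profile_root, cap_gb=5):
    """Find FSLogix containers over the size cap.

    Assumes one .vhd/.vhdx per user under `profile_root`. Returns
    (path, size_gb) pairs, largest first, for anything over `cap_gb`.
    """
    cap_bytes = cap_gb * 1024 ** 3
    hits = []
    for container in Path(profile_root).rglob("*.vhd*"):
        size = container.stat().st_size
        if size > cap_bytes:
            hits.append((str(container), round(size / 1024 ** 3, 2)))
    return sorted(hits, key=lambda pair: -pair[1])
```

Run it nightly and alert on the result; an application that starts writing logs into the profile shows up here long before the helpdesk tickets do.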

Customers that get profile management right have happy VDI users. Customers that treat it as an afterthought spend their helpdesk budget on "I can't log in" tickets.

Takeaway Four: GPU Acceleration Is Underused and Overrated in Equal Measure

GPUs in VDI are a polarizing topic. Vendors will sell you vGPU licenses, NVIDIA GRID cards, and marketing material about how GPU acceleration is essential for modern knowledge workers. Sales engineers will quote you doubled hardware costs to add it.

The reality is somewhere in between.

Workloads where vGPU is clearly worth the cost:

  • CAD, medical imaging, 3D modeling. Autodesk, Revit, PACS viewers, SolidWorks. These applications are unusable without GPU acceleration. If you have these users, budget for vGPU and do not negotiate with yourself.
  • Video-heavy workflows. A call center that runs Teams or Zoom with video is much happier with GPU video decode.
  • Data visualization at scale. Tableau, Power BI with large datasets. GPU helps meaningfully.

Workloads where vGPU is a waste:

  • Office-only users. Word, Excel, a browser, email. Modern CPUs handle this fine. The GPU sits idle.
  • Terminal server workloads on Windows Server. The per-user marginal cost of vGPU across 100 users of an RDS host is rarely worth it for general use.
  • Thin-client scenarios where the client cannot decode the encoded stream anyway. Adding a GPU to the host does not help if the client is an $80 thin client without hardware video decode.

The honest test is to measure GPU utilization on a pilot group for two weeks. If the GPU sits at single-digit percent usage, you are paying for capacity you are not consuming. If it is regularly above 40 percent, the investment is paying off.
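That two-week pilot test is easy to evaluate from utilization samples, for example collected with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` on a timer. A sketch of the decision rule; the function name, sampling approach, and verdict wording are our assumptions:

```python
def gpu_pilot_verdict(util_samples_pct):
    """Summarize periodic GPU utilization readings from a pilot group.

    Thresholds follow the rules of thumb above: a GPU that sits at
    single-digit usage is wasted capacity; one regularly above 40
    percent is earning its keep.
    """
    total = len(util_samples_pct)
    idle_share = sum(1 for u in util_samples_pct if u < 10) / total
    busy_share = sum(1 for u in util_samples_pct if u > 40) / total
    if idle_share > 0.9:
        return "mostly idle: likely paying for unused vGPU"
    if busy_share > 0.5:
        return "regularly busy: the vGPU investment is being used"
    return "mixed: extend the pilot or segment users by workload"
```

The "mixed" outcome is the interesting one in practice: it usually means a few CAD or video-heavy users are hiding inside a mostly office-worker pilot group, and splitting the pools saves money on both sides.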

What We Tell Customers

Start with the network, because if the network is bad nothing else will save you. Then fix storage, because that is the most common hidden bottleneck. Then invest in profile management, because that is what determines whether your users forgive the other compromises. Only then worry about GPU and protocol tuning, because those are the polish on top of a foundation that has to be right first.

A decade of tuning teaches you that the hard problems are almost never the ones the marketing material highlights.
