Building AR Apps on Cloud: Latency, Compute, and the Edge Decision
AR applications live and die on latency — here's what actually belongs in the cloud, what belongs on-device, and what belongs at the edge.

Augmented reality applications do not have the luxury of ignoring physics. A user looks through a camera or headset, the system recognizes the scene, overlays virtual content, and updates that overlay 60 or 90 times a second. Any delay in that loop turns the experience from magical to nauseating. "Cloud-powered AR" is a real category, but the architecture is nothing like a typical web application, and the tradeoffs are counterintuitive.
Here is how we think about cloud for AR builds, and which pieces actually belong in the cloud.
1. The Latency Budget Is Brutal
A 90 Hz headset has an 11 ms frame budget. Motion-to-photon latency — the time from a user moving their head to the display updating — needs to stay under about 20 ms or people feel it. Round-trip to a cloud data center is 30 to 80 ms on a good day. You cannot put the render loop in the cloud. Period.
What you can put in the cloud is anything that does not need to be in the per-frame loop: content delivery, model training, multi-user synchronization at slower rates, heavy scene understanding that can tolerate 100 ms of delay, and analytics.
The discipline is identifying which parts of your AR experience tolerate latency and which do not. Get this wrong and you ship a sickness simulator.
2. On-Device: The Fast Loop
Everything in the 11 ms loop stays on-device. Pose tracking, hand tracking, eye tracking, rendering, and the basic scene graph. Modern AR devices — Quest 3, Vision Pro, HoloLens, ARCore/ARKit phones — all have capable enough hardware to run the fast loop locally. You write this code in Unity or Unreal or native, you optimize it, and you never let it depend on the network.
The mistake we see from teams new to AR is trying to offload parts of the fast loop to "save battery" or "make the app smaller." It never works. The device has to do the fast loop even in airplane mode.
3. Edge: The Medium Loop
The medium loop — things that need to happen in under 100 ms but can tolerate more than 20 — is where edge computing becomes interesting. Scene segmentation using a large neural network, object recognition against a catalog of millions of items, voice transcription, natural language understanding. These are too heavy for the device but too latency-sensitive for a round-trip to us-east-1.
The practical pattern is a regional edge deployment. AWS Wavelength, Azure Edge Zones, Google Distributed Cloud Edge, and specialty providers like Fastly Compute@Edge and Cloudflare Workers each offer a flavor of "compute close to the user." For AR with meaningful compute in the medium loop, we usually want a GPU at the edge, which narrows the field significantly. Wavelength and specialty providers with GPU POPs are the current options.
The rule of thumb: if the feature breaks at 150 ms round-trip latency, it belongs at the edge. If it works fine at 300 ms, it belongs in the regular cloud.
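That rule is mechanical enough to encode. A minimal sketch of the placement decision, where the thresholds come from the numbers above but the feature names and budgets are illustrative, not from a real product:

```python
FRAME_BUDGET_MS = 1000 / 90  # ~11 ms per frame at 90 Hz

def placement(tolerable_rtt_ms: float) -> str:
    """Map a feature's worst-case tolerable round trip to a tier."""
    if tolerable_rtt_ms < 20:
        return "on-device"      # fast loop: never touches the network
    if tolerable_rtt_ms < 150:
        return "edge"           # medium loop: regional edge, likely with a GPU
    return "regional cloud"     # slow loop: standard round trips are fine

# Illustrative feature budgets in milliseconds.
features = {
    "pose_tracking": 11,
    "scene_segmentation": 100,
    "catalog_recognition": 500,
}

for name, budget_ms in features.items():
    print(f"{name}: {placement(budget_ms)}")
```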
4. Cloud: The Slow Loop and the Heavy Lifting
The slow loop is where the hyperscalers' strengths shine. Things that happen in seconds or minutes and benefit from big compute:
- Training and retraining models on usage data.
- Content delivery for assets the user downloads before a session.
- Multi-user state synchronization at 10 to 30 Hz for shared experiences — this is a "cloud relay" pattern that works fine over standard internet.
- Rendering of high-quality assets that will be downloaded and cached on-device before use.
- Analytics and telemetry on user behavior, session length, crashes.
- Admin and authoring tools — the content creator building the AR experience is not latency-sensitive.
These are just web application workloads. Build them like you would build any other cloud service. The AR-specific magic happens elsewhere.
5. Content Delivery for AR Assets
AR assets are big. A single high-quality 3D model can be 50 to 200 MB. A complete AR scene might reference hundreds of textures and models. Streaming this to the device intelligently is one of the genuinely hard problems in AR.
What works:
- Level-of-detail pipelines. The on-device experience starts with low-poly versions and progressively loads higher-quality assets as bandwidth permits.
- CDN at every edge. Cloudflare, Fastly, or a hyperscaler CDN with aggressive caching. Asset URLs should be immutable — cache for a year and version in the URL path (see the sketch after this list).
- Pre-fetch on session start. The first few seconds of an AR session are often idle. Use them to download the next batch of assets.
- Compression and streaming formats. Draco for geometry, KTX2 for textures, glTF as the container. These are not optional for serious AR delivery.
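To make the immutable-URL rule concrete, here is a minimal sketch of content-hash versioning. The CDN domain is hypothetical, and hashing the file contents is one reasonable versioning scheme among several:

```python
import hashlib
from pathlib import Path

CDN_BASE = "https://assets.example-ar.com"  # hypothetical CDN domain

def asset_url(path: Path) -> str:
    """Version an asset by content hash so its URL is immutable."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:16]
    return f"{CDN_BASE}/{digest}/{path.name}"

# A changed asset gets a new URL, so both the CDN and the device cache
# can hold the old one for a year without revalidation.
CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}
```

The payoff is that cache invalidation stops being a problem: publish new content under a new URL and let the old entries age out.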
6. Multi-User AR: The Relay Problem
Shared AR experiences — multiple users seeing the same virtual content anchored to the same physical space — need a coordination service. This is solvable in the cloud with a WebSocket or WebRTC relay running at maybe 20 Hz, well within the latency tolerance for most shared experiences.
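A minimal relay of this kind fits in a page. The sketch below uses the third-party websockets package (10.1 or newer for the single-argument handler) and an illustrative JSON message schema; a production relay would add rooms, authentication, and rate limiting:

```python
import asyncio
import json
import websockets  # third-party package: pip install websockets

CLIENTS: set = set()

async def handle(ws):
    CLIENTS.add(ws)
    try:
        async for message in ws:
            # Fan each state update out to every other participant.
            # At ~20 Hz per client this stays well inside the latency
            # tolerance of most shared experiences.
            update = json.loads(message)  # validate it parses as JSON
            peers = [c for c in CLIENTS if c is not ws]
            websockets.broadcast(peers, json.dumps(update))
    finally:
        CLIENTS.discard(ws)

async def main():
    async with websockets.serve(handle, "0.0.0.0", 8765):
        await asyncio.Future()  # run until killed

if __name__ == "__main__":
    asyncio.run(main())
```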
The harder problem is shared spatial anchoring. Each user's device needs to agree on where the virtual content lives in the real world. ARCore Cloud Anchors, ARKit Collaborative Sessions, and Azure Spatial Anchors (which is being deprecated, annoyingly) all try to solve this. The math involves uploading sparse point clouds and doing visual matching server-side. This is where the cloud earns its keep in AR.
7. Scene Understanding With Cloud ML
The most compelling cloud AR use cases we see involve scene understanding at a scale on-device models can't match. Identify every product in a retail aisle and overlay pricing information. Recognize a piece of industrial equipment and fetch its maintenance history. Translate signs in a foreign language. These tasks need models trained on millions of examples and updated frequently — exactly where cloud infrastructure has the advantage.
The architecture is straightforward: the device captures a frame (or a short video clip) and uploads it, the cloud runs the heavy ML and returns structured results, and the device renders the AR overlay. The latency budget is usually 200 to 800 ms. That fits comfortably in standard cloud round-trip times, which means you can use regular regional infrastructure and save the edge complexity for the parts of the stack that actually need it.
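Here is a sketch of the device-side half of that flow, assuming a hypothetical /recognize endpoint that accepts a JPEG frame and returns JSON detections:

```python
import requests

RECOGNIZE_URL = "https://ml.example-ar.com/recognize"  # hypothetical endpoint

def recognize_frame(jpeg_bytes: bytes) -> list[dict]:
    """Upload one camera frame and return structured detections to overlay."""
    resp = requests.post(
        RECOGNIZE_URL,
        data=jpeg_bytes,
        headers={"Content-Type": "image/jpeg"},
        timeout=0.8,  # matches the 200-800 ms budget; fail fast beyond it
    )
    resp.raise_for_status()
    # Assumed response shape: {"detections": [{"label": ..., "bbox": ...}]}
    return resp.json()["detections"]
```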
The cost side is real. Inference at scale on cloud GPUs adds up. Plan for request batching, aggressive caching of repeated queries, and a tiered model architecture where a small cheap model handles the common cases and a bigger model only runs when the small one is uncertain.
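The tiered pattern is worth spelling out, since it is the biggest cost lever. A minimal sketch with stub models standing in for real inference endpoints; the 0.85 threshold is an assumption you would tune against your own accuracy and cost data:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float

def cascade(frame, small_model, large_model, threshold: float = 0.85) -> Prediction:
    """Run the cheap model first; pay for the big one only when uncertain."""
    pred = small_model(frame)   # cheap path, handles the common cases
    if pred.confidence >= threshold:
        return pred
    return large_model(frame)   # expensive GPU path, rare by design

# Demo with stubs in place of real models.
small = lambda frame: Prediction("shelf", 0.62)
large = lambda frame: Prediction("price_tag", 0.97)
print(cascade(b"frame-bytes", small, large))  # low confidence, escalates
```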
8. Privacy and Camera Streams
AR apps collect camera data. Camera data is legally sensitive in almost every jurisdiction, and sending raw camera frames to the cloud triggers GDPR, CCPA, COPPA, and a pile of sector-specific regulations.
The defensive patterns that work:
- Process on-device first and send only the extracted features, not the raw video (sketched after this list).
- Minimize retention — frames that hit the cloud for processing should be deleted within seconds of the response.
- Regional processing — keep EU user data in EU regions. For many customers this is not a recommendation but a contractual and regulatory requirement.
- Explicit consent before any frame leaves the device. Not buried in a EULA.
- Audit logs of who accessed what and when, for the inevitable data subject access request.
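The first pattern on that list deserves a sketch. The encoder stub below stands in for a real on-device model, and the matching endpoint is hypothetical; the point is the shape of the data that crosses the network:

```python
import hashlib
import json
import requests

MATCH_URL = "https://ml.example-ar.com/match"  # hypothetical endpoint

def embed(frame: bytes) -> list[float]:
    """Stand-in for an on-device encoder producing a small embedding."""
    digest = hashlib.sha256(frame).digest()
    return [b / 255 for b in digest[:32]]  # 32 floats, no recoverable pixels

def query_cloud(frame: bytes) -> dict:
    # Only the embedding leaves the device; raw camera data never does.
    resp = requests.post(
        MATCH_URL,
        data=json.dumps({"embedding": embed(frame)}),
        headers={"Content-Type": "application/json"},
        timeout=1.0,
    )
    resp.raise_for_status()
    return resp.json()
```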
Treat camera data with the same care as healthcare records. The penalty exposure is similar.
The Architecture We Usually Build
For an AR product targeting consumer mobile or enterprise field service, the typical split:
- On-device: all rendering, tracking, low-latency interaction, offline capability.
- Edge (if needed): low-latency ML and scene understanding, multi-user anchoring relays.
- Regional cloud: heavier ML inference, content delivery, user accounts, authoring tools, analytics.
- Global cloud: model training, data warehouse, admin portals, business logic.
The architecture is not "AR in the cloud." It is "AR supported by the cloud." Get the split right and the cloud is your accelerator. Get it wrong and every network blip becomes motion sickness for the user — which is a very short path to uninstalls.
Talk with us about your infrastructure
Schedule a consultation with a solutions architect.