In part 1 of this series, we covered AR and smart glasses basics and use cases, along with some of their design challenges. Now we pick up where we left off by going deeper into the challenges, including technical hurdles. Let’s dive in…
Technical Challenges
On top of the physical constraints, we have to deal with the following technical challenges:
- Near Eye Optics
- Form factor
- Compute requirements
- Heat dissipation
- What does it look like when it’s off?
- Eye glow
- Efficient SLAM
- Accurate eye, facial & hand tracking
We won’t explore all of these challenges in this post, but progress is being made everywhere! Let’s take a look at the most significant challenge: near-eye optics.
Near Eye Optics – The Fundamental Challenge
Success of smartglasses in a consumer acceptable form factor begins & ends with near eye optics.
— Christopher Grayson, 2016
The above quote seems to be the underlying truth behind the lack of true consumer AR glasses. Therefore, we will devote significant time and energy to understanding the two key components of near-eye optical systems and the advances required.
The above diagram from Radiant Vision Systems demonstrates why AR optics are much more challenging than VR. On the left-hand side, you can see the simplicity of a VR optical system – the light from a display is fed directly into the eye with lenses for magnification. On the right-hand side, you can see the AR optical system, with a compact display offset from the eye. This requires a complex optical combiner to overlay the digital image onto the real world.
Why are the optics so challenging?
Karl Guttag (a legend in the AR space) frequently comments that AR is many orders of magnitude more difficult than VR. The two major bottlenecks in AR optics that require significant advances are the underlying display technology for image generation, and the optical combiner that merges real-world and digital imagery into a single view.
To start to get a sense of the challenges, let’s break down the requirements for the display:
- Small enough to fit into the glasses’ form factor (an order of magnitude smaller than a VR display)
- Not placed directly in front of the eye, but offset with the light routed to the eye
- Positioned close to the eye, so the light must be heavily modified for the eye to focus on it
- Extremely bright (ideally 10,000 cd/m² (nits) to the eye), to compete with sunlight and the massive losses during transmission
- High resolution (to cover a ~100° field of view (FoV) at the human eye's acuity of 1 arcminute, roughly 6K resolution is required per eye, which equates to a pixel pitch of roughly 1 µm; see the quick calculation after this list)
- Low power consumption
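To see where the 6K and ~1 µm figures come from, here is a quick back-of-envelope calculation. The ~6 mm active panel width is my own assumption (roughly what fits a glasses-sized light engine); the FoV and acuity numbers are the ones from the list above.

```python
# Back-of-envelope: display resolution & pixel pitch for AR glasses.
# Assumption (not from the post): a ~6 mm wide micro-display panel.
fov_deg = 100                   # target horizontal field of view
acuity_arcmin = 1               # human eye resolving power
panel_width_mm = 6.0            # assumed glasses-friendly panel size

pixels_per_degree = 60 / acuity_arcmin          # 60 arcminutes per degree
horizontal_pixels = fov_deg * pixels_per_degree # pixels needed across the FoV
pixel_pitch_um = panel_width_mm * 1000 / horizontal_pixels

print(f"{horizontal_pixels:.0f} px across {fov_deg} degrees (~6K per eye)")
print(f"pixel pitch on a {panel_width_mm} mm panel: {pixel_pitch_um:.1f} um")
```

Running this gives 6,000 pixels across the FoV and a 1.0 µm pitch, which is why the requirements above are so far beyond today's phone and VR panels.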
These stringent requirements are compounded by the fact that the optical combiner will inherently degrade the light from both the real world and the display!
Optical Combiner
How do we actually combine the light from the two sources? Below you can see a map from IDTechEx of all optical combiner methods:
As you can see, optical combiners are divided into waveguides and non-waveguides. I’m not going to discuss non-waveguides in this post, as from my research it seems like waveguides are going to be the technology of choice for the first generation of consumer AR glasses.
Waveguides
The function of a waveguide is fundamentally quite simple: to guide the light from the display into the viewer’s eye. Waveguides are usually defined by their underlying mechanism: diffraction or reflection. Whilst diffractive waveguides have been the dominant technology up until now, I believe reflective waveguides are the future of AR glasses due to their superior efficiency and image quality. For more information on waveguides, I highly recommend reading this post by Optofidelity.
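The reason a thin slab of glass can guide light at all is total internal reflection: rays that hit the glass/air boundary at a shallow enough angle are trapped and bounce along the guide until an out-coupling structure (a grating or a partial mirror) releases them toward the eye. As a minimal sketch, here is the critical angle for a typical glass; the refractive index of ~1.5 is my assumption, not a figure from any particular waveguide.

```python
import math

# Critical angle for total internal reflection at a glass/air boundary.
# Assumption: a generic refractive index of ~1.5 for the waveguide glass.
n_glass = 1.5
n_air = 1.0

critical_angle_deg = math.degrees(math.asin(n_air / n_glass))
print(f"critical angle: {critical_angle_deg:.1f} degrees")
# Rays striking the boundary at more than ~41.8 degrees from the normal are
# trapped inside the slab and propagate toward the out-coupling optics.
```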
Above you can see a diagram of a reflective waveguide from Lumus Vision. Lumus Vision is an extremely interesting company that has pioneered reflective waveguides. The light is generated by the micro-display, then expanded and directed into the eye by the waveguide through a series of transflective mirrors. Lumus has been working on reflective waveguides for many years, but the waveguides have proved extremely difficult to manufacture at scale. However, Lumus has recently renewed its partnership with the glass manufacturing giant Schott to manufacture their waveguides at a new facility in Malaysia. This is a significant step forward for the technology, and could pave the way for the first generation of consumer AR glasses.
Display Technology
There are a plethora of display technologies available, however only a limited subset are applicable to AR glasses. This is because the losses in a waveguide-based system are astronomical, as demonstrated in the diagram below from Karl Guttag.
In order for a sufficient number of nits to reach the eye, the display must be capable of generating a huge amount of light. For this reason, Liquid Crystal on Silicon (LCOS) and MicroLED are the most promising display technologies for AR glasses. Many industry experts see MicroLED as the future of AR displays, with companies like Apple and Meta investing heavily in the technology. Read more about MicroLED in this excellent article by Karl Guttag.
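To make the scale of those losses concrete, here is a rough illustration. The ~1% end-to-end efficiency figure is an assumption for the sake of the arithmetic, not a measured value; real waveguide systems vary widely and are often even less efficient.

```python
# Rough illustration of why AR displays must be so bright.
# Assumption: ~1% end-to-end efficiency from display to eye (illustrative only).
target_nits_at_eye = 10_000   # needed to remain visible against sunlight
system_efficiency = 0.01      # assumed combiner + optics efficiency

required_display_nits = target_nits_at_eye / system_efficiency
print(f"display must emit ~{required_display_nits:,.0f} nits")  # ~1,000,000 nits
# Only highly emissive technologies like MicroLED (or brightly illuminated
# LCOS) can plausibly reach this while fitting a glasses form factor.
```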
Compute
AR glasses are a full-stack problem, and the compute requirements are variable and intense. Let’s break down the compute workload into its core components:
- Simultaneous Localization and Mapping (SLAM): Understanding the environment
- Neural Networks: Running a multitude of AI models for different tasks
- Rendering: Meshing the real world and digital world together
- Transmission: Funneling data to and from the glasses
- Other: Everything else a standard smartphone does
Offloading as much compute as possible to auxiliary devices is key to the success of AR glasses, for both power and performance reasons. Most humans carry around a supercomputer in their pocket, and we can push even more intensive workloads to the cloud, much like Meta is already doing for their current generation of smart glasses.
Each task has to be carefully placed somewhere on this unconjoined triangle of success, trading off latency, power usage, and compute requirements. For the neural network use case, using a Router LLM to evaluate query difficulty and selecting the most appropriate device for the workload seems like an interesting avenue.
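As a toy illustration of that trade-off, here is a sketch of how a workload router might pick a target device. The device names, thresholds, and cost numbers are all invented for illustration; a real system would measure these at runtime.

```python
from dataclasses import dataclass

# Toy sketch of placing a workload on glasses, phone, or cloud.
# All numbers and names below are hypothetical, purely to illustrate the
# latency / power / compute trade-off described above.

@dataclass
class Workload:
    name: str
    compute_gflops: float      # how heavy the task is
    latency_budget_ms: float   # how quickly the result is needed

def place(workload: Workload) -> str:
    if workload.latency_budget_ms < 20:   # e.g. SLAM, rendering: must stay local
        return "glasses"
    if workload.compute_gflops < 50:      # light enough for the phone SoC
        return "phone"
    return "cloud"                        # heavy and latency-tolerant

for w in [Workload("SLAM update", 10, 10),
          Workload("hand tracking", 30, 30),
          Workload("LLM query", 500, 1000)]:
    print(w.name, "->", place(w))
```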
Transmission
In order to leverage the two external compute sources, you will need a high-bandwidth communication channel. In the cloud case, this will obviously be done via WiFi or cellular networks. However, for the glasses-to-phone connection, more care must be taken in selecting the communication protocol. Bluetooth 5.0, whilst ubiquitous and low power, has an extremely limited bandwidth of 2 Mbps, which is far from sufficient for AR glasses. Ultra Wideband (UWB) could be the answer, and has been shipping in the iPhone since 2019, albeit purely for real-time location, not data transfer. With a theoretical transfer rate of 1 Gbps, it may be the solution that AR and other high-bandwidth peripherals are looking for. Meta has been filing numerous patents relating to UWB (e.g., 20240235606), so we may see this technology in their glasses soon.
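A quick sanity check on why 2 Mbps falls short: even a single, modestly compressed camera stream sent off-device for processing dwarfs it. The resolution, frame rate, and compression ratio below are assumptions chosen only to get an order of magnitude.

```python
# Rough bandwidth estimate for streaming one camera feed off the glasses.
# Assumptions (illustrative only): 1280x720 @ 30 fps, 12 bits per pixel raw,
# and roughly 50:1 video compression.
width, height, fps = 1280, 720, 30
bits_per_pixel = 12
compression_ratio = 50

raw_mbps = width * height * fps * bits_per_pixel / 1e6
compressed_mbps = raw_mbps / compression_ratio

print(f"raw: {raw_mbps:.0f} Mbps, compressed: ~{compressed_mbps:.0f} Mbps")
# ~330 Mbps raw and ~7 Mbps compressed for one stream -- already beyond
# Bluetooth's 2 Mbps, and multiple sensors push this far higher, which is
# where UWB's ~1 Gbps of headroom becomes attractive.
```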
Tomorrow’s AR Glasses
The AR glasses we get in the next ~3 years will not be the ultimate form. In order to rival Manfred’s glasses in Accelerando, we will need better specs. The above schematic is taken from one of Meta’s most recent patents, and demonstrates some interesting innovations such as transducers in both the nose bridge and arms for more immersive spatial audio.
In my opinion, the most salient difference between the first and final generations of AR glasses will be the display technology, particularly the resolution and FoV. As the visual fidelity of the display approaches that of the human eye, the line between the real and the virtual will become blurred, then vanish entirely. This may arrive sooner than you think, with commercial micro-displays already hitting 4K resolution at a ~3 µm pixel pitch.
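A rough way to gauge how close that is: spreading a 4K-wide image across the ~100° FoV assumed earlier gives around 38 pixels per degree, versus the ~60 pixels per degree implied by 1-arcminute acuity, so resolution still has meaningful headroom to grow.

```python
# How close is a 4K micro-display to "retinal" resolution over a wide FoV?
# Assumption: the same ~100 degree horizontal FoV used earlier in the post.
fov_deg = 100
display_px = 3840        # 4K horizontal resolution
eye_limit_ppd = 60       # 1 arcminute acuity = 60 pixels per degree

ppd = display_px / fov_deg
print(f"{ppd:.0f} pixels/degree vs ~{eye_limit_ppd} needed")  # ~38 vs 60
```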
Assembling a team
If we were to assemble a team to build the ultimate AR glasses, it would look something like this:
- Karl Guttag: AR industry veteran, from whom much of my knowledge is derived
- Shin-Tson Wu: Dr. Wu and his progeny will be the creators of the ultimate AR display
- Oliver Kreylos: Who thoroughly dashed my “lasers are the answer” theory
- Thad Starner: Original Google Glass team
- Bernard Kress: Author of the book on AR optics, currently at Google
Conclusion
From all my research, it seems like Meta is well-positioned in this market. I would not be surprised to see exponential adoption if they can deliver on their 2028 roadmap. With the recent advancements in AI, the value proposition for AR glasses has gone from exciting to essential. With developments from Lumus and Meta’s recent patent filings, it seems like this technology may finally be approaching its “iPhone moment” – provided the near-eye optics problem can be solved.
Once the “iPhone moment” is reached, there will be a plethora of regulatory and social challenges to overcome, as highlighted in this excellent SMBC comic. In spite of these challenges, I’m looking forward to the future of AR glasses, and the startups and innovations that will come with them.
TLDR: Long META stock.
I’m grateful to Mithun Hunsur, Madeline Ephgrave & Benjamin Perkins for their feedback on this post.
Footnotes
Christopher Fleetwood is a Machine Learning Engineer at HuggingFace.