Though it’s often painted as a technology that will steal XR’s thunder, AI is the best thing that’s happened to XR in a while. AI can be an intelligence backbone for XR, while XR can be the front-end graphical interface for AI. The two technologies complete each other in lots of ways.
That goes for user-facing XR experiences, such as generating world-immersive overlays and lenses on the fly through text prompts. And it applies to creator-facing work, such as streamlining workflows through the time-saving and capability-boosting power of generative AI.
Both angles were recently advanced by Snap. In Lens Studio 5.0, it launched its GenAI Suite, which gives lens creators generative AI tools to do things like rapidly prototype 3D designs from text prompts. It also teased the other angle noted above: user-facing generative lenses.
The convergence of these technologies was also explored in a recent report by our research arm, ARtillery Intelligence. As such, it joins our weekly report-excerpt series, with the latest below. This week’s segment zeroes in on the high-level aspects of AI & AR’s collision.
Good Conversationalist
In the last installment of this series, we examined the concept of “generative XR.” The thought is that combining XR creation with generative AI can streamline and empower developer workflows. This includes everything from prototyping to creating 3D elements.
Now we move on to another convergence point between XR and AI: inputs and UI. Voice will continue to develop as an input format for XR devices, given that they lack keyboards. This gets into AI-driven “assistant” functions, such as Snap’s My AI in the newest Spectacles.
Backing up, one of AR’s promises is to provide ambient intelligence that assists users and makes them smarter throughout their day. This includes things like AI-driven visual search to identify and contextualize physical world objects – everything from street signs to storefronts.
Similarly, voice input will involve conversational AI such as ChatGPT, in addition to hand tracking and gestural commands a la Apple Vision Pro. Spoken inputs are inherently more conversational than tapped/typed ones, so they require natural language processing (NLP).
Magic & Utility
In fact, spoken inputs are more intuitive when they’re natural-language enabled. And that’s what conversational AI is good at… it’s right there in the name. So when looking at the intersection of AR and conversational AI, the latter could be the intelligence layer that the former needs.
One example of this has already been demonstrated by Ray-Ban Meta smart glasses: real-time language translation. If your smart glasses can translate as you converse with someone who speaks a different language, that’s the real AI-driven magic and utility of AR.
Other use cases include assistant functions (“Where have I met this person before?”) and commerce (“Where do I buy that jacket?”). These features are gaining momentum as they align with the new crop of multimodal-AI-driven “lite AR” smart glasses.
These are also compelling use cases because they move AR from toy to tool. Gaming and face filters will continue to be part of the AR mix, but the technology will gain value – and sustained traction – when it’s more of a utility. And AI will be the engine that propels it.