AI continues to replace the metaverse as the tech buzzword du jour. The excitement mostly centers on generative AI such as DALL-E 2, and conversational AI such as ChatGPT. Both are blowing everyone’s minds with their ability to generate meaningful content from text prompts.
Speaking of text prompts, some have even called this the death of the Google-dominant era of search. Though that could be overblown or premature – Google itself is well positioned for AI as we’ve examined – there’s ample promise in the underlying technology from the likes of OpenAI.
Meanwhile, amidst all this excitement, one question we’re often asked is how generative AI will converge with AR. And though it’s still early for generative and conversational AI, we can see some high-level ways that the technology could merge with AR and engender new use cases.
For example, generative AI can assist AR developers with the current bottleneck of creating 3D models – the base ingredient of immersive experiences. Similarly, AR experiences themselves could be built around graphics generated on the fly, rather than pre-built sequences.
After examining the above possibilities in Part I of this series, we now return here in Part II with additional outcomes at the intersection of AR and AI. Specifically, we move from generative AI’s role in AR content creation to conversational AI’s potential as an AR intelligence layer.
Backing up, one of AR’s promises is to provide ambient intelligence that assists users and makes them smarter as they move throughout their day. This includes things like visual search to identify and contextualize physical world objects – everything from street signs to storefronts.
This moves away from AR’s fun and games (think: Pokémon Go and selfie lenses) to its utilitarian endpoints. That in turn makes AR more sustainable and sticky. We’ve already seen a glimpse of this through AR tools like Google Lens, but they’re mostly confined to 5-inch screens.
When AR evolves from handheld to headworn – which is gradually underway – these intelligence tools will become more ambient and automatic given that you don’t have to hold up your phone to use them. And that line-of-sight orientation is where conversational AI factors in.
Putting the ‘Conversation’ in Conversational AI
Specifically, the “conversational” part makes this technology align with AR glasses. Because smart glasses have no keyboard or touchpad, the input will likely be voice. And spoken inputs are inherently different from typed or tapped ones. In other words, they’re more conversational.
By contrast, smartphones have gotten away with text inputs that are less conversational for two big reasons. One is that Google has trained us to use keywords. The other is that keywords are more efficient than full sentences… which resonates when you’re tapping on a small screen.
But spoken inputs are more intuitive if they’re natural-language enabled. And that’s what conversational AI is good at… it’s right there in the name. So when looking at the intersection of AR and conversational AI, the latter could be the intelligence layer the former needs.
One example of this has already been demonstrated by Google: real-time language translation. If your smart glasses can translate language right in front of your eyes as you converse with someone who speaks another language, that’s the real magic and utility of AR (read: killer app).
Beyond language translation, broadly applicable use cases could include assistant functions (“where have I met this person before?”) or commerce (“where do I buy that jacket?”). This is where AI could revive AR, given that AR’s proposed savior, Apple, has its own Achilles’ heel: Siri.
That means Apple has from now until a circa-2025 AR glasses launch to build or buy a more functional intelligence layer (note: Apple could enter the market sooner with a mixed-reality headset featuring passthrough AR). And that’s where conversational AI enters the picture.
In fact, this aligns with the “lite AR” approach we’ve theorized. This is a design hack to sidestep AR’s optical challenges (heat, bulk, etc.) with hardware that’s experientially meaningful, if graphically underpowered. The “experientially meaningful” part is what Apple does best.
That could play out through elegant integrations with other wearables, as well as the “intelligence layer” we keep mentioning. It could be a sort of ambient assistant that makes you a smarter and more effective human. And conversational AI could be the puzzle piece that it needs.