Until fairly recently, AR and AI have largely been two separate conversations in the tech community. While AI is already used on the back end to build immersive AR experiences for mobile devices, its role is quickly evolving: applied to AR glasses, it helps merge the physical and digital worlds.

AI enhances both back-end development and the user experience: it lets developers build 3D models faster and more efficiently than before, and it enables real-time communication across language barriers. Thanks to the evolution of both technologies, we’re on the cusp of a significant lifestyle change in which the physical and digital worlds seamlessly merge through AI-enabled AR.

World-building with Gen AI

Generative AI is, first and foremost, removing long-standing limits on 3D model building. Advances in algorithms, language models, and processing power give developers the computational headroom to map and interact with the physical world. Generative AI builds 3D models quickly and efficiently, automating what was previously a slow, manual process. The creation of a digital world will be not only faster but more engaging: generative AI will let people transpose their imaginations into the real world. Wearers can simply speak, and voice recognition turns the images and 3D objects they describe into scenes rendered through their AR glasses, exactly as they want them.

For example, they could say: “put a tree in the middle of my kitchen” – and the image would appear in front of them. The opportunity to create without needing to code is a novel one; the wearer can world-build on their own without the developers’ direct input. This is especially true for gaming applications of AR. Generative AI-enabled AR glasses will transform the gaming world and make it far more immersive. With a tool like ChatGPT, it’s possible to create more realistic and engaging characters, or add new worlds and quests. AI will also enhance the experience by assessing player behavior and adjusting the difficulty accordingly.
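To make the idea concrete, here is a minimal sketch of that voice-to-scene loop in Python. It is illustrative only: parse_command stands in for an LLM that would interpret free-form speech, and render stands in for a text-to-3D generator plus an AR anchoring call; none of these names come from a real AR SDK.

```python
# Minimal sketch of a voice-to-scene pipeline for AR world-building.
# Every component name here is illustrative, not a real AR SDK.
import json
from dataclasses import dataclass

@dataclass
class PlacementRequest:
    obj: str     # what to generate, e.g. "tree"
    anchor: str  # where to anchor it, e.g. "room center"

def parse_command(transcript: str) -> PlacementRequest:
    """Turn a spoken command into a structured placement request.
    In practice this step would call an LLM; here a trivial rule
    handles the article's example."""
    words = transcript.lower().split()
    obj = words[words.index("a") + 1] if "a" in words else words[-1]
    return PlacementRequest(obj=obj, anchor="room center")

def render(req: PlacementRequest) -> None:
    """Stand-in for a text-to-3D generator plus an AR anchoring call."""
    print(json.dumps({"generate_3d": req.obj, "anchor_to": req.anchor}))

render(parse_command("put a tree in the middle of my kitchen"))
```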

Deepening connections with people, places, and things

AI-enabled AR not only merges the digital and physical; it enables people to forge new and deeper connections in the real world. One example is AI-powered Automatic Speech Recognition (ASR), which uses neural-network audiovisual speech recognition – an approach that combines audio with image processing of the speaker’s lip movements – to transcribe speech and translate it into any language in real time.

Wearing AR glasses in another country, you could look at a traffic sign and see it automatically translated into your native tongue – or, going further, get real-time subtitles while someone speaks in another language. This is a major development for connecting people in foreign places, eliminating tourist frustration and fostering a more connected world. It also allows members of the deaf community to follow and contribute to conversations without needing to lipread: AI-enabled AR glasses simply turn audio into captions displayed in front of the wearer’s eyes.
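A rough sketch of how such a captioning pipeline could be wired up, assuming a streaming ASR model (such as Whisper) and a translation engine are available as transcribe and translate callables; the show display call is likewise a stand-in:

```python
# Sketch of a real-time caption pipeline: speech in one language in,
# captions in the wearer's language out on the heads-up display.
import queue

captions: "queue.Queue[str]" = queue.Queue()

def caption_worker(audio_chunks, transcribe, translate, target="EN"):
    """Consume short audio chunks and enqueue translated captions.
    transcribe() stands in for an ASR model (e.g. Whisper);
    translate() for a text-to-text translation engine."""
    for chunk in audio_chunks:
        text = transcribe(chunk)
        if text.strip():
            captions.put(translate(text, target))

def display_loop(show):
    """Drain the caption queue onto the glasses' display.
    show() is a stand-in for the rendering call; in practice this
    loop and caption_worker would run on separate threads."""
    while True:
        show(captions.get())
```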

Additionally, AI Optical Character Recognition (OCR) techniques paired with text-to-text translation engines such as DeepL bridge the gap between text recognition and translation. Generative engines such as Stable Diffusion can add animations or images that immediately help explain complicated topics. AI-enabled AR glasses can therefore show a real-time video, image, or animation relevant to what a presenter is saying on stage or a co-worker is explaining in a meeting. Google has commented on its development of AR glasses that can do just this.
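As a concrete illustration, the snippet below chains the open-source Tesseract OCR engine (via pytesseract) to DeepL’s official Python client. The image path and API key are placeholders, and a real AR pipeline would feed live camera frames rather than files:

```python
# OCR-to-translation bridge: recognize text in a camera frame,
# then translate it with DeepL's official Python client.
import deepl                      # pip install deepl
import pytesseract                # pip install pytesseract (needs Tesseract)
from PIL import Image

translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")  # placeholder key

def translate_sign(frame_path: str, target_lang: str = "EN-US") -> str:
    """Read text out of a camera frame and return its translation."""
    recognized = pytesseract.image_to_string(Image.open(frame_path))
    if not recognized.strip():
        return ""  # nothing legible in this frame
    result = translator.translate_text(recognized, target_lang=target_lang)
    return result.text

# e.g. overlay the translation where the sign appears in the wearer's view
print(translate_sign("street_sign.jpg"))
```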

These functionalities are being adopted in industries like education and healthcare. And it’s just the beginning. In the not-so-distant future, we will have AR glasses that seamlessly transport wearers into virtual or augmented reality environments to streamline virtual communication and meetings.

Object detection helps us understand the real world

When wearing a pair of AR glasses with convolutional neural network (CNN) object-detection capabilities, you can walk the streets of any city in the world and learn about its landmarks in real time, just by looking at them. The glasses can identify, label, and provide information about the city and its landmarks – all through the wearer’s frames.

That’s because CNN-based object detection is already used on mobile devices to estimate the position and extent of objects within a scene. Once an object is detected, AR software can overlay text onto it, or render another object into the physical world and create an interaction between the two.
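One off-the-shelf way to get those positions and extents is a pretrained CNN detector, such as the Faster R-CNN model shipped with torchvision. The sketch below returns labeled bounding boxes that an AR layer could pin text to; the frame path and the hand-off to a renderer are placeholders:

```python
# Labeled bounding boxes from a pretrained CNN detector (Faster R-CNN),
# the kind of output an AR overlay would be anchored to.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO class names

def detect(frame_path: str, threshold: float = 0.8):
    """Return (label, box) pairs for confident detections in one frame."""
    img = to_tensor(Image.open(frame_path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]
    return [(categories[int(l)], b.tolist())
            for l, b, s in zip(out["labels"], out["boxes"], out["scores"])
            if s >= threshold]

for label, box in detect("street_view.jpg"):
    print(f"overlay '{label}' at {box}")  # hand off to the AR renderer
```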

Objects transposed into the real world have many applications, including dieting and navigation. For example, nutritional data such as calories, protein, fat, and cholesterol for any food and serving size will soon be available just by looking at a dish with AI-enabled AR glasses on – like an enhanced version of a QR code.
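A hypothetical follow-on step from the detector above: map each detected food label to its nutrition facts and format an overlay string. The values here are illustrative approximations, not a real nutrition database.

```python
# Illustrative lookup table (approximate values, sample data only).
NUTRITION = {
    "banana": {"kcal": 105, "protein_g": 1.3, "fat_g": 0.4, "cholesterol_mg": 0},
    "pizza":  {"kcal": 285, "protein_g": 12.0, "fat_g": 10.0, "cholesterol_mg": 18},
}

def nutrition_overlay(detected_label: str) -> str:
    """Format the overlay text shown next to a detected food item."""
    facts = NUTRITION.get(detected_label.lower())
    if facts is None:
        return f"{detected_label}: no data"
    return f"{detected_label}: " + ", ".join(f"{k} {v}" for k, v in facts.items())

print(nutrition_overlay("banana"))
```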

And it doesn’t stop at object detection – AI-enabled facial recognition software is evolving to detect and identify people. This is already common in commercial settings such as airport security. Employed in AR glasses, AI could give the power of recognition to wearers everywhere, letting you instantly see background information about someone on an augmented-reality social media platform.

These are just a few of the possibilities for AI-enabled AR glasses, and their growth is on a rapid ascent; AI is poised to make more progress in the next ten years than in the 50 years preceding it. Faster processing power will not only help build these virtual worlds more quickly, but enhance them. AR glasses will not only blend the physical and digital, but deepen our connections through streamlined communications and instant knowledge.

David Goldman is VP at Lumus.
