By 2030, each of these technologies (AI, XR, and Blockchain) will be fully integrated into the Metaverse, and each will create massive value for businesses and consumers alike. Learning about and leveraging these new tools will allow the Metaverse to be created not just by programmers, developers, and 3D artists, but by everyone. (Keep reading to make your own!)
“With AI in the Metaverse, everyone will be a creator.”
This article focuses exclusively on Artificial Intelligence and its importance to the future of the Metaverse. It covers:
- Generative AI (Text, Audio & Image)
- NeRF — 3D spatial capture
- Computer Vision & SLAM
- Natural Language Processing & Conversational AI
- Automatic Content Creation (3D)
AI in the Metaverse holds the power to unleash unlimited creativity while ensuring everyone has equal opportunities. Many will see these technologies as a replacement for human labor. For some roles, this will certainly be true. More likely, though, we will adapt to doing much more with much less, which will be required as we enter the exponential age of humanity. With Generative AI, the biggest thing to note is that while the neural networks used to create novel content are trained on open datasets scraped from the internet, the work they create is not derivative but original. Every piece of content they generate, whether audio, text, video, or images, is a novel creation based on billions of training data points.
Before you continue this article, I want you to understand two things:
- AI Changes Everything.
- AI is Already Here and it’s not going away.
The Metaverse consists of the media we see in the current iteration of the internet (video, audio, and text) plus three groups of technologies: AI, XR, and Blockchain. If for no other reason than that Ryan Reynolds is already using AI, and that incredible art like the video above is being made, you should be paying attention.
Generative AI (Text & Image)
Let’s start with the most common and best understood: Generative AI interfaces based on GPT (Generative Pre-trained Transformer) models, the best known being ChatGPT. These generative AI models are trained on massive datasets, much of the data scraped from the internet. Based on simple text input, these AI platforms can create incredibly valuable responses that can be used for:
- Search: AI-powered insights. Google AI, ChatGPT, OpenAI
- Text: Summarizing or automating content. GPT-3/4, ChatGPT, OpenAI
- Images: Generating images. Midjourney, DALL-E, Stable Diffusion
- Audio: Summarizing, generating, or converting text into audio. Play.ht, Clipchamp, Soundraw
- Video: Generating or editing videos. Synthesia, VEED.io
- Code: Generating code. ChatGPT, GitHub Copilot, IntelliCode, PyCharm, Jedi
- Chatbots: Automating customer service and more. Zendesk, Ada, DeepConverse
- Natural Language Processing (NLP): Inworld AI, Synthesia, MindMeld
- Computer Vision: HawkEye, VisoAI, DeepMind, SenseTime
- Simultaneous Localization & Mapping (SLAM): Apple, PTC, Snap, Niantic, Meta
- Machine Learning (ML): NVIDIA, Microsoft, iTechArt, Meta
- Suggestion Algorithms: Google, Amazon, Microsoft, Netflix
Let’s try a text example together
STEP 1: Go to chat.openai.com/chat and wait for a free server.
STEP 2: Enter the prompt: ‘Write a fun dad joke about AI.’
OUTPUT: Why was the AI feeling cold? Because it left its algorithm open!
STEP 3: Laugh, either at how dumb the joke is or at how instant the response was. Either way, it truly is amazing.
STEP 4: Try a bunch of work-related tasks you need done ASAP (e.g., Write an article about…, Give 10 examples of…, Write a marketing strategy for…).
Let’s make an image using Midjourney
STEP 1: Register at Midjourney
STEP 2: Get on the Midjourney Discord Server
STEP 3: Find a room to submit your query.
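STEP 4: Enter the following prompt: /imagine Dragon hanging on a castle high resolution photoreal, fire breathing --ar 3:2
NOTE: /imagine is the command that must start every prompt (it is not a reference to the band Imagine Dragons). The aspect ratio parameter is written with two plain hyphens: --ar 3:2.
STEP 5: Choose the image you like and click its V# button if you want to see 4 more versions of it.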
STEP 6: Choose the best one and click U# to upscale the image
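The prompt in STEP 4 has three parts: the /imagine command, a plain-text description, and optional parameters at the end, such as the aspect ratio, which Midjourney expects with two plain hyphens (--ar 3:2). Here is a minimal Python sketch of that anatomy. The function name build_prompt and its arguments are made up for illustration and are not part of any Midjourney tool. It only formats the text you would type after /imagine in Discord.

```python
def build_prompt(subject, modifiers=(), aspect_ratio=None):
    """Assemble the text that follows the /imagine command.

    Every name here is illustrative; this only formats a string.
    """
    parts = [subject.strip()]
    parts.extend(m.strip() for m in modifiers if m.strip())
    prompt = ", ".join(parts)
    if aspect_ratio:
        # Parameters go at the very end, introduced by two hyphens.
        prompt += f" --ar {aspect_ratio}"
    return prompt


prompt = build_prompt(
    "Dragon hanging on a castle",
    modifiers=["high resolution", "photoreal", "fire breathing"],
    aspect_ratio="3:2",
)
print(prompt)
```

Swapping the modifiers or the aspect ratio is the quickest way to experiment with how much each part of the prompt changes the result.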
For a deeper understanding, here is a little more reading: “Who Owns the Generative AI Platform?” from a16z and McKinsey’s “What is Generative AI?” You can also read the PBS special on how AI turns text into images.
Without getting too philosophical on this subject, generative AI holds the potential to fundamentally change the fabric of society. Imagine when AI not only defends you in court, but also drafts (and passes) laws. AI is already being used by governments to decide who gets welfare and who doesn’t…and many times, it gets it wrong! Imagine your grandmother being denied medical coverage because an algorithm decided she was not worth saving. What other business models and social constructs will be upended? If everyone uses AI to create content, there will be unintended consequences, but the value this nascent technology will create cannot be overstated.
NeRFs (Neural Radiance Fields)
Let’s move to another subset of AI known as NeRFs. These are not the foam missiles you fire at your younger brother but Neural Radiance Fields, a technique that uses computer vision to take video from a regular RGB camera and translate it into volumetric 3D renders you can import into 3D platforms and view spatially. NeRFs are more than a better way to turn scans of real-world places into 3D. They promise to do it orders of magnitude faster than current LiDAR solutions and at a fraction of the cost, using a smartphone camera instead of a $50–100K scanner. The AI also takes the captured information and fills in the blanks to create a realistic and believable virtual version of physical space. These virtual models of the real world will help us populate spaces in the Metaverse quickly and easily, making everyone a creator.
These digital replicas of the real world will help us build shared spaces in the metaverse quickly and easily, extending real-life social networks and accelerating mainstream adoption.
For those like me who don’t understand the above diagram, there is a simpler explanation here: “NeRF or better known as Neural Radiance Fields is a state-of-the-art method that generates novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. The input can be provided as a Blender model or a static set of images.” Basically, wave your phone around and voila, you have a 3D volumetric capture (or at least that is the promise).
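For the curious, here is roughly what a "volumetric scene function" is used for at render time. In a NeRF, a neural network is asked for a density and a color at many sample points along each camera ray, and those samples are blended front to back into one pixel. The Python sketch below shows only that blending step, with hand-typed samples standing in for the network's output. The function name composite_ray and the sample values are my own illustrative choices, not code from any NeRF library.

```python
import math


def composite_ray(densities, colors, step_sizes):
    """Blend the samples along one camera ray, front to back.

    densities:  how opaque the scene is at each sample (sigma >= 0)
    colors:     (r, g, b) at each sample, each channel in [0, 1]
    step_sizes: distance between consecutive samples along the ray
    """
    transmittance = 1.0  # fraction of light not yet blocked
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, step_sizes):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # share of the final pixel
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Hand-written samples standing in for the network's output:
# empty air first, then a solid red surface.
pixel, remaining = composite_ray(
    densities=[0.0, 1e9],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    step_sizes=[0.1, 0.1],
)
print(pixel, remaining)
```

The empty sample contributes nothing and the solid one blocks everything behind it, so the pixel comes out pure red. Training a NeRF amounts to adjusting the network until pixels computed this way match the photos you captured.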
NVIDIA getting started with NeRFs guide (for advanced programmers)
Note: There are no really easy ways to do this currently, but if you want to go deep, here is a video that explains how to make a NeRF in the easiest way I have found thus far (warning, it’s hard!).
You can also download a program called Polycam 3D for iPhone or Android and start 3D scanning objects and/or scenes for use in platforms such as MetaVRse or Unity.
Computer Vision & SLAM
Computer vision (CV) is the field of computer science that focuses on replicating the complexity of the human visual system and enabling computers to identify and process objects in images and videos in the same way that humans do. Imagine how autonomous cars see and how VR headsets understand what is around you.
Simultaneous Localization and Mapping (SLAM) is a form of computer vision that allows your phone to map and understand your surroundings in order to display 3D content in your space. Built into your mobile device are several sensors (accelerometer, gyroscope, and on some models a LiDAR scanner) that, in addition to what the RGB cameras see, provide context about the device’s position (X, Y, Z) and orientation (pitch, yaw, roll), together known as 6 Degrees of Freedom (6-DOF). This allows your phone to understand where the floor is while simultaneously placing content in augmented reality.
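To make 6-DOF a little more concrete, here is a minimal Python sketch of a device pose and how it is used to place a virtual object in the room. Estimating the pose every frame is the hard part that SLAM does; here the pose is simply typed in. The Pose class, the axis conventions (Y up, -Z forward), and the rotation order are assumptions chosen for this example. Real AR frameworks each define their own.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """A 6-DOF pose: three numbers for position, three for orientation."""
    x: float
    y: float
    z: float
    yaw: float = 0.0    # rotation about the vertical (Y) axis, radians
    pitch: float = 0.0  # rotation about the sideways (X) axis, radians
    roll: float = 0.0   # rotation about the forward (Z) axis, radians


def _matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(pose):
    """Combine yaw, pitch, and roll into one 3x3 rotation (assumed order)."""
    cy, sy = math.cos(pose.yaw), math.sin(pose.yaw)
    cp, sp = math.cos(pose.pitch), math.sin(pose.pitch)
    cr, sr = math.cos(pose.roll), math.sin(pose.roll)
    yaw = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    pitch = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    roll = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return _matmul(_matmul(yaw, pitch), roll)


def device_to_world(pose, point):
    """Convert a point from device coordinates into world coordinates."""
    r = rotation_matrix(pose)
    rotated = [sum(r[i][k] * point[k] for k in range(3)) for i in range(3)]
    return [rotated[0] + pose.x, rotated[1] + pose.y, rotated[2] + pose.z]


# A phone held 1.5 m above the floor, turned a quarter turn to the left.
phone = Pose(x=1.0, y=1.5, z=0.0, yaw=math.pi / 2)
# A spot 2 m straight ahead of the camera, expressed in device coordinates.
anchor = device_to_world(phone, (0.0, 0.0, -2.0))
print([round(v, 3) for v in anchor])
```

Once a point is expressed in world coordinates it stays put as the phone moves, which is why AR content appears anchored to the room and not to the screen.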
As CV technology continues to advance, the possibilities will expand from autonomous vehicles, robots, and drones to augmented reality that looks as real as real.
Some of the capabilities of CV and SLAM include object recognition and tracking (think tracking a real-world object while projecting digital information on top of it).
Natural Language Processing & Conversational AI
Natural Language Processing (NLP) is a field of AI that focuses on the interaction between computers and human language. It involves the use of algorithms and statistical models to analyze, understand, and generate human language. NLP is used in a wide range of applications such as language translation, text-to-speech, sentiment analysis, and more.
Conversational AI is a subfield of NLP that focuses on creating human-like interactions between computers and humans using natural language. This can include chatbots, virtual assistants and voice assistants. The goal of conversational AI is to create a seamless and natural communication experience for users. This can be achieved through the use of advanced NLP techniques such as natural language understanding and generation, as well as machine learning and deep learning.
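As a toy illustration of the two halves mentioned above, understanding and generation, here is a minimal Python chatbot sketch. It "understands" a message by counting keyword matches per intent and "generates" a reply by looking up a canned sentence. The intents, keywords, and replies are invented for this example. Real conversational AI replaces both steps with machine learning models, so treat this as a sketch of the pipeline's shape only.

```python
import re

# Hypothetical intents and replies, invented for this example.
INTENTS = {
    "greeting": {"hello", "hi", "hey"},
    "hours": {"open", "hours", "close"},
    "goodbye": {"bye", "goodbye", "thanks"},
}
REPLIES = {
    "greeting": "Hi there! How can I help?",
    "hours": "We are open 9 to 5, Monday to Friday.",
    "goodbye": "Glad I could help. Bye!",
    "unknown": "Sorry, I did not catch that. Could you rephrase?",
}


def understand(utterance):
    """Toy natural language understanding: map text to an intent."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best, best_hits = "unknown", 0
    for intent, keywords in INTENTS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best


def respond(utterance):
    """Toy natural language generation: pick a canned reply."""
    return REPLIES[understand(utterance)]


print(respond("Hey, what hours are you open?"))
```

The message above contains one greeting keyword and two about opening hours, so the bot answers the hours question. Anything it has no keywords for falls through to the "unknown" reply.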
Automatic Content Creation
Nothing says AI like automation. These tools let you say what you want to create and voila, it is there, in 3D! While there will be a ton of these tools in the Metaverse, this is the first one that we know of that works. This slideshow will give you a much deeper understanding of how this technology will revolutionize gaming. Even music is being created by AI now. Give it a try yourself at Anything World.
Check out this cool 3D object created completely by AI on LumaLabs.
To learn more about new cutting-edge technologies like GET3D from NVIDIA, Make-a-Video from Meta, and DreamFusion from Google, follow Two Minute Papers on YouTube.
As you can see, this is the future and while it is not quite ready for prime time, researchers are using AI to solve for AI so it won’t be long before this becomes how we build every virtual world in the Metaverse.
Generative AI Startup Landscape:
Well, there you have it, a pretty comprehensive look at the artificial intelligence algorithms that will directly impact and hopefully benefit you in the Metaverse.
Alan Smithson is co-founder of MetaVRse. A version of this article first appeared on Smithson’s Medium, contributed here with permission.
Header image credit: Midjourney