As we stated in Part I of this series, the metaverse needs a reckoning. Like the term itself, the space feels shrouded in ambiguity, raising far more questions than answers.
To that end, this essay proposes a new framework for defending the metaverse. It’s designed to arm you with better talk tracks for educating and inspiring your friends, your colleagues, and your customers.
The framework consists of three pillars:
- The ‘what’ (a simpler, more refined definition)
- The ‘why’ (why this future matters)
- The ‘how’ (how we’re going to get there, and the progress thus far)
After covering the first two points in Part I of the series, we’re back now in Part II to tackle the all-important how.
The How
Building toward the metaverse end-state envisioned in Part I requires a clearer roadmap.
As much as it pains me, this starts with moving AR/VR further out on the timeline.
While that end-state should be what drives us, the game-changing experience we all want/need is further out than originally expected. This article from Matthew Ball explains why.
The TL;DR: mainstream XR (extended reality) is at least another 8-10 years away (mayyyyyybe 6-8 with some lucky breakthroughs). Let’s accept that and be patient.
As a silver lining: up until now, we were fumbling in the dark, unsure how to solve AR/VR’s thorniest technical challenges. But now we have the blueprint, and we’re well on our way.
Meanwhile, we can make progress on other critical parts of the spatial computing stack with tech that works and is rapidly maturing, particularly 3D content and 3D scanning of the physical world.
3D content should be the near-term focus: Even if we had the ultimate pair of AR/VR glasses today, they wouldn’t be terribly useful. Outside of games, most brands/companies suck at 3D content creation.
But that’s starting to change.
Brands and companies are starting to think/operate ‘3D first’, arming themselves with the right talent, 3D services, and 3D workflows to crank out immersive experiences, repeatedly and affordably.
When done right, 3D content won’t come in a slow trickle. It will be explosive, like striking a rich oilfield of 3D data.
Most of the brands you know and love (the Nikes, the BMWs, the Guccis, the Ikeas) are already sitting on a wealth of existing 3D assets, most commonly CAD models of their products (shoes, watches, cars, buildings, planes, etc.).
For the most part, and compared to what’s possible, this 3D data is lying dormant: woefully underutilized beyond the design phase, stuck in fragmented data silos and disparate 3D creation tools.
To tap this 3D oilfield, companies are starting to look/feel like AAA game or VFX studios. They’re pulling talent from games/VFX and establishing 3D content pipelines and 3D asset management systems designed to capture 3D assets at their source (the product design/R&D teams) and then funnel them downstream across the organization.
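To make “capture at the source, funnel downstream” concrete, here’s a minimal sketch of what such a pipeline could look like in code. Everything in it (the AssetRecord fields, the registry, the publish step) is hypothetical and illustrative, not any specific product’s API:

```python
# Hypothetical sketch of a 3D asset pipeline: design teams register source
# assets (e.g., CAD exports) once, and downstream teams (marketing,
# e-commerce, training) pull web- or AR-ready derivatives from one place.
from dataclasses import dataclass, field

@dataclass
class AssetRecord:
    asset_id: str
    source_file: str                    # e.g., the original CAD or DCC file
    tags: list[str] = field(default_factory=list)
    derivatives: dict[str, str] = field(default_factory=dict)  # format -> file path

class AssetRegistry:
    """A single source of truth for 3D assets across the organization."""
    def __init__(self):
        self._assets: dict[str, AssetRecord] = {}

    def register(self, record: AssetRecord) -> None:
        self._assets[record.asset_id] = record

    def publish_derivative(self, asset_id: str, fmt: str, path: str) -> None:
        # e.g., a decimated glTF for the web, a USDZ for AR Quick Look
        self._assets[asset_id].derivatives[fmt] = path

    def find_by_tag(self, tag: str) -> list[AssetRecord]:
        return [a for a in self._assets.values() if tag in a.tags]

# Usage: product design registers the source once...
registry = AssetRegistry()
registry.register(AssetRecord("sneaker-v2", "designs/sneaker_v2.step", ["footwear"]))
# ...and downstream pipelines attach optimized derivatives for reuse everywhere.
registry.publish_derivative("sneaker-v2", "gltf", "published/sneaker_v2.glb")
print([a.asset_id for a in registry.find_by_tag("footwear")])
```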
In the wake of this activity, new 3D tools and services are emerging to accelerate the transformation.
At AWS, we recently open-sourced a 3D pipeline and asset management solution called VAMS (Visual Asset Management System). NVIDIA is gaining momentum with Omniverse, revolutionizing collaborative 3D content creation and interoperability between disparate 3D data/tools. Epic Games acquired photogrammetry leader Capturing Reality and Sketchfab for 3D asset management/sharing, and has since launched Fab, a 3D asset marketplace connecting 3D content creators to consumers/users. Even old-school Shutterstock is getting into the game, recently acquiring TurboSquid, a leader in 3D asset marketplaces and asset management.
And now, we’re being graced by the ultimate 3D accelerator… artificial intelligence (AI).
A new technique called NeRF (neural radiance fields) recently emerged, leveraging AI to better turn 2D photos and videos into 3D. This tech could do for 3D what digital cameras and JPEG did for 2D imagery: dramatically increasing the speed, ease, and reach of 3D capture and sharing.
If you want to geek out on NeRF, here’s a good deep dive.
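For the impatient, here’s the core idea in miniature: a small neural network learns to map a 3D position and viewing direction to a color and a density, and a volume-rendering step integrates samples along each camera ray to produce a pixel. The sketch below is a bare-bones illustration of that formulation (real systems add positional encoding, hierarchical sampling, and heavy optimization), not a production implementation:

```python
# Minimal sketch of the NeRF formulation: an MLP maps (position, view
# direction) -> (color, density), and simplified volume rendering
# accumulates samples along a ray into a pixel color.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # Input: 3D position (x, y, z) + view direction (dx, dy, dz)
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB color + volume density
        )

    def forward(self, positions, directions):
        out = self.mlp(torch.cat([positions, directions], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # color in [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, near=0.1, far=4.0, n_samples=64):
    """Integrate color along one camera ray (simplified volume rendering)."""
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction        # sample points along the ray
    dirs = direction.expand(n_samples, 3)
    rgb, sigma = model(points, dirs)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)         # opacity of each sample
    transmittance = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha[:-1] + 1e-10]), dim=0)
    weights = alpha * transmittance
    return (weights[:, None] * rgb).sum(dim=0)      # final pixel color
```

Training then boils down to rendering rays from known photos and nudging the network until its rendered pixels match the real ones.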
NeRF tech is now being merged with computer vision, allowing machines to understand and reason about both physical and virtual environments, and all the things within them. Check out this video.
And don’t get me started on the potential of generative AI for 3D…
Today, AI tools like DALL-E and Midjourney can create 2D images of vast, static worlds. Soon, and I mean very soon, these tools will be able to produce 3D objects from scratch, with nothing but a basic prompt (be it text or imagery). The creation of full-on virtual worlds won’t be far behind.
We’re already seeing generative AI used today as a co-pilot for 3D modeling, e.g. this combo of Stable Diffusion and Blender, or this text-to-3D tool from Spline. Absolute game changers.
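One common pattern behind these co-pilots: the model writes a script against Blender’s Python API (bpy), and Blender executes it. As a rough illustration (not the output of any specific tool), the snippet below is the kind of script a co-pilot might emit for a prompt like “a blue sphere on a ground plane”; it only runs inside Blender’s Scripting workspace, where bpy is available:

```python
# Illustrative only: a Blender Python (bpy) script of the sort an AI
# co-pilot could generate from a short text prompt.
import bpy

# Ground plane
bpy.ops.mesh.primitive_plane_add(size=10, location=(0, 0, 0))

# Sphere resting on the plane
bpy.ops.mesh.primitive_uv_sphere_add(radius=1, location=(0, 0, 1))
sphere = bpy.context.active_object

# Simple blue material applied to the sphere
mat = bpy.data.materials.new(name="BlueMat")
mat.diffuse_color = (0.05, 0.2, 0.9, 1.0)  # RGBA
sphere.data.materials.append(mat)
```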
In other words, the great democratization of 3D creation is well underway.
If we double down on 3D data/content, the incentive for AR/VR adoption will be much higher when it finally has its moment. 3D content is the initial chicken (or egg, depending on your worldview).
In parallel to 3D creation, we’re also making strides toward the much-needed standards required for the metaverse’s holy-grail capability: 3D interoperability. Brands and developers are rallying around standard 3D formats such as glTF and USD, and the Metaverse Standards Forum is cementing best practices in their wake.
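Part of why these formats travel so well: a glTF file is essentially a small, JSON-described scene graph that any engine or tool can parse. Below is a heavily simplified outline of that structure, shown as a Python dict; real files also carry accessors, buffer views, and binary buffers holding the actual vertex data:

```python
# Simplified outline of a glTF 2.0 asset. The scene graph (scenes -> nodes
# -> meshes -> primitives) is plain JSON, which is what makes the format
# easy to exchange between engines, DCC tools, and web viewers.
gltf_outline = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],          # a scene is just a list of root nodes
    "nodes": [
        {"name": "sneaker", "mesh": 0, "translation": [0, 0, 0]},
    ],
    "meshes": [
        {"primitives": [{
            "attributes": {"POSITION": 0, "NORMAL": 1, "TEXCOORD_0": 2},
            "indices": 3,                # indices into accessors (omitted here)
            "material": 0,
        }]}
    ],
    "materials": [
        {"pbrMetallicRoughness": {"baseColorFactor": [0.1, 0.1, 0.8, 1.0]}}
    ],
    # "accessors", "bufferViews", and "buffers" (omitted) describe the raw geometry bytes
}
```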
This ‘3D data layer’ is not as sexy as headsets, so it gets little mainstream coverage. But it’s arguably the most important layer to prioritize and get right. Everything else will get pulled through as a result.
In the meantime, real-time 3D on a screen is proving to be a good-enough window into the metaverse for many use cases. Let’s focus here for now and let ‘path dependency’ work its magic. Neal Stephenson, who coined the term metaverse, explains this best:
“Thanks to games, billions of people are now comfortable navigating 3D environments on flat 2D screens. The UIs that they’ve mastered [keyboard and mouse for navigation and camera] are not what most science fiction writers would have predicted. But that’s how path dependency in tech works. We fluently navigate and interact with extremely rich 3D environments using keyboards that were designed for mechanical typewriters. It’s steampunk made real. A Metaverse that left behind those users and the devs who build those experiences would be getting off on the wrong foot… My expectation is that a lot of Metaverse content will be built for screens (where the market is) while keeping options open for the future growth of affordable headsets”
The AR Cloud: With 3D content well on its way, the next item on the roadmap is the AR Cloud, particularly its key building blocks: LiDAR scanning and VPS (visual positioning systems), both accessible via the sensors on your smartphone today.
Think of the AR Cloud as a ‘digital twin’ of the world: machine-readable, 1:1-scale models (3D maps built from 3D scans) of places (parks, neighborhoods, cities) and things (buildings, stadiums, factories, classrooms). These 3D maps/scans will act as anchor points for interactive digital layers on top of physical environments, accessible by AR devices based upon context (where you are, what you are doing, who you are with).
When paired with VPS and AI, this effectively becomes a ‘search engine’ for the physical world, connecting atoms to bits and allowing users to access digital information and services/apps associated with any place, person, or thing. For developers, this will look/feel like an ‘API to the world’, making anything and everything ‘discoverable’, ‘shoppable’, and ‘knowledgeable’.
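As a rough mental model (vendor-neutral, and with entirely hypothetical names), an AR Cloud query boils down to two steps: the device localizes itself against a shared 3D map (e.g., via VPS), then fetches the digital content anchored near its current pose. A toy sketch:

```python
# Toy, vendor-neutral sketch of the AR Cloud idea: digital content is pinned
# to positions in a shared 3D map, and a localized device queries for
# anchors near its pose. All names here are hypothetical.
import math
from dataclasses import dataclass

@dataclass
class Anchor:
    anchor_id: str
    position: tuple[float, float, float]  # meters, in the shared map's frame
    payload: dict                          # the digital layer: a model, label, or link

class ARCloudIndex:
    def __init__(self):
        self._anchors: list[Anchor] = []

    def add(self, anchor: Anchor) -> None:
        self._anchors.append(anchor)

    def query_nearby(self, device_position, radius_m=25.0) -> list[Anchor]:
        """Return anchors within radius_m of the localized device position."""
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        return [a for a in self._anchors if dist(a.position, device_position) <= radius_m]

# Usage: a store pins its menu to the storefront; a passing device discovers it.
index = ARCloudIndex()
index.add(Anchor("storefront-menu", (12.0, 3.5, 0.0),
                 {"type": "menu", "url": "https://example.com/menu"}))
print([a.anchor_id for a in index.query_nearby((10.0, 4.0, 0.0))])
```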
The AR Cloud concept flies under the radar. But it’s becoming the next Big Tech battleground, with implications for the future of navigation, shopping, entertainment, news, robotics, health/safety, travel, and education, just to name a few. Many believe it will be the most important developer platform since the internet itself… especially when paired with AI and LLMs (large language models).
While it sounds sci-fi, this tech is here and now.
Apple offers spatial anchors within Apple Maps. Google has similar AR features within Google Maps. Microsoft provides Azure Spatial Anchors, which, while historically limited to the HoloLens AR headset, is becoming more device-agnostic. Snap and Niantic have also made major leaps toward this end-game, with Snap’s Lens Cloud and Niantic’s Lightship SDK, marketed as the developer platform for the real-world metaverse. Lesser-known companies are also on the rise, like Matterport and Prevu3D, both making it very easy to scan the built world with commodity hardware.
Conclusion: A Narrative Shift
I hope this framework leaves you feeling empowered to better defend/justify the metaverse.
Because the narrative shift we need isn’t going to come from Big Tech and mainstream media. It’s going to come from people like you, evangelizing in a more bottom-up, grassroots fashion.
So please, go forth and reshape minds. Help us rid the industry of imprecise definitions, misguided hype, and sensational headlines; remind people we’re already living in a proto-metaverse, with much to be desired; explain how AR can create more connection and understanding; show how ChatGPT makes computing accessible to all; point to the meaningful progress being made upstream of headsets; and most importantly, encourage people to take a more optimistic stance.
Because here’s the reality: the technology genie is out of the bottle, and there’s no turning back. In light of this fact, we have two choices.
We can over-index on tech’s past failures, operate out of fear, and complain about the world we’re currently in.
Or… we can learn from tech’s mistakes, anticipate negative externalities, and work to shape the world we want to see.
The choice is yours.
Evan Helda is Principal Specialist for Spatial Computing at AWS, where he does business development, product strategy, and go-to-market strategy for all things AR, VR, and real-time 3D. You can check out more of his insights at MediumEnergy.io, or follow him on Twitter @evanhelda. Opinions expressed here are his, and not necessarily those of his employer.
Header image credit: Mathew Schwartz on Unsplash