AR comes in many flavors. This is true of any early-stage technology as it twists around and takes shape. The most prevalent format so far is social lenses, as they enhance and adorn shareable media. Line-of-sight guidance in industrial settings is also proving valuable.
But a less-discussed AR modality is visual search. Led by Google Lens and Snap Scan, it lets users point their smartphone cameras (or future glasses) at real-world objects to identify them. It contextualizes them with informational overlays… or “captions for the real world.”
This flips the script for AR in that it identifies unknown items rather than displaying known ones. That broadens the potential use cases, transcending pre-ordained experiences that have relatively narrow utility. Visual search has the entire physical world as its canvas.
This is the topic of a recent report from our research arm, ARtillery Intelligence. Entitled Visual Search: AR’s Killer App?, it dives deep into the what, why, and who of visual search. And it’s the latest in our weekly excerpt series, with highlights below on visual search’s drivers & dynamics.
Space Race
One of Google’s driving motivations in visual search is geo-local in nature, following its years of work in local search. For example, local search and Google Maps represent a sizable portion of Google’s revenue. This is because local modifiers like “near me” signal high commercial intent.
Furthermore, visual search that’s based on physical places feeds into Google’s “Internet of Places” ambitions. That term, which overlaps with what we call the “metavearth,” describes Google’s mission to index the physical world, just as it created immense value indexing the web over the past 20+ years.
This plays out in a few ways, including Google Lens and Live View. The former is Google’s main visual search product, identifying and contextualizing objects that you point your phone at. These can include fashion and food, but the “local” version of this is identifying storefronts.
Google Lens’ cousin, Live View, similarly lets users navigate urban areas using 3D directional arrows. This can be more intuitive than looking down at a 2D map. To do this, Google uses Street View images to help devices recognize where they are before overlaying directional graphics.
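Google’s production positioning system isn’t public, but the core idea (recognize where a camera is by matching its frames against geo-tagged reference imagery) can be sketched with off-the-shelf tools. Below is a minimal, illustrative Python sketch using OpenCV feature matching; the reference photos and coordinates are hypothetical placeholders, not anything from Google’s stack.

```python
# Minimal sketch of visual place recognition: match a camera frame against
# geo-tagged reference photos and return the best candidate's coordinates.
# Reference paths and lat/lng values are hypothetical placeholders.
import cv2

# Hypothetical reference set: image path -> (latitude, longitude)
REFERENCE_IMAGES = {
    "storefront_a.jpg": (40.7411, -73.9897),
    "storefront_b.jpg": (40.7420, -73.9889),
}

def localize(query_path: str):
    orb = cv2.ORB_create(nfeatures=1000)                    # fast binary features
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    if query is None:
        raise FileNotFoundError(query_path)
    _, query_desc = orb.detectAndCompute(query, None)

    best_place, best_score = None, 0
    for path, latlng in REFERENCE_IMAGES.items():
        ref = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if ref is None:
            continue
        _, ref_desc = orb.detectAndCompute(ref, None)
        if query_desc is None or ref_desc is None:
            continue
        # Count sufficiently close feature matches as a crude similarity score.
        matches = matcher.match(query_desc, ref_desc)
        score = sum(1 for m in matches if m.distance < 40)
        if score > best_score:
            best_place, best_score = latlng, score

    return best_place  # estimated (lat, lng), or None if nothing matched

print(localize("camera_frame.jpg"))
```

At Google’s scale, this retrieval step would run against Street View imagery with far more robust learned features, and the recovered pose is what would anchor the 3D directional arrows.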
Moreover, Google recently brought Lens and Live View together into one UX. While you navigate an urban area with Live View, Google offers small tap targets on your touchscreen when it recognizes a business storefront. When tapped, expanded business information appears.
Motivated and Vested
Google can execute all the above using the business listings data that it’s been collecting for years – everything from hours of operation to menu items to Covid mask policies. Taking this to the next level, it wants to offer visual search capabilities once you step inside the business.
To that end, Google Lens uses computer vision and machine learning to process text. For example, it can scan restaurant menus to search for the ingredients in a given dish. From there, it can do things like surface nutritional info, user reviews, or ordering tips.
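Lens itself doesn’t expose a public API, but Google’s Cloud Vision API offers a comparable text-detection step. The sketch below (with a hypothetical menu.jpg and ingredient keyword) shows how text pulled from a menu photo could then be searched:

```python
# Hedged sketch: extract text from a photographed menu with the Google Cloud
# Vision API (a public stand-in for the OCR step Lens performs), then search
# it for an ingredient. "menu.jpg" and "peanut" are hypothetical placeholders.
from google.cloud import vision

def find_ingredient(image_path: str, ingredient: str) -> bool:
    client = vision.ImageAnnotatorClient()           # uses your GCP credentials
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.text_detection(image=image)    # OCR over the whole photo
    if response.error.message:
        raise RuntimeError(response.error.message)

    # The first annotation holds the full detected text block.
    annotations = response.text_annotations
    full_text = annotations[0].description if annotations else ""
    return ingredient.lower() in full_text.lower()

print(find_ingredient("menu.jpg", "peanut"))
```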
Looking forward, Google Lens continues to improve in both capability and positioning. For example, as its machine-learning brains develop, it’s being dispatched to more places. These include well-traveled real estate like the search bar in Google’s mobile app and the Chrome address bar.
As for capabilities, Google continues to add new features to drive high-intent visual shopping behavior. For example, its “multisearch” feature lets users combine search inputs. They can start with a visual search, then narrow results with text (e.g., “the same jacket in blue”).
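Google hasn’t detailed how multisearch works under the hood. As a rough illustration of fusing an image query with a text refinement, here’s a sketch using the open-source CLIP model; the product photos and catalog are hypothetical stand-ins:

```python
# Hedged sketch of multisearch-style retrieval: blend an image query with a
# text refinement in a shared embedding space (open-source CLIP here; Google's
# actual multisearch stack is not public). File names are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    with torch.no_grad():
        inputs = processor(images=Image.open(path), return_tensors="pt")
        emb = model.get_image_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

def embed_text(text: str) -> torch.Tensor:
    with torch.no_grad():
        inputs = processor(text=[text], return_tensors="pt", padding=True)
        emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

# Query: a photo of a jacket, refined by text. Adding the two normalized
# embeddings (then renormalizing) is one simple way to fuse the signals.
query = embed_image("jacket_photo.jpg") + embed_text("the same jacket in blue")
query = query / query.norm(dim=-1, keepdim=True)

# Hypothetical product catalog, pre-embedded from product photos.
catalog = {name: embed_image(name) for name in ["blue_jacket.jpg", "red_jacket.jpg"]}
scores = {name: float(query @ emb.T) for name, emb in catalog.items()}
print(max(scores, key=scores.get))  # best-matching catalog item
```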
Meanwhile, Google’s Geospatial API takes the capabilities of Live View and spins them out for third-party developers to build visual search experiences. If all the above momentum continues, we’ll see more investment from Google – motivated and vested to make visual search a thing.