Visual search is a technology we’ve always considered a sleeping giant. It contextualizes items you point your phone at, using a combination of computer vision and machine learning. That makes it a close cousin of AR, as it annotates the world with informational overlays.

Rather than the fun and games that have so far defined consumer AR, visual search is more of a utility. And utilities are historically what turn into killer apps (think: Uber). With visual search, utility is joined by broad appeal and high frequency… just like web search (another killer app).

Speaking of web search, Google is keen on visual search for all these reasons. It’s a natural extension of its core search business, and it can meet core objectives like growing search query volume. With several search inputs – text, voice, visual, etc. – Google can maximize engagement.

But despite all these driving forces, Google Lens has been slow to get off the ground. Users are set in their ways, so introducing another search UI takes time. Beyond habitual factors, visual search involves “activation energy,” such as tapping into an app and holding up your phone.


Play to its Strengths

Unexpectedly, Apple recently stripped away some of this visual search friction. The iPhone 16’s Camera Control button offers a function to launch visual searches for items you encounter in the real world. Known as Visual Intelligence, it’s one piece of the broader Apple Intelligence play.

So when coming across a new restaurant in your neighborhood, you can use the physical button to visually query the storefront and find an online menu. That contrasts with the multiple screen taps needed to activate Google Lens (though, in fairness, Google has made Lens fairly accessible).

The question is whether all of this can accelerate visual search’s cultural assimilation. That could happen as a natural byproduct of Apple simply launching it. The combination of these factors – Apple’s halo effect and reduced UX friction – could be the nudge that visual search needs.

Moreover, Apple’s Visual Intelligence doesn’t cut Google out of the loop – it hands off visual searches to Google in certain situations, such as shoppable items. This lets each company play to its strengths: Apple’s physical touchpoint with consumers and Google’s knowledge graph.


Ambient & Automated

What Apple has done with Visual Intelligence and the Camera Control button is just one step towards eliminating friction from visual search. The technology will truly shine when it’s in your line of sight, rather than in your pocket. In other words, when smart glasses penetrate further.

This is already well underway, considering the multimodal AI in Ray-Ban Meta smart glasses. It not only positions visual search as a defining function of the surprise-hit device, but adds dimension: it’s not just about visual search, but about refining those searches in real time with voice.

This brings the intuitive potential of visual search to another level. Not only is it ambient and automated in your line of sight – rather than on an upheld phone – but it’s more conversational (e.g., “What kind of jacket is that, where can I buy it, and does it come in blue?”).

Meta isn’t the only one working on this. Snap’s new Spectacles do something similar using ChatGPT and Google Gemini, and the capability worked well when we tried it. Altogether, the wheels are in motion for visual search’s evolution. But it will continue to be a long road to mainstream embrace.