As Google continues to upgrade its vast product line with AI infusions, the latest on the list is Google Lens. The visual search tool now boasts more intelligence, not only identifying items in a photo or live camera view but also letting users ask deeper questions about those items.
As background, Google Lens is the search giant’s visual search play. Rather than typing (or speaking) text, it lets you simply point your camera at objects to identify or contextualize them. You can also do this with photos on your device’s camera roll or those encountered on the web.
This isn’t a silver bullet, nor will it fully replace deep-rooted habits around traditional search, but it does offer a more intuitive UX in some contexts. Use cases include identifying items in nature (think: plant species), shopping (think: fashion items), and local discovery (think: storefronts).
Back to the most recent updates, users can go deeper into visual searches. This was possible before using multisearch, which let users refine visual results with text. The difference with the latest update is that generative AI joins the party to return text-based insights as well.
Search What You See
To distill all the above into an example, users can point their phones at fashion items they encounter to identify them with Google Lens. That was already possible, as was multisearch, which lets users filter those visual results with text (e.g., “the same jacket in green”).
What’s new in the latest update is that, in addition to refined/filtered visual results, users will also see text that further contextualizes a given item. Continuing the example, the text might tell you that the jacket is part of H&M’s fall line and point you to the nearest places to buy it.
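For the developer-minded, here’s a minimal sketch of the two-stage flow described above. It is purely illustrative and not Google’s actual API; MultisearchQuery, LensResult, and run_multisearch are hypothetical stand-ins for the idea of visual retrieval followed by generated text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultisearchQuery:
    """An image plus an optional text refinement, e.g. 'the same jacket in green'."""
    image_path: str
    refinement: str = ""


@dataclass
class LensResult:
    """Visual matches plus the newer generative text context."""
    visual_matches: List[str] = field(default_factory=list)
    generated_context: str = ""


def run_multisearch(query: MultisearchQuery) -> LensResult:
    """Mimic the two-stage flow: visual retrieval first, then generated text."""
    # Stage 1 (pre-existing multisearch): retrieve visually similar items,
    # optionally filtered by the text refinement.
    if query.refinement:
        matches = [f"similar item, filtered by '{query.refinement}'"]
    else:
        matches = ["similar item"]
    # Stage 2 (the new part): a generative model adds text that contextualizes
    # the item: what it is, which product line it belongs to, where to buy it.
    context = "Placeholder summary: product line, availability, nearby sellers."
    return LensResult(visual_matches=matches, generated_context=context)


if __name__ == "__main__":
    result = run_multisearch(MultisearchQuery("jacket.jpg", "the same jacket in green"))
    print(result.visual_matches)
    print(result.generated_context)
```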
Similarly, someone wandering through a new neighborhood can use Google Lens to identify storefronts. They can then ask qualifying questions about any business. Is the restaurant pet-friendly? Does it accommodate large groups for a birthday party? What’s its Yelp rating?
All this data flows from Google Business Profiles. Surfacing it in the Google Lens clickstream is conceptually similar to what Google has been doing for years with the knowledge panel in search engine results pages (SERPs). The difference with Lens is avoiding SERPs altogether.
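To make that lookup concrete, here’s a minimal sketch of how a qualifying question could map onto structured business-profile attributes. The attribute names, question mapping, and answer_question helper are hypothetical illustrations of the concept, not the actual Google Business Profiles schema or API.

```python
# Illustrative only: a made-up attribute store standing in for
# structured business-profile data (not the real GBP data model).
business_profile = {
    "name": "Corner Bistro",
    "attributes": {
        "pet_friendly": True,
        "accommodates_large_groups": True,
        "rating": 4.3,
    },
}

# Hypothetical mapping from natural-language questions to attribute keys.
QUESTION_TO_ATTRIBUTE = {
    "is it pet-friendly?": "pet_friendly",
    "does it take large groups?": "accommodates_large_groups",
    "what's its rating?": "rating",
}


def answer_question(profile: dict, question: str):
    """Look up a structured attribute that answers the user's question."""
    key = QUESTION_TO_ATTRIBUTE.get(question.lower())
    if key is None:
        return "No structured answer available."
    return profile["attributes"].get(key, "Unknown")


print(answer_question(business_profile, "Is it pet-friendly?"))  # True
print(answer_question(business_profile, "What's its rating?"))   # 4.3
```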
Meanwhile, alongside the above announcement about smarter visual searches, Google announced that users can search by circling an element in a given photo. The circled item then becomes the input of a visual search, atomizing the capability from whole photos down to items within those photos.
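Mechanically, this amounts to cropping the circled region and treating the crop, rather than the whole photo, as the query image. Here’s a rough sketch using Pillow, assuming a simple bounding-box approximation of the circled area; submit_visual_search is a hypothetical stand-in for the actual search call.

```python
from typing import Tuple

from PIL import Image


def crop_circled_region(photo_path: str, bounding_box: Tuple[int, int, int, int]) -> Image.Image:
    """Approximate the circled selection with its bounding box and crop it out.

    bounding_box is (left, upper, right, lower) in pixel coordinates.
    """
    photo = Image.open(photo_path)
    return photo.crop(bounding_box)


# Usage: the crop, not the full photo, becomes the visual search input.
region = crop_circled_region("street_scene.jpg", (120, 80, 360, 320))
region.save("query_region.jpg")
# submit_visual_search("query_region.jpg")  # hypothetical search call
```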
AI on top of AI
Even before this update, Google Lens was AI-oriented. On the back end, its core functionality – recognizing and contextualizing physical-world objects – is accomplished through Google’s knowledge graph. In this case, 20 years of Google Images represents one giant training set.
So the latest move essentially adds AI on top of AI. More accurately, it upgrades Google Lens’ existing AI with some of the newer flavors of the technology that have inflected over the past year. These include the large language models and vision models that fuel generative AI.
Buried in all of this is the ongoing lesson of the recent AI hype cycle: the technology isn’t new. Because it has inflected in interest, investment, and capability, the message sent in generalist tech media is that AI is some new scary thing. In reality, most AI is as mundane as spell check.
That said, recent AI inflections are real. The term ‘hype cycle’ above is perhaps unfair, because AI isn’t as hollow as the subjects of other recent hype cycles (we’re looking at you, metaverse). The tech is here today and broadly applicable. So expect more infusions and upgrades for Google and everyone else.
That includes spatial computing. Outside of Google Lens, AI is breathing new life into XR – though it’s often wrongly characterized as replacing it. That’s everything from automating 3D creation workflows to streamlining ways to augment your reality. This will be a moving target.