Lipdub is a new free app from New York City-based Captions, a two-year-old video startup serving social media content creators with text captioning bundled with other services, like editing and special effects. Founder Gaurav Misra is the former head of design engineering at Snap and a former software developer at Microsoft.
“Captions entered the translation space in the fall of 2022 with the release of translated captioning. We then launched our voice translation feature, AI Dubbing, in early 2023, and were quickly met with enthusiastic adoption from users around the world,” Misra said in an interview. Captions says it has more than 100,000 daily users and that upwards of five million creators have tried its products. These include its iOS video creation and editing app and its website, where creators can upload and compress video, use the AI Eye Contact feature to automatically correct footage in which the speaker wasn’t looking at the camera, and automatically add AI-generated subtitles and captions. Misra says that despite the enormous cost of training new AI models, Captions is already profitable. In June, the company raised a $25 million Series B round led by Kleiner Perkins, with participation from Sequoia Capital, Andreessen Horowitz (a16z), and SV Angel. To date, Captions has raised $40 million in funding.
“AI Dubbing’s success inspired us to push this technology to the next level and introduce synced lip movement to the mix. Since then we’ve been focused on making the technology widely available, leading us to create and launch Lipdub as its own, separate app,” Misra said. Among those using the app are the Disney-owned sports network ESPN and its commentator Omar Raja, “Mr. Wonderful” of Shark Tank fame, Twitch co-founder Justin Kan, and the influencer Unnecessary Inventions.
ElevenLabs, also based in New York City, recently released its own AI Dubbing feature with support for 20 languages. It is a similarly well-funded startup, having raised a $20 million Series A over the summer, and it shares a notable investor with Captions: Andreessen Horowitz. Victoria Weller, Chief of Staff to ElevenLabs’ CEO Mati Staniszewski, has nothing but praise for Captions. “We’re close with their team and know about their lip sync release. To some extent it is similar, but in many ways, it’s also very different from what ElevenLabs does. The translation of videos is one aspect of the broader mission of ElevenLabs, where we’re tackling to make any audio content accessible in any language, and that being possible in any voice.”
Speaking Korean with my own voice – and lip sync – via a new free app, LipDub, from @getcaptionsapp pic.twitter.com/FGqXhghhdK
— Charlie Fink (@CharlieFink) November 7, 2023
Meta Platforms has introduced SeamlessM4T, a versatile open-source model capable of understanding close to a hundred languages, whether spoken or written, and translating between them. Meanwhile, startups like MURF.AI, Play.ht, and WellSaid Labs are venturing into voice cloning, producing entirely artificial, AI-generated voices that can be synchronized with video content. However, these companies focus predominantly on audio and, unlike Captions, do not offer as extensive a range of video editing tools.
Wondercraft, a UK company that uses voice cloning to dub podcasts and longer-form video into foreign languages, is led by Dimitris Nikolaou, who told me in a conversation yesterday that the company uses ElevenLabs as part of its tech stack. It’s my voice speaking Spanish above, but note that the lips don’t sync. “We’re an audio-first company, but our perspective is that the video, sorry, the lip dub, the lip syncing, is a little bit gimmicky right now. What is more important is to get the translation right. At the end of all of the dubbing jobs that we provide, there’s always a human translator that edits the final version before it gets published.”
On one hand, I really love this tech. I’m already addicted. On the other hand, if this is what the good guys can do, what might bad actors do, not with the apps these companies offer, but with their own similar AI apps, purpose-built to spread disinformation? I know, this takes some of the fun out of it for me, too. As we marvel at all these amazing new applications, I am fearful of their price.
Charlie Fink is the author of the AR-enabled books “Metaverse,” (2017) and “Convergence” (2019). In the early 90s, Fink was EVP & COO of VR pioneer Virtual World Entertainment. He teaches at Chapman University in Orange, CA.