Our Road to Real-Time Audio AI – with $16M in Series A funding

Published on Oct 15, 2024

Real-time audio AI is transforming the way we work and build software. With instant insights from every call and meeting at their fingertips, customer support agents and sales reps will be able to reach new levels of efficiency and deliver a more delightful customer experience across borders.

We’re here to give you the sharpest plug-and-play tools to spearhead the real-time revolution.

Today, we're thrilled to announce the release of our new product, Gladia Real-Time. It achieves industry-leading latency of <300 ms across 100+ languages and is designed to deliver top accuracy for meeting recorders, note-taking assistants, and contact center platforms.

In addition to live voice transcription, our latest release offers a spectrum of new real-time add-ons, including sentiment analysis, named entity recognition (NER), and summarization. On average, it takes less than a second to generate both a transcript and insights from a call or meeting.

With low-latency transcription, multilingual support, and real-time analytics all in one tech-agnostic API, Gladia Real-Time has been tailored for businesses seeking a top-tier API provider to accelerate their AI roadmap and increase productivity worldwide.

The release coincides with some big news for Gladia—we just raised $16 million in Series A funding—which will enable us to take our product to the next level.

This calls for a quick retrospective, don’t you agree?

Our road to real-time AI

Gladia was founded in 2022 with a mission to enable voice-first platforms to deliver more value to their users across borders with cutting-edge AI.

When we entered the market, voice-first platforms were facing a conundrum: speech-to-text APIs could be fast, accurate, or affordable at scale, but rarely all three at once. Getting all three right was a challenge for the industry, and our first Speech-to-Text API aimed to address this.

Some months later, our first proprietary ASR system, Whisper-Zero, took on the challenge of hallucinations. It was fine-tuned on accents and noisy audio to perform well even in the most complex environments. The model has since been adopted by thousands of new customers and users, including VEED, Livestorm, Method, Recall and Circleback, who report industry-leading performance with 99.9% fewer hallucinations.

Despite these milestones, one key challenge in speech recognition remained unresolved: delivering a truly instantaneous, real-time audio transcription experience during a live call.

Solving real-time to help you build future-proof apps

Working closely with our design partners and customers, we’ve realized that despite indisputable advances in transcription accuracy over the last 12 months, the ASR market still faces the speed vs. accuracy trade-off – this time, with real-time streaming.

Asynchronous transcription, also known as ‘batch’, produces more accurate and polished results, but they take some time to generate. Conversely, real-time (RT) transcription has meant sacrificing quality for the sake of immediate results. This has forced companies to run both batch and real-time processing to reach optimal results, at considerable cost.

The difficulty of implementing RT transcription at scale prevents today's businesses (ranging from contact center platforms to AI sales assistants) from embracing a new, disruptive way of building software and business models.

Here at Gladia, we know that transcription can be a foundational component of your software. And we strongly believe that you shouldn’t have to compromise on either quality or speed.

Our new real-time engine is designed to help voice-first platforms transition seamlessly from manual post-call processing to proactive, low-latency workflows like automated CRM enrichment or real-time guidance for support agents. All without sacrificing quality or speed.

The Gladia Way: Real-Time AI, without trade-offs

Building an accurate, low-latency, and multilingual engine in-house is a complex and resource-intensive task. Real-time models require more computing power and may struggle to produce accurate output immediately due to limited context. To overcome these challenges, you need extensive in-house expertise in language understanding, real-time data handling, and continuous optimization and maintenance.

Gladia allows platforms to bypass these challenges. The new real-time engine boasts industry-leading latency of under 300 milliseconds without compromising accuracy, which matches, and can even exceed, the near-perfect batch results your users expect today. This way, you can enjoy state-of-the-art real-time AI without the hassle of building and maintaining the models in-house.

This release goes beyond transcription, tapping into the more granular insights and metadata that accurately transcribed calls and meetings have to offer. With Gladia Real-Time, you can extract insights from a call—like the caller's sentiment, key information, and conversation summary—with minimal additional latency.

Breaking the language barriers, whatever the use case

Since day one, Gladia has been committed to delivering cutting-edge ASR and Generative AI models to customers globally.

Given that most speech recognition models today are trained predominantly on English audio data, they’re inherently biased, and the user experience can vary significantly across languages.

Being a highly international team ourselves, we prioritized building the first truly multilingual real-time product to accelerate your international expansion. Our new engine delivers real-time transcription and insights in over 100 languages, with enhanced support for accents and dialects as well as for noisy telephony audio.

What’s more, Gladia Real-Time is unique in its ability to accurately transcribe conversations where speakers switch between languages and accents in real time. Don’t forget to activate the ‘code-switching’ feature to see the magic in action!

In addition to its advanced multilingual capabilities, the new API is compatible with all existing tech stacks and with telephony protocols and platforms, including SIP, VoIP, FreeSWITCH, and Asterisk. This makes it a universal turnkey tool for any voice-first platform and app, optimized to help you deliver the best possible experience to your customers.

Coming up next: AI Audio Infrastructure

Gladia Real-Time marks the beginning of a new chapter for our company, with freshly secured funding to supercharge our growth.

We’re thrilled to have officially completed our $16 million Series A funding round, led by XAnge, with participation from Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures, and Soma Capital.

The new capital will fuel our R&D efforts to bring to market a more robust and versatile product portfolio with new à la carte models and vertical solutions.

Our ultimate goal is to create an end-to-end audio AI infrastructure: the go-to destination for anyone looking to convert unstructured speech data into actionable insights for call agents, sales representatives, and content creators worldwide. Our commitment now extends to helping you leverage in-depth insights and metadata from every call and meeting, instantly. To make every conversation count.

Thank you for being with us on this journey. Stay tuned for more products and features coming shortly.

Try Gladia Real-Time now or book a meeting with our team for a personalized demo of the new engine.

And in case you missed it, we recently hosted a webinar led by our CEO to showcase Gladia Real-Time’s multilingual capabilities and add-ons live. You can watch the recording here.
