The evolution and impact of Speech AI: An in-depth conversation with Gladia's CEO Jean-Louis

Published on Sep 3, 2024

Once in a while, we like to step back from our day-to-day to reflect on the bigger trends affecting our customers and, ultimately, adapt our product accordingly. What are the key shifts happening in voice-first platforms today, and how can speech recognition help them navigate these shifts?

In a recent episode of the Caveminds podcast, Jean-Louis Quéguiner, Co-founder and CEO of Gladia, joined host Benson Sung to discuss the revolutionary world of Speech AI. Throughout the hour-long conversation, they delved into the applications of audio intelligence AI, the intricacies and challenges at the infrastructure level, and the broader implications of conversational AI for businesses.

Check out the TLDR version below.

The Business Case for Speech AI

One of the driving factors behind Gladia's focus on speech-to-text (STT) and audio intelligence was the sheer volume of valuable data embedded in audio communications like calls and meetings. Jean-Louis points out that a single hour of conversation can generate around 25,000 tokens – a substantial amount of textual information. This data is often underutilized in businesses, despite its potential to provide deep insights into customer interactions and operational efficiencies.
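To get a feel for the scale involved, here is a minimal back-of-the-envelope sketch. It takes the 25,000 tokens per hour figure quoted above as given; the function names, the 1,000-hour workload, and the 8,000-token context window are hypothetical values chosen only for illustration.

```python
# Back-of-the-envelope estimate of how much text audio archives contain.
# TOKENS_PER_HOUR is the figure quoted in the article; everything else
# (function names, example workloads) is an illustrative assumption.

TOKENS_PER_HOUR = 25_000


def estimate_tokens(audio_hours: float, tokens_per_hour: int = TOKENS_PER_HOUR) -> int:
    """Return the approximate number of tokens in `audio_hours` of conversation."""
    return round(audio_hours * tokens_per_hour)


def windows_needed(total_tokens: int, context_window: int) -> int:
    """Return how many model context windows are needed to cover the tokens."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    # Ceiling division without importing math.
    return -(-total_tokens // context_window)


if __name__ == "__main__":
    one_call = estimate_tokens(1)
    archive = estimate_tokens(1_000)  # hypothetical: 1,000 hours of recorded calls
    print(f"1 hour of audio: about {one_call:,} tokens")
    print(f"1,000 hours of audio: about {archive:,} tokens")
    # Hypothetical 8,000-token context window.
    print(f"Windows for 1 hour: {windows_needed(one_call, 8_000)}")
```

Even under these rough assumptions, a modest call archive adds up to tens of millions of tokens, which is why this data is hard to exploit without automated transcription and analysis.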

So, what are the specific use cases of this new technology?

Call Centers

Jean-Louis explained that one of the primary applications of speech AI is in call centers which – today – spend over $300 billion annually on operations and have notoriously high turnover rates. 

Traditional transcription services are often slow, costly, and inaccurate, hindering the ability of companies to analyze customer interactions effectively. Speech AI addresses these issues by providing rapid and precise transcription, enabling businesses to identify customer issues in real time and take swift corrective action. This can significantly improve customer satisfaction and operational efficiency.

Beyond standard transcription, Jean-Louis envisions a future where conversational AI can handle initial customer interactions and even automate them end-to-end to reduce costs and improve service quality.

Meeting Recorders

Meeting recorders represent another significant use case for audio intelligence AI. By integrating AI into meeting recording tools, companies can automate the transcription and summarization of meetings. The result? Streamlined follow-ups and action items, which allow employees to focus on creative problem-solving rather than mundane documentation tasks.

The efficiency gained through accurate AI-driven transcription can lead to higher productivity and better decision-making.

Healthcare Applications

As Jean-Louis and Benson discussed, the potential for speech AI in the healthcare sector is also immense. 

Doctors often spend considerable time on documentation, which can be streamlined using AI. Accurate transcription of patient interactions helps maintain comprehensive medical records, improve patient care, and ensure crucial information is not lost during referrals to specialists. This efficiency can also alleviate the administrative burden on healthcare professionals, allowing them to dedicate more time to patient care.

Challenges with Speech AI

While the benefits are clear, several challenges accompany the implementation of STT and audio intelligence AI. Jean-Louis highlighted three key issues: quality and multimodality, catastrophic forgetting, and diarization & speaker operation.

Quality and Multimodality

Jean-Louis emphasizes that for AI to be effective in real-world applications, high-quality transcription is crucial, especially when the transcript is fed into other AI systems for further processing.

Multimodality – the ability to integrate audio, text, and visual data – is essential for creating a comprehensive understanding of the information. Achieving this level of quality and integration remains a significant challenge. That’s why Gladia focuses on providing exceptional accuracy across 99+ languages, setting a high standard in the industry.

Catastrophic Forgetting

AI systems can suffer from what’s called “catastrophic forgetting”, where models (particularly LLMs) lose previously learned information when acquiring new knowledge. A related problem arises with long conversations or documents, where the model might overlook crucial details from the middle of the content. Addressing these problems requires continuous advancements in AI model training and memory retention techniques.

Gladia tackles this challenge by refining models to ensure consistency and reliability over extended periods. More on this in our dedicated article on LLM-powered summarization available with our API.
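One widely used mitigation for the long-input problem described above is to split a transcript into overlapping chunks and summarize each chunk separately before combining the results. The sketch below shows only the chunking step and is not a description of Gladia's approach; the function name `chunk_transcript` and the word-based sizes are assumptions made for illustration.

```python
# Minimal sketch: split a long transcript into overlapping word-based chunks
# so that no passage sits deep in the middle of a single very long input.
# `chunk_transcript` and its parameters are hypothetical, not a Gladia API.

def chunk_transcript(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split `text` into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the transcript.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    demo = "a b c d e f g h i j"
    for chunk in chunk_transcript(demo, chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk can then be summarized on its own and the partial summaries merged in a final pass. The trade-off is extra model calls in exchange for keeping every passage near the start or end of some input.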

Diarization & Speaker Operation

Diarization, the process of distinguishing between different speakers in an audio recording, is another complex challenge. Accurate diarization is essential for ensuring the right information is attributed to the correct speaker, which is particularly important in multi-participant meetings or call center recordings. Overlapping speech, accents, and code-switching make this especially challenging.
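To make the attribution step concrete, here is a minimal sketch that assumes two inputs are already available: transcribed words with timestamps, and diarization segments saying who spoke when. Each word is assigned to the speaker segment it overlaps most in time. The `Word` and `SpeakerSegment` types and the `attribute_words` function are hypothetical and do not reflect Gladia's API or internal method.

```python
from dataclasses import dataclass

# Hypothetical data structures for illustration only.

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class SpeakerSegment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the duration shared by two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words: list[Word], segments: list[SpeakerSegment]) -> list[tuple[str, str]]:
    """Assign each word to the speaker whose segment overlaps it most,
    then merge consecutive words from the same speaker into turns."""
    turns: list[tuple[str, str]] = []
    for word in words:
        speaker = "unknown"
        best = 0.0
        for seg in segments:
            shared = overlap(word.start, word.end, seg.start, seg.end)
            if shared > best:
                best, speaker = shared, seg.speaker
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + word.text)
        else:
            turns.append((speaker, word.text))
    return turns


if __name__ == "__main__":
    words = [Word("Hello", 0.0, 0.4), Word("there", 0.4, 0.8), Word("Hi", 1.0, 1.3)]
    segments = [SpeakerSegment("A", 0.0, 0.9), SpeakerSegment("B", 0.9, 2.0)]
    for speaker, text in attribute_words(words, segments):
        print(f"{speaker}: {text}")
```

This simple rule breaks down exactly where the text says diarization is hard: when speakers overlap, a word can fall inside several segments at once, and picking the largest overlap is only a heuristic.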

Gladia sets itself apart by focusing on advanced speaker diarization techniques and speaker operation functionalities, providing more accurate and reliable results.

The Future of Speech AI

Speech AI is poised to revolutionize the way we interact with and analyze spoken information, paving the way for a more efficient and productive future. Jean-Louis's vision for the future includes seamless integration of AI into everyday business operations, where AI can provide real-time insights and support, ultimately enhancing human creativity and decision-making rather than replacing them.

As Speech AI technology evolves, its applications will continue to expand across various sectors, driving innovation and helping companies transform their operations.

Want to learn more about Gladia’s API? You can try it for yourself for free, or book a demo to learn more. If you’re more interested in the latest news and trends, sign up for our newsletter.
