Blog

Product News

Gladia x pyannoteAI: Speaker diarization and the future of voice AI

Speaker recognition is advancing rapidly. Beyond merely capturing what is said, it reveals who is speaking and how they communicate, paving the way for more advanced communication platforms and assistant apps.

Speech-To-Text

Building better voice agents: Lessons from Thoughtly × Gladia's webinar

Voice AI has evolved fast — from early experiments that barely handled a “hello” to today’s real-time conversational agents running across industries. Alex Casella (CTO at Thoughtly) sat down with Gladia’s CEO Jean-Louis Quéguiner to unpack the technical and operational realities of building production-grade voice agents.

Speech-To-Text

Safety, hallucinations, and guardrails: How to build voice AI agents you can trust

As voice agents become a core part of customer and employee experience, users need to know these AI systems are accurate, safe, and acting within boundaries. That’s especially true for enterprise-grade tools, where a rogue voice agent can severely damage relationships and create major legal risks.

Case Studies

How Aircall cut transcription time by 95% with Gladia

The contact center is transforming. Traditionally defined by manual workflows, siloed data, and reactive customer service, today's Contact Center as a Service (CCaaS) platforms are embracing a new era—one driven by real-time AI and automation.

Speech-To-Text

How to measure latency in speech-to-text (TTFB, Partials, Finals, RTF): A deep dive

Latency can make or break a voice experience. Whether you’re building an agent that must stop speaking the moment a customer interrupts, or you’re captioning live content, you need a clear, reproducible way to measure how fast your STT really is, from first partial word to final transcript. 

Speech-To-Text

How to build multilingual AI voice agents for the global customer experience

Great customer support experiences rely on clear communication and deep understanding. Until recently, meeting that expectation at scale was nearly impossible: human agents can handle only so many languages, and few can switch between them fluently.