How to build a voice-to-text Discord bot with Gladia real-time transcription API

Published on Sep 21, 2023

Discord, the leading communication platform for gamers and communities, lets users talk through text channels, DMs, one-to-one calls, and group voice channels.

Based on multiple requests from our Discord members, we’ve built a custom JavaScript bot that uses Gladia’s live transcription API to transcribe speech in real time directly on a Discord server.

What can you do with a Discord bot?

First, you can transcribe voice in real time directly in Discord’s voice channels. For example, you might be streaming a game on Discord and want to keep the tips and lessons shared during the session. Or you might hold group gatherings on the platform and want to review the talking points afterwards, just as you would on any other virtual meeting platform.

Beyond that, a bot like this could be used for real-time moderation, flagging hate speech so that offending users can be banned. With additional tools like ChatGPT, you could also add command-based notes that produce meeting summaries and help you catch up on meetings you missed.

Screenshot of a transcription bot on Discord

How to implement the Discord.js v14 bot + Gladia real-time transcription

Step 1: Register your bot

Create a Discord bot that you'd like to use for transcription. If you’ve never built one before, here’s a useful resource to help.

First, install all the required packages by running:


npm install

Then, you will need to set up the index.js script with your Discord keys, guild ID (server ID), and voice channel ID.
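Before starting the bot, it can help to sanity-check the values you put into index.js. The sketch below is illustrative only: the config shape and the names `validateBotConfig`, `discordToken`, `guildId`, and `voiceChannelId` are assumptions, not the names used in the repository. It also assumes Discord IDs are numeric strings of roughly 17 to 20 digits, kept as strings because they exceed JavaScript's safe integer range.

```javascript
// Hypothetical config check; the real index.js may name these fields differently.
// Assumption: Discord IDs ("snowflakes") are numeric strings of about 17-20 digits.
const SNOWFLAKE = /^\d{17,20}$/;

function validateBotConfig(config) {
  const errors = [];
  // The bot token must be a non-empty string.
  if (typeof config.discordToken !== "string" || config.discordToken.trim() === "") {
    errors.push("discordToken is missing");
  }
  // IDs are kept as strings: they do not fit safely in a JavaScript number.
  for (const key of ["guildId", "voiceChannelId"]) {
    if (typeof config[key] !== "string" || !SNOWFLAKE.test(config[key])) {
      errors.push(`${key} must be a numeric ID string`);
    }
  }
  return errors;
}

// Placeholder values for illustration only.
const exampleConfig = {
  discordToken: "YOUR_DISCORD_BOT_TOKEN",
  guildId: "123456789012345678",
  voiceChannelId: "234567890123456789",
};
console.log(validateBotConfig(exampleConfig)); // prints []
```

An empty array means the values at least look plausible; it does not confirm that the token or IDs are valid on Discord's side.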

Step 2: Retrieve API key

Sign up for our speech-to-text API at app.gladia.io and obtain your API key. Documentation for Gladia live transcription can be found here.

Step 3: Code integration

Once everything is set up properly, simply run:


npm run start YOUR_GLADIA_TOKEN

Your bot should then join the channel corresponding to the channel ID you configured in the index.js file.
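The run command above passes the Gladia token as a command-line argument. As a minimal sketch of how a script could read it, assuming the start script is something like `node index.js` so that the token arrives as the first user argument, a small helper might look like this. `readGladiaToken` is a hypothetical name, not a function from the repository.

```javascript
// Hypothetical helper: reads the Gladia token from an argv-style array.
// Assumption: argv[0] is the node binary and argv[1] is the script path,
// so the first user-supplied argument is argv[2].
function readGladiaToken(argv) {
  const token = argv[2];
  // Reject a missing token or the unreplaced placeholder from the tutorial.
  if (!token || token === "YOUR_GLADIA_TOKEN") {
    throw new Error("Usage: npm run start <GLADIA_API_KEY>");
  }
  return token.trim();
}

// Demonstration with a fake argv; in a real script you would pass process.argv.
console.log(readGladiaToken(["node", "index.js", "abc123"])); // prints abc123
```

Failing fast on a missing token gives a clearer error than letting the first API call be rejected later.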

Step 4: Configure Discord permissions

Make sure your bot is invited to the server;
Give the bot the required voice permissions.
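The tutorial does not list the exact permissions, so the following is only a sketch under an assumption: that View Channel and Connect are enough for a bot that joins a voice channel and listens. It combines the corresponding Discord permission bits and builds an OAuth2 invite URL. `buildInviteUrl` and `VOICE_PERMISSIONS` are hypothetical names, and your bot may need more permissions than shown here.

```javascript
// Assumed minimal permission set for a listen-only voice bot.
// Bit positions follow Discord's permission flags: VIEW_CHANNEL = 1 << 10, CONNECT = 1 << 20.
const VOICE_PERMISSIONS = {
  VIEW_CHANNEL: 1n << 10n,
  CONNECT: 1n << 20n,
};

// Hypothetical helper: builds an invite URL for the given application (client) ID.
function buildInviteUrl(clientId) {
  // OR the individual bits together into a single permissions integer.
  const permissions = Object.values(VOICE_PERMISSIONS).reduce((acc, bit) => acc | bit, 0n);
  const params = new URLSearchParams({
    client_id: clientId,
    scope: "bot",
    permissions: permissions.toString(),
  });
  return `https://discord.com/oauth2/authorize?${params}`;
}

// Placeholder client ID for illustration only.
console.log(buildInviteUrl("123456789012345678"));
// prints https://discord.com/oauth2/authorize?client_id=123456789012345678&scope=bot&permissions=1049600
```

Opening the resulting URL as a server administrator lets you pick the server and confirm the requested permissions.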

Bear in mind that the current v1 implementation of the bot is not fully optimized, so you might experience inaccuracies when speakers switch languages or with individual words.

And you’re good to go!

🔗 Source GitHub repository is available here.

We hope you enjoyed this short tutorial. Given how much audio data still goes to waste, we’re always curious to explore the many ways in which transcription tech can be used to remedy that. Let us know if you went on to build a bot or used our API for other apps on Discord or beyond. We’d love to hear from you.

About Gladia

At Gladia, we built an optimized version of Whisper in the form of an API, adapted to real-life use cases and distinguished by exceptional accuracy, speed, extended multilingual capabilities and state-of-the-art features, including speaker diarization and word-level timestamps.
