Blog

Speech-To-Text

Top 5 Whisper GitHub projects: A practical guide for programmers

In September 2022, OpenAI unveiled Whisper, an innovative open-source automatic speech recognition (ASR) model trained on an impressive dataset of 680,000 hours of diverse speech. Since its release, the model has received widespread recognition for its remarkable robustness and accuracy. It rivaled human capabilities in English speech recognition and set a new standard for multilingual transcription and translation.

Tutorials

How to set up a Node.js transcription WebSocket with the Gladia live audio transcription API: A step-by-step guide

Have you ever used a transcription application to convert audio to text and wondered how it worked or how to build one? Are you a developer looking to add audio transcription to your next project? This article answers these questions and more.

Tutorials

Integrating Gladia audio transcription API with Make for workflow automation

Embark on a journey to optimize your workflow by seamlessly integrating Gladia through Eden AI with Make. This comprehensive guide will take you through the step-by-step process, empowering you to harness the full potential of automation in your tasks.

Speech-To-Text

A review of the best ASR engines and the models powering them in 2024

Automatic Speech Recognition (ASR), also known as speech-to-text or audio transcription, is a technology that converts spoken language stored in an audio or video file into written text.

Case Studies

Mastering AI transcription for social media captions: Mojo's success story with Gladia

From Reels to ads to YouTube Shorts, video content consumed in a vertical, bite-size format on social media is becoming one of the primary ways we interact with the world for both leisure and business.

Tutorials

Enhancing real-time transcription with WebSockets and Golang

Communication has evolved from sending letters by post and waiting at phone booths to digital connections that happen simultaneously and at high speed, especially in business environments. Given the unprecedented volumes of voice data that companies generate daily, the ability to document customer calls, conferences, and online meetings both in real time and asynchronously is becoming crucial.