Deepgram Release Notes
84 release notes curated from 131 sources by the Releasebot Team. Last updated: May 22, 2026
- May 21, 2026
- Date parsed from source: May 21, 2026
- First seen by Releasebot: May 22, 2026
May 21, 2026
Deepgram adds profanity filtering for all multilingual models and improves Korean transcript spacing, making multilingual moderation and Korean speech results cleaner and easier to read.
Profanity Filtering Now Supported for All Multilingual Models; Korean Spacing Improvements
🆕 Profanity Filtering for Multilingual Models
Deepgram’s Profanity Filtering feature is now available for all multilingual models: Nova-2 multilingual, Nova-3 multilingual, and Flux multilingual (language=multi). You can enable profanity filtering in your API requests by setting the profanity_filter=true parameter. When enabled, inappropriate language is automatically replaced with asterisks (****) in the transcript.
This extends profanity filtering beyond single-language models, making it easier to process and moderate content in multilingual scenarios.
Learn more about using Profanity Filtering and see the full list of supported languages on the Profanity Filtering documentation page.
🛠️ Fix: Improved Word Spacing in Korean Transcripts
We fixed an issue affecting Korean transcripts (ko, ko-KR) where word spacing was sometimes missing. Transcripts should now better reflect proper Korean spacing, improving readability for users working with Korean audio.
See the full list of supported languages on the Models & Languages Overview page.
- May 19, 2026
- Date parsed from source: May 19, 2026
- First seen by Releasebot: May 22, 2026
May 19, 2026
Deepgram adds Gemini 3.1 Flash Lite to the Voice Agent API as a managed Google LLM, replacing the preview version with a Standard tier model and deprecating gemini-3.1-flash-lite-preview.
Gemini 3.1 Flash Lite Now Available
gemini-3.1-flash-lite is now available as a managed Google LLM in the Voice Agent API. This Standard tier model replaces the preview version.
Set the model in your agent configuration:
{
  "agent": {
    "think": {
      "provider": {
        "type": "google",
        "model": "gemini-3.1-flash-lite",
        "temperature": 0.5
      }
    }
  }
}
Deprecations
gemini-3.1-flash-lite-preview is deprecated and will be removed on May 26, 2026. Migrate to gemini-3.1-flash-lite.
For more details on Gemini model deprecations, see Google’s Gemini deprecations page.
For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
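As a rough illustration of the migration, the sketch below swaps the deprecated model name for its replacement in an agent settings dictionary shaped like the configuration above. The helper name migrate_think_model is hypothetical and not part of any Deepgram SDK; only the model names and the settings shape come from this note.

```python
import copy
import json

# Model names as given in the release note.
DEPRECATED_MODEL = "gemini-3.1-flash-lite-preview"
REPLACEMENT_MODEL = "gemini-3.1-flash-lite"


def migrate_think_model(settings: dict) -> dict:
    """Return a copy of the settings with the deprecated model name replaced.

    Hypothetical helper for illustration; it only touches the
    agent.think.provider.model field of a Google provider.
    """
    migrated = copy.deepcopy(settings)
    provider = migrated.get("agent", {}).get("think", {}).get("provider", {})
    if provider.get("type") == "google" and provider.get("model") == DEPRECATED_MODEL:
        provider["model"] = REPLACEMENT_MODEL
    return migrated


old_settings = {
    "agent": {
        "think": {
            "provider": {
                "type": "google",
                "model": DEPRECATED_MODEL,
                "temperature": 0.5,
            }
        }
    }
}

new_settings = migrate_think_model(old_settings)
print(json.dumps(new_settings, indent=2))
```

The original dictionary is left untouched, so the old and new settings can be compared before anything is sent to the Voice Agent API.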
- May 15, 2026
- Date parsed from source: May 15, 2026
- First seen by Releasebot: May 16, 2026
May 15, 2026
Deepgram now supports Numerals for Russian, Romanian, and Hebrew monolingual models, helping turn spoken numbers into digits for cleaner, more accurate transcripts.
Numerals Support Now Available for 3 New Languages: Russian, Romanian, and Hebrew (Monolingual Models)
Supported languages and language codes:
- Russian (ru)
- Romanian (ro)
- Hebrew (he)
You can now use Deepgram’s Numerals feature with monolingual models for Russian, Romanian, and Hebrew. Numerals converts spoken numbers into digits (for example, “three hundred” → “300”) in your transcript, helping you create more accurate and easily processed results.
How to use Numerals:
To enable numerals, add the numerals=true parameter to your Deepgram API request.
Learn more about using Numerals and see the full list of supported languages on the Numerals documentation page.
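A minimal sketch of what such a request URL could look like for the three newly supported languages. It assumes the pre-recorded /v1/listen endpoint used elsewhere in these notes and leaves the model parameter out, since this note does not name one; numerals_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint as it appears in other entries of these release notes.
LISTEN_URL = "https://api.deepgram.com/v1/listen"

# Language codes announced in this note.
NEW_NUMERALS_LANGUAGES = ("ru", "ro", "he")


def numerals_url(language: str) -> str:
    """Build a request URL with Numerals enabled for one language code."""
    query = urlencode({"language": language, "numerals": "true"})
    return f"{LISTEN_URL}?{query}"


for code in NEW_NUMERALS_LANGUAGES:
    print(numerals_url(code))
```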
- May 14, 2026
- Date parsed from source: May 14, 2026
- First seen by Releasebot: May 16, 2026
May 14, 2026
Deepgram ships its May 2026 self-hosted release with batch diarization v2 for new deployments, bringing stronger speaker labeling and a new diarize_model option for batch requests while keeping existing integrations and response formats unchanged.
Deepgram Self-Hosted May 2026 Release (260514)
Container Images (release 260514)
quay.io/deepgram/self-hosted-api:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-api:1.187.0
quay.io/deepgram/self-hosted-engine:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-engine:3.117.0
Minimum required NVIDIA driver version: >=570.172.08
quay.io/deepgram/self-hosted-license-proxy:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-license-proxy:1.10.1
quay.io/deepgram/self-hosted-billing:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-billing:1.13.0
Batch Diarization v2 model delivery for new self-hosted deployments
Release 260514 ships Deepgram’s new batch diarization model (v2) to self-hosted. New deployments provisioned through your Deepgram representative will receive only the v2 batch diarizer model on disk by default. To produce diarized output on a fresh deployment, batch requests must specify diarize_model=v2 or diarize_model=latest. diarize=true on its own is pinned to v1; on a 260514 deployment that does not have the v1 model on disk, /v1/listen?diarize=true returns a successful response with no speaker labels — consistent with Deepgram’s longstanding behavior when a requested diarizer model is not present.
Existing deployments retain their v1 batch diarizer and continue to work without changes. To add v2 to an existing deployment, contact your Deepgram representative.
This Release Contains The Following Changes
- Batch Diarization v2 — A new batch diarization model with significantly improved speaker labeling, preferred 3.3× over v1 in side-by-side human evaluation. Strongest gains on contact-center audio (~80% reduction in median Confusion Error Rate vs. v1, ~60% at p95). Compatible with Nova-1, Nova-2, Nova-3, plus enhanced and base batch models; monolingual and multilingual. Not compatible with Whisper. The API response format is unchanged from v1. Batch-only; streaming diarization is unchanged. See Speaker Diarization for details.
- New diarize_model Parameter — Opt into v2 by passing diarize_model=v2 (pin to v2) or diarize_model=latest (recommended; auto-upgrades to future diarizer iterations) on pre-recorded /v1/listen requests. Unrecognized values return 400 Bad Request. Streaming requests reject diarize_model and return 400; use diarize=true for streaming diarization. diarize=true on batch continues to route to v1 to preserve behavior for existing integrations.
- General Improvements — Keeps our software up-to-date.
- May 14, 2026
- Date parsed from source: May 14, 2026
- First seen by Releasebot: May 16, 2026
May 14, 2026
Deepgram adds profanity filtering in 50+ languages to automatically redact offensive language in transcripts.
Profanity Filtering Now Available in 50+ Languages
We’re excited to announce the release of profanity filtering support for over 50 monolingual languages. Deepgram’s profanity filter automatically detects and redacts offensive language in transcripts, helping you produce cleaner and safer content across a wide range of languages.
How to Use Profanity Filtering
To enable profanity filtering, add the profanity_filter=true parameter to your Deepgram API request.
For more details, supported languages, and additional options, visit the Profanity Filter page.
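One way such a request might be assembled is sketched below. It builds, but does not send, a POST request with profanity_filter=true, reusing the endpoint, Authorization format, and audio/wav content type from the curl example elsewhere in these notes. The function name build_filtered_request and the placeholder key are illustrative assumptions.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint and header format follow the curl example in these notes.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_filtered_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (without sending) a pre-recorded request with profanity filtering on."""
    query = urlencode({"profanity_filter": "true"})
    return urllib.request.Request(
        url=f"{LISTEN_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


# Empty bytes and a placeholder key stand in for real audio and credentials.
request = build_filtered_request(b"", "YOUR_DEEPGRAM_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without credentials.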
- May 14, 2026
- Date parsed from source: May 14, 2026
- First seen by Releasebot: May 15, 2026
Deepgram’s Nova-3 Expands Speech-to-Text Support Across Asia-Pacific
Deepgram adds broader Nova-3 speech-to-text support across Asia-Pacific, with Thai, Cantonese, Mandarin Simplified and Traditional, and Gujarati now supported, plus improved accuracy for Bengali, Marathi, Tamil, and Telugu in streaming and batch transcription.
Nova-3 Expands Speech-to-Text Support for Thai, Cantonese, Mandarin, and Indic Languages
Extending Nova-3 across Asia-Pacific with new support for Thai and Chinese language variants, improved speech recognition accuracy for Bengali, Marathi, Tamil, and Telugu, and new support for Gujarati.
Expanding Nova-3 Across Asia-Pacific
Deepgram is continuing to expand Nova-3 language coverage across Asia-Pacific, bringing production-ready speech-to-text transcription to more languages, dialects, and regional speech patterns. Nova-3 now supports Thai, Cantonese Traditional, Mandarin Simplified, and Mandarin Traditional, while also delivering improved speech recognition accuracy for Bengali, Marathi, Tamil, and Telugu. We’ve also added Gujarati as a newly supported language on Nova-3.
These additions expand Nova-3 in regions shaped by tonal speech, multiple writing systems, regional pronunciation variation, and complex linguistic structures that have historically challenged traditional speech-to-text systems. Nova-3 continues to improve transcription quality in both batch and streaming use cases, while preserving the low latency and production-ready performance required for voice AI applications.
New Thai and Chinese Language Support on Nova-3
Thai Speech-to-Text (th, th-TH)
Thai is spoken in Thailand and widely used in customer support, commerce, media, and conversational applications throughout Southeast Asia. Because Thai is a tonal language, meaning changes with pitch contour and pronunciation, which creates challenges for generalized speech recognition systems.
Chinese Language Variants
Cantonese Traditional Speech-to-Text (zh-HK)
Cantonese is widely spoken throughout Hong Kong and by global Cantonese-speaking diaspora communities. Its extensive tonal variation, fast conversational pacing, colloquial expressions, and region-specific pronunciation patterns can be difficult for ASR systems to model accurately. Cantonese speech also frequently blends conversational shorthand and multilingual phrasing, particularly in customer support and real-time communication workflows.
Mandarin Simplified Speech-to-Text (zh, zh-CN, zh-Hans)
Mandarin Simplified is used in mainland China in customer support, commerce, media, and conversational AI applications. Supporting Mandarin Simplified requires speech recognition systems to handle tonal pronunciation, regional accent variation, and fast conversational speech across large-scale real-time and transcription workflows.
Mandarin Traditional (zh-TW, zh-Hant)
Mandarin Traditional is spoken throughout Taiwan, Hong Kong, and many overseas Chinese communities. Unlike Simplified Chinese, Traditional Chinese preserves the original, more complex character forms and is commonly used in regional media, education, government, finance, and enterprise communication. Mandarin Traditional has distinct written forms and regional usage patterns that can introduce complexity for speech recognition systems.
Benchmarking: Relative Word Error Rate (WER) Reduction vs Nova-2
Thai, Cantonese Traditional, Mandarin Simplified, and Mandarin Traditional are now available on Nova-3, bringing improved transcription quality across both streaming and batch workflows compared to Nova-2. The following benchmark results show relative Word Error Rate (WER) reductions compared to Nova-2 across newly supported Thai and Chinese language variants.
Key highlights
- Thai streaming transcription achieves a 69.43% relative WER reduction compared to Nova-2
- Mandarin Simplified batch transcription achieves a 65.21% relative WER reduction compared to Nova-2
- Cantonese Traditional achieves a 24.82% relative WER reduction, while Mandarin Traditional achieves a 44.87% relative WER reduction across batch workflows
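For readers unfamiliar with the metric, a relative WER reduction is conventionally the drop in error rate expressed as a share of the baseline error rate. The sketch below uses that standard definition; the input values are hypothetical and are not Deepgram's underlying benchmark numbers, which this post does not publish.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer versus baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical error rates chosen only to illustrate the arithmetic.
baseline = 0.20  # 20% WER on the older model
improved = 0.15  # 15% WER on the newer model
print(round(relative_wer_reduction(baseline, improved), 2))
```

Note that a relative reduction says nothing about the absolute error rate: the same percentage can come from very different starting points.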
Improved Speech Recognition Accuracy Across Indic Languages
In addition to new Thai and Chinese language support, Nova-3 has improved speech recognition accuracy across several Indic languages that were released earlier this year.
- Bengali (bn)
- Marathi (mr)
- Tamil (ta)
- Telugu (te)
These updates improve transcription quality in both streaming and batch workflows, helping developers build more reliable voice applications across South Asia.
Indic languages span multiple language families, scripts, and phonetic structures, often with significant regional variation and conversational speech patterns. Improving recognition quality across these languages supports customer support, conversational AI, transcription, and analytics workflows operating across diverse regional speech environments.
We’ve also added new support for Gujarati (gu, gu-IN) on Nova-3, further expanding Indic language coverage across India and global Gujarati-speaking diaspora communities.
Built for Developers and Enterprises
All languages included in this release are available through the same API developers already use today. You can use Nova-3 for both streaming and batch transcription workflows without retraining or custom configuration.
Switching to any supported language is as simple as updating the language parameter in your request:
curl --request POST \
  --header "Authorization: Token YOUR_DEEPGRAM_API_KEY" \
  --header "Content-Type: audio/wav" \
  --data-binary @youraudio.wav \
  "https://api.deepgram.com/v1/listen?model=nova-3&language=zh-HK"
Supported language codes:
- Thai: th, th-TH
- Cantonese Traditional: zh-HK
- Mandarin Simplified: zh, zh-CN, zh-Hans
- Mandarin Traditional: zh-TW, zh-Hant
- Bengali: bn
- Marathi: mr
- Tamil: ta
- Telugu: te
- Gujarati: gu, gu-IN
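The code list above can be kept as a small lookup table so that a request URL is only built for a code named in this release. This is a sketch: the table covers only the languages in this post, not the full Nova-3 language list, and nova3_url is a hypothetical helper.

```python
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"

# Language codes listed in this release only; Nova-3 supports others as well.
NOVA3_RELEASE_CODES = {
    "Thai": ["th", "th-TH"],
    "Cantonese Traditional": ["zh-HK"],
    "Mandarin Simplified": ["zh", "zh-CN", "zh-Hans"],
    "Mandarin Traditional": ["zh-TW", "zh-Hant"],
    "Bengali": ["bn"],
    "Marathi": ["mr"],
    "Tamil": ["ta"],
    "Telugu": ["te"],
    "Gujarati": ["gu", "gu-IN"],
}


def nova3_url(language_code: str) -> str:
    """Build a Nova-3 request URL for a code from the table above."""
    known = {code for codes in NOVA3_RELEASE_CODES.values() for code in codes}
    if language_code not in known:
        raise ValueError(f"{language_code!r} is not listed in this release")
    query = urlencode({"model": "nova-3", "language": language_code})
    return f"{LISTEN_URL}?{query}"


print(nova3_url("zh-HK"))
```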
Build Globally with Deepgram and Unlock Enterprise-Grade Voice AI Today
Sign up free and unlock $200 in credits, enough to power over 750 hours of transcription or 200 hours of speech-to-text across Nova-3’s growing language suite. Explore details on our Models & Languages Overview page and experience Nova-3’s world-class adaptability for yourself.
- May 13, 2026
- Date parsed from source: May 13, 2026
- First seen by Releasebot: May 15, 2026
Your restaurant needs to speak Spanish, y ahora puede.
Deepgram launches Flux Multilingual for Restaurants, bringing real-time code-switching Voice AI to fast, accurate multilingual ordering across 10 languages with the same API and latency profile as its English restaurant model.
Most customers are multilingual – your restaurant should be, too
Deepgram Flux Multilingual helps restaurants deliver faster, more accurate multilingual ordering with real-time code-switching Voice AI across 10 languages. Check it out!
A growing share of your customers are placing orders in their native language, or trying to. Take drive-thru as an example: Hispanic Americans are more likely than non-Hispanic Americans to dine at fast food restaurants regularly [1].
In a diversifying world, brands that force customers out of their comfort zone to order in a language that isn't theirs are going to fall behind. The costs add up: a hit to customer satisfaction and brand trust, incorrect orders, and slower service. At the very least, they’re walking away with a less-than-optimal taste in their mouth, and you’ve missed out on an opportunity to delight.
If you’re ahead of the curve, you’ve already looked at Voice AI as a solution to this problem, but multilingual Voice AI is a challenge that few platforms have conquered.
Multilingual Voice AI has been begging for reinvention
Most "multilingual" systems are stitched together. Language detection runs first, hands off to a monolingual model for the detected language, and a translation layer mediates between them. The result is high latency, brittle handoffs, and accuracy that falls apart the moment a customer says "and para mi hijo, una hamburguesa con queso y fries." Multilingual customers don't speak in clean monolingual blocks. You need a Voice AI that can accommodate switches mid-sentence, mid-word, mid-thought.
Today we're announcing Flux Multilingual for Restaurants, the version of our restaurant Voice AI that handles natural code-switching between languages without a perceptible latency hit. English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch: ten languages, one model, one API, monolingual-grade accuracy, built for real-time streaming voice agents.
Restaurant audio is hard, and Flux Multilingual was built to solve it
Restaurant tech is one of the harder Voice AI environments to build in. You have wind, engines, kitchen noise, multiple drive-thru systems next to each other, the nearby highway, music and background conversation in the car, accents that vary by region, customers who don't want to repeat themselves, and a POS that wasn't designed to talk to an AI in any language. Adding multilingual support without a clear strategy makes everything slower.
We built Flux Multilingual on the same research stack that runs our English restaurant Voice AI product, which is in production at brands you know and order from. The infrastructure, reliability target, and price-per-conversation are the same as the English-only product. Adding multilingual coverage doesn't double your cost or latency.
If you're a CTO or engineering lead deciding whether multilingual Voice AI is a real moat or a roadmap checkbox: it's the moat. Customers who can order in their first language come back, and the ones who can't, churn quietly to the chain on the next exit.
The voice model is the easiest part
Already using Flux English? Change your model to flux-general-multi. Same API, same integration – just change one parameter, and gain access to English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.
If you're not, trust us – onboarding the multilingual model is the easy part. The hard part is something only Deepgram for Restaurants can do: injecting voice AI into the veins of your operation, making it work with POS systems, menu data, and tricky deployments. And we’re up for the challenge.
Reach out – hablemos – let's talk.
Start building today:
- See the live demo →
- Try it in the Playground →
- Get started with the API →
- Build end-to-end with the Voice Agent API →
- Sign up for an API key →
- Explore the documentation →
- Join our Discord community →
[1] "Key QSR Insights Among U.S. Hispanic Adults." CivicScience, civicscience.com/key-qsr-insights-among-u-s-hispanic-adults/
- May 13, 2026
- Date parsed from source: May 13, 2026
- First seen by Releasebot: May 14, 2026
May 13, 2026
Deepgram releases improved batch speaker diarization with a new diarize_model parameter, rolling out v2 as an opt-in upgrade that boosts accuracy, especially on contact-center audio. It keeps existing diarize=true users on v1 and adds latest auto-upgrades with no pricing change.
Diarization v2: Improved Batch Speaker Diarization
A new batch diarization model is available today via the diarize_model API parameter.
Deepgram is rolling out v2 of our batch speaker diarization model. v2 is a new architecture available today on an opt-in basis through the new diarize_model parameter. In side-by-side human evaluation, v2 was preferred 3.3× over our current production diarizer (v1), with the largest gains on contact-center audio, where median Confusion Error Rate (CER) fell roughly 80% compared to v1. Customers using diarize=true are unaffected.
Key Features
- New diarize_model parameter — A single parameter that both enables diarization and selects the version. Most customers should choose latest; v2 or v1 are also accepted.
- diarize_model=latest auto-upgrades — Resolves to the newest GA diarizer. Today that’s v2.
- No breaking changes — diarize=true continues to route to v1.
- Compatible with the rest of the platform — Works with Nova-1, Nova-2, Nova-3, enhanced, and base batch models (async and sync), monolingual and multilingual, alongside existing batch features.
New diarize_model parameter
The new diarize_model parameter enables diarization and selects the model version in a single parameter — no need to also set diarize=true:
https://api.deepgram.com/v1/listen?model=nova-3&diarize_model=latest
Migration guidance
- New integrations: For new projects we recommend diarize_model=latest. To pin a specific version, use diarize_model=v2 or diarize_model=v1.
- Existing diarize=true users: No breaking changes — your existing requests continue to work with v1. To pick up v2’s improvements, update your requests to diarize_model=latest (always newest) or diarize_model=v2. We recommend testing on a representative sample of your audio before flipping production traffic.
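For existing integrations, the change amounts to swapping one query parameter. The sketch below rewrites a request URL accordingly; upgrade_diarization is a hypothetical helper, and it assumes the URL carries diarize=true as a plain query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def upgrade_diarization(url: str, target: str = "latest") -> str:
    """Replace diarize=true with diarize_model=<target> in a request URL."""
    parts = urlsplit(url)
    upgraded = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "diarize" and value.lower() == "true":
            # diarize_model both enables diarization and selects the version.
            upgraded.append(("diarize_model", target))
        else:
            upgraded.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(upgraded)))


old_url = "https://api.deepgram.com/v1/listen?model=nova-3&diarize=true"
print(upgrade_diarization(old_url))
print(upgrade_diarization(old_url, target="v2"))
```

Passing target="v2" pins the version, while the default follows the recommendation above to use latest.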
No pricing changes. Diarization continues to be included at current rates.
Availability
- Available now on the /v1/listen endpoint, on both US-hosted and EU-hosted endpoints
- Supported on Nova-1, Nova-2, Nova-3, enhanced, and base batch models (async and sync), monolingual and multilingual
- Streaming: diarize_model is not accepted on streaming requests and returns 400. Use diarize=true for streaming diarization. Streaming improvements ship separately.
- Self-hosted support coming soon
Learn more in the Speaker Diarization documentation.
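The rejection rules described in these notes can be mirrored in a local pre-check before a request is sent. This is only a sketch of the documented behavior (400 for diarize_model on streaming requests and for unrecognized values), not Deepgram's server logic; would_be_rejected is a hypothetical name.

```python
from typing import Optional

# Values these notes say diarize_model accepts on batch requests.
DIARIZE_MODEL_VALUES = {"latest", "v2", "v1"}


def would_be_rejected(streaming: bool, diarize_model: Optional[str]) -> bool:
    """Return True if the notes say this diarize_model usage returns 400."""
    if diarize_model is None:
        # Parameter not set, so none of the diarize_model rules apply.
        return False
    if streaming:
        # Streaming requests do not accept diarize_model at all.
        return True
    return diarize_model not in DIARIZE_MODEL_VALUES


print(would_be_rejected(streaming=False, diarize_model="latest"))
print(would_be_rejected(streaming=True, diarize_model="latest"))
```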
- May 13, 2026
- Date parsed from source: May 13, 2026
- First seen by Releasebot: May 14, 2026
Put a Deepgram Voice Agent on Any Web App in Minutes
Deepgram introduces the Browser Agent SDK, a four-layer npm toolkit that brings voice agents to any web app with a fast path from a simple widget to full framework-agnostic control. It adds shared reconnection, audio handling, KeepAlive pings, and token-safe browser deployment.
The Browser Agent SDK is four composable npm packages that drop a Deepgram voice agent into any web app, with a clean path from one-line widget to full framework-agnostic control.
The web is now a primary surface for voice agents. Support portals, scheduling tools, internal copilots, marketing sites, in-product help. And yet, getting a voice agent onto a web page has been the slowest part of building one.
You spin up an agent on the Voice Agent API in an afternoon. Then you spend a sprint on the browser side: mic capture, audio worklets, playback queues, reconnection, KeepAlive, token rotation without leaking your API key, and a UI that doesn't look like a 2014 chat widget. By the time the agent speaks on the page, you've shipped a small audio pipeline you didn't really want to own.
Today we're shipping the Browser Agent SDK: four composable npm packages that drop a Deepgram voice agent into any web app, with a clean path from one-line widget to full framework-agnostic control.
Four Layers, One Install Away
Each layer builds on the one below it. Install the highest layer you need and the rest comes with it.
- @deepgram/agents-widget: drop-in widget with six layouts (sidebar, floating, inline, button, embedded, or orb). No framework required.
- @deepgram/ui: pre-built React components (conversation view, animated orb, mic and speaker controls, waveform visualizer). Themed through CSS custom properties so your design system stays in charge.
- @deepgram/react: AgentProvider and hooks for state, conversation history, microphone control, audio playback, and client-side function calling scoped to component lifecycle.
- @deepgram/agents: the framework-agnostic core. WebSocket client, microphone capture, and player. Use it with Vue, Svelte, Angular, or vanilla JavaScript.
All four layers share the same reconnection logic, playback-aware mode tracking, audio buffering, optional Silero VAD, KeepAlive pings, and typed event emitter. The hard parts of browser voice are handled once and inherited everywhere.
Ship in Minutes
npm install @deepgram/agents-widget

import { init } from "@deepgram/agents-widget";

init({
  tokenFactory: () => fetch("/api/deepgram-token").then(r => r.text()),
  agent: "YOUR_AGENT_ID",
  layout: "floating",
});

That's the whole client. Your server endpoint mints a short-lived token using the Deepgram auth grant, the SDK calls it on every connect and reconnect, and your API key never touches the browser. The token rides the Sec-WebSocket-Protocol header because that's the only custom header browsers permit on WebSocket handshakes.
Reference a pre-configured Agent ID from Deepgram Console, define listen/think/speak inline in code, or combine both. Console changes propagate to the agent without a redeploy.
What You Can Build
- Drop a voice helper into a marketing or docs site. Use the widget in floating or button layout. Five-minute setup, no React required.
- Embed a voice copilot in a React product. Use @deepgram/react for state and the @deepgram/ui components for the UI. Theme it with your existing CSS variables. Scope function calls to the component lifecycle so they're cleaned up when the user navigates away.
- Build a custom voice experience in Vue, Svelte, or vanilla JS. Use @deepgram/agents directly. You get AgentSession, AgentMicrophone, and AgentPlayer and bring your own UI. Same reconnection and buffering as every other layer.
Production-Grade Audio by Default
Most of the bugs in browser voice come from the same places: a reconnection storm that DDoSes your token endpoint, audio frames dropped before the server is ready, the agent "thinking it finished talking" while the user is still hearing tail audio, and idle WebSockets that get killed by a proxy after 60 seconds.
The Browser Agent SDK handles these out of the box. Reconnection uses exponential backoff with jitter and configurable ceilings. Microphone frames captured before the server's SettingsApplied are queued and flushed. The SDK switches from speaking to listening only after the audio queue actually drains in the browser, so the agent does not interrupt its own tail audio. KeepAlive heartbeats prevent idle disconnects.
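To make the reconnection behavior concrete, here is a generic sketch of exponential backoff with jitter and a ceiling. It uses the common "full jitter" variant and made-up defaults; the SDK's actual algorithm, parameter names, and default values are not stated in this post, so everything below is an assumption for illustration.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    ceiling: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized delay (in seconds) per reconnect attempt.

    The cap doubles with each attempt up to `ceiling`; the actual delay is
    drawn uniformly between 0 and the cap so that many clients reconnecting
    at once do not all hit the token endpoint at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        cap = min(ceiling, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, cap))
    return delays


# A seeded generator keeps the example repeatable.
delays = backoff_delays(8, rng=random.Random(42))
for attempt, delay in enumerate(delays):
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

The ceiling keeps later attempts from waiting indefinitely, and the jitter is what spreads a reconnection storm out over time.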
Visualizations (orb and waveform) use Canvas 2D, not WebGL, so they work on low-power devices without a GPU.
Ship in Minutes, Customize for Months
You can start with the widget today and graduate to the React layer when you need custom UI, or drop to the framework-agnostic core when you outgrow React. You don't have to change vendors as you grow. Same connection logic, same audio defaults, same agent.
- May 13, 2026
- Date parsed from source: May 13, 2026
- First seen by Releasebot: May 13, 2026
May 13, 2026
Deepgram improves Nova-3 Portuguese transcription accuracy across Brazilian and European Portuguese variants.
Nova-3 Portuguese Model Update
Improved Nova-3 Portuguese Model
We’ve enhanced the Nova-3 Portuguese model with improved transcription accuracy across Portuguese language variants, including Brazilian Portuguese (pt-BR) and European Portuguese (pt-PT).
To use the updated model, set model="nova-3" and use one of the supported Portuguese language codes:
- language="pt"
- language="pt-BR"
- language="pt-PT"
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
- May 12, 2026
- Date parsed from source: May 12, 2026
- First seen by Releasebot: May 12, 2026
May 12, 2026
Deepgram releases SDK updates across JavaScript, Rust, Python, and Java, adding Flux multilingual support, restoring the Agent interface, fixing WebSocket query handling, and improving reconnect behavior with new transport customization options and breaking API updates.
SDK releases
A new round of SDK updates is now available across JavaScript, Rust, Python, and Java. This release brings Flux multilingual support to Rust, restores the Agent interface in JavaScript, ships a Python bugfix for WebSocket query parameters, and delivers a breaking Java release with reconnect improvements.
JavaScript SDK v5.2.0
Deepgram JavaScript SDK v5.2.0 is now available. This release restores the Agent interface and adds AgentReference for string-ID flows, aliases AgentV1SettingsAgentListenProvider to AgentContextListenProvider, and preserves AgentV1Settings.Agent sub-types so existing agent code continues to compile.
For release details, see deepgram-js-sdk v5.2.0.
Rust SDK 0.10.0
Deepgram Rust SDK 0.10.0 is now available. This release adds Flux multilingual support with Model::FluxGeneralMulti, OptionsBuilder::language_hint for BCP-47 language hints, and new TurnInfo fields (languages and languages_hinted). It also introduces mid-session reconfiguration via FluxHandle::configure(ConfigureRequest) for adjusting thresholds, keyterms, and language hints without restarting the WebSocket.
This release includes a breaking change: FluxResponse::TurnInfo is now #[non_exhaustive].
For release details, see deepgram-rust-sdk 0.10.0.
Python SDK v7.1.1
Deepgram Python SDK v7.1.1 is now available. This patch release fixes boolean query parameters on WebSocket connect, which are now lowercased to match what the API expects.
For release details, see deepgram-python-sdk v7.1.1.
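The class of bug is easy to reproduce with the standard library: Python renders booleans as True and False, so a naive query encoder emits capitalized values. The sketch below shows the problem and one way to normalize it. It is an illustration only, not the SDK's actual code; encode_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def encode_query(params: dict) -> str:
    """Encode query parameters, writing booleans as lowercase true/false."""
    normalized = {}
    for key, value in params.items():
        if isinstance(value, bool):
            normalized[key] = "true" if value else "false"
        else:
            normalized[key] = str(value)
    return urlencode(normalized)


# urlencode calls str() on values, so True becomes the string "True".
naive = urlencode({"numerals": True})
fixed = encode_query({"numerals": True, "model": "nova-3"})
print(naive)
print(fixed)
```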
Java SDK v0.4.0
Deepgram Java SDK v0.4.0 is now available. This release ships reconnect and listener bug fixes, adds a transport factory policy hook for customizing transport behavior (timeouts, proxies, TLS) without subclassing the client, and incorporates the latest API surface updates.
This release includes breaking changes. For the full release notes, see deepgram-java-sdk v0.4.0.
- May 12, 2026
- Date parsed from source: May 12, 2026
- First seen by Releasebot: May 12, 2026
May 12, 2026
Deepgram expands Nova-3 multilingual numerals support, converting spoken numbers to digits across more languages.
Nova-3 Multilingual Model Update
Numerals Support Expanded for Nova-3 Multilingual
Numeral formatting is now supported for all Nova-3 multilingual languages — except Hindi and Japanese. This enhancement means Nova-3 multilingual can now convert spoken numbers to digits (e.g., “three hundred” → “300”) for English, Spanish, French, German, Russian, Portuguese, Italian, and Dutch.
To use this feature, set model="nova-3" and language="multi". Then include the numerals=true parameter in your request.
Learn more about how Numerals works and see supported languages on the Numerals page.
- May 11, 2026
- Date parsed from source: May 11, 2026
- First seen by Releasebot: May 12, 2026
- Modified by Releasebot: May 12, 2026
May 11, 2026
Deepgram releases the Browser Agent SDK, a new set of composable packages that connect web apps to the Voice Agent API, plus a simpler docs structure for Voice Agent. The SDK includes a drop-in widget, React components, a provider and hooks, and a framework-agnostic core.
Browser Agent SDK
The Browser Agent SDK is now available — four composable packages that connect any web app to the Voice Agent API:
@deepgram/agents-widget — drop-in widget with six layouts (sidebar, floating, inline, button, embedded, or orb). No framework required.
@deepgram/ui — pre-built React components (conversation view, animated orb, mic/speaker controls, waveform visualizer) styled through CSS custom properties.
@deepgram/react — provider + hooks for state, conversation history, microphone control, audio playback, and client-side function calling.
@deepgram/agents — the framework-agnostic core: WebSocket client, microphone capture, and player.
Each layer builds on the one below it, so installing the higher layer pulls in everything beneath. All layers share the same reconnection logic, playback-aware mode tracking, audio buffering, optional Silero VAD, KeepAlive pings, and typed event emitter.
Install the widget and ship in minutes:
npm install @deepgram/agents-widget
For the full architecture, package-by-package guides, and live in-page demos, see the Browser Agent SDK overview.
Voice Agent docs restructure
The Voice Agent section has been reorganized into five sections — Get Started, Build, Integrate, Reference, and Tips & Migration — to make it easier to find content based on where you are in your build. As part of the same pass, a few closely related reference pages have been merged (for example, prompt-updated, speak-updated, and think-updated are now consolidated into Acknowledgements, and the errors and warning pages are now Errors & Warnings). Redirects are in place, so existing links continue to work.
- May 5, 2026
- Date parsed from source: May 5, 2026
- First seen by Releasebot: May 6, 2026
Build Voice Agents in Your AI Coding Tool
Deepgram ships new voice AI developer tooling with the dg CLI, MCP server, and deepgram/skills repo, making its APIs easier to use in Claude Code, Cursor, Windsurf, Codex, and Aider. It speeds up voice agent setup, testing, and integration from the terminal.
Build voice agents faster with the dg CLI, MCP server, and the deepgram/skills repo.
Three agentic engineering tools that make Deepgram a first-class citizen in Claude Code, Cursor, Windsurf, Codex, and Aider.
Voice AI builders have a new default development environment, and it isn't a browser tab. It's an AI coding tool. Claude Code, Cursor, Windsurf, Codex, Aider. The agent reads your repo, writes the integration, runs the tests, and ships the PR. The bottleneck is no longer typing speed. It's how well your AI coding tool understands the APIs you're trying to use.
Most voice AI builders hit the same wall. The agent gets to the speech layer and stalls. It guesses at endpoint shapes. It hallucinates parameters. It writes against a curl example from two model versions ago. You end up pasting docs into the prompt, copy-pasting from the dashboard, or writing scaffolding by hand and letting the agent fill in the rest. Every voice agent integration eats more developer time and more agent tokens than it should at the part of the stack that should be the easiest.
We fixed that.
Three Tools Shipped Together
In April we shipped three pieces of agentic engineering tooling that work together as one platform layer for voice AI builders.
The dg CLI.
A terminal interface for Deepgram with 25+ commands. Transcribe a file, a URL, a microphone, or a piped audio stream. Generate speech with Aura. Run text intelligence on a transcript. Manage projects, keys, members, and usage. Auto-detects Claude Code, Aider, and Codex and switches to JSON output and stderr-routed status without flags. UNIX-friendly by design, with structured stdout, proper exit codes, and pipe support. MIT license, Python 3.10+. Install at cli.deepgram.com.
The MCP server.
A built-in Model Context Protocol proxy that connects your AI coding tool to Deepgram's API. Start it with dg mcp. One tool surface, with the agent able to transcribe audio, generate speech, list models, manage projects, and more. Auth handled locally via dg login. Plugs into Claude Code, Cursor, Windsurf, or any MCP-aware tool.
The deepgram/skills repo.
Agent skills are markdown instruction folders that your AI coding tool loads on demand. Six product-level skills cover API reference (api), docs navigation (docs), runnable starter apps (starters), feature-specific recipes (recipes), third-party integrations (examples), and MCP setup (setup-mcp). Per-language SDK skills layer on top for Python, JavaScript/TypeScript, Java, Go, Rust, Swift, Kotlin, .NET, and browser TS. The CLI handles the core install for you, and a single command brings in the rest (more below).
What You Can Build With Them
A voice agent prototype before lunch.
Pull a starter app with a skill. Wire it to your LLM. Pipe a test audio file through the CLI to confirm transcription. Generate speech for the agent's response. Iterate without leaving the terminal or your AI coding tool's context window.
An integration that doesn't drift.
Skills update with the product. Your AI coding tool reads the current API surface, not a stale model-trained guess. The recipes it pulls are real recipes, not hallucinations.
A multi-language stack on one platform.
The product-contract skills tell the agent what Deepgram does. The SDK skills tell it how to call Deepgram in your language. Same platform, same primitives, different language idioms.
A faster eval-to-prototype loop.
Transcribe a sample, generate speech, inspect a project, all from the same shell. The CLI is also useful when you want to test a hypothesis quickly without writing app code.
How It Works
Install the CLI:
curl -fsSL deepgram.com/install.sh | sh
Log in. The CLI detects which AI coding tools you have installed (Claude Code, Codex, Gemini CLI, Cursor, Cline) and offers to install the four core Deepgram skills (api, docs, starters, setup-mcp) into each.
dg login
For the full skill set, including recipes and integration examples, use the universal installer:
npx skills add deepgram/skills
Or for Claude Code natively, register the plugin marketplace:
/plugin marketplace add deepgram/skills
/plugin install deepgram@deepgram-agent-skills
If you skip the dg login prompt or add a new AI coding tool later, run dg skills install to set up the core skills on demand.
Transcribe a file:
dg listen call.mp3 | jq '.results.channels[0].alternatives[0].transcript'
Generate speech to your speaker:
dg speak "Hello from Deepgram" | ffplay -nodisp -autoexit -
Start the MCP server for your AI coding tool:
dg mcp
Your AI coding tool now has structured Deepgram knowledge loaded as skills, can pull the right starter for a given use case, and can call the API directly via MCP when you want it to.
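The jq path in the transcription example can also be followed from a script. The sketch below is a minimal Python illustration, assuming that dg listen writes the JSON response to stdout (as the jq pipeline implies) and exits non-zero on failure. The helper names extract_transcript and transcribe_with_cli are hypothetical and not part of the CLI.

```python
import json
import subprocess


def extract_transcript(response: dict) -> str:
    """Follow .results.channels[0].alternatives[0].transcript, the same path as the jq example."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe_with_cli(audio_path: str) -> str:
    """Run `dg listen <audio_path>` and return the transcript.

    Assumption: the CLI prints the JSON response on stdout and
    signals failure with a non-zero exit code.
    """
    completed = subprocess.run(
        ["dg", "listen", audio_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return extract_transcript(json.loads(completed.stdout))


# Offline check with a hand-written sample shaped the way the jq path expects.
sample = json.loads(
    '{"results": {"channels": [{"alternatives": [{"transcript": "hello world"}]}]}}'
)
print(extract_transcript(sample))
```

Keeping the extraction separate from the subprocess call means the parsing can be exercised without the CLI installed or an API key configured.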
Get Started
Install the CLI →
Get the skills →
CLI README and reference →
The story for voice AI builders is no longer "Deepgram is an API you integrate." It's "Deepgram is a platform your AI coding tools already understand." That's what changes when the bottleneck moves from typing to context. We're going to keep building toward this. Skills are easy to extend, MCP capabilities are growing, and the CLI is going to get better the more we hear from builders. Tell us what you want next.
Original source - Apr 30, 2026
- Date parsed from source:Apr 30, 2026
- First seen by Releasebot:May 1, 2026
- Modified by Releasebot:May 2, 2026
April 30, 2026
Deepgram releases its April 2026 self-hosted update with Nova-3 Gujarati support, Aura-2 speed and pronunciation controls, and stronger Voice Agent capabilities. It also improves numeral formatting, multilingual tagging, and redaction accuracy.
Deepgram Self-Hosted April 2026 Release (260430)
Container Images (release 260430)
quay.io/deepgram/self-hosted-api:release-260430
Equivalent image to:
quay.io/deepgram/self-hosted-api:1.185.0-2
quay.io/deepgram/self-hosted-engine:release-260430
Equivalent image to:
quay.io/deepgram/self-hosted-engine:3.116.0-1
Minimum required NVIDIA driver version: >=570.172.08
quay.io/deepgram/self-hosted-license-proxy:release-260430
Equivalent image to:
quay.io/deepgram/self-hosted-license-proxy:1.10.1
quay.io/deepgram/self-hosted-billing:release-260430
Equivalent image to:
quay.io/deepgram/self-hosted-billing:1.13.0
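The minimum NVIDIA driver version listed for the engine image can be checked before rolling out the release. The following is a rough sketch rather than an official preflight tool: it assumes driver versions compare component by component as integers, and the function names are hypothetical.

```python
def parse_driver_version(version: str) -> tuple[int, ...]:
    """Turn a dotted driver version such as '570.172.08' into (570, 172, 8)."""
    return tuple(int(part) for part in version.strip().split("."))


# Minimum driver version stated for the release-260430 engine image.
MIN_DRIVER = parse_driver_version("570.172.08")


def driver_is_supported(installed: str) -> bool:
    """Return True when the installed driver meets the stated minimum.

    Assumption: versions are ordered by comparing each dotted
    component numerically, left to right.
    """
    return parse_driver_version(installed) >= MIN_DRIVER


# Illustrative version strings, not an exhaustive compatibility list.
for candidate in ("565.57.01", "570.172.08", "575.57.08"):
    print(candidate, driver_is_supported(candidate))
```

On a GPU host, the installed version string can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to driver_is_supported.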
Aura-2 Speed and Pronunciation Controls require an updated voice-pack
The new Aura-2 Speed and Pronunciation Control features in this release are powered by an updated Aura-2 English voice-pack model. If your deployment is using an Aura-2 English voice-pack from before the April 2026 release (e.g., the 2025-04-15.0 version of the voice-pack), requests including the speed or pronounce parameters will return 400 Bad Request.
To enable these features, contact your Deepgram representative to obtain the latest Aura-2 English voice-pack (2025-04-15.4 or later) and replace the existing voice-pack file in your models directory. The official Deepgram Helm chart and sample values files in deepgram/self-hosted-resources (chart 0.34.0 and later) already point to the correct UUID; you only need to use the latest Deepgram configuration files and update the model file on disk.
This Release Contains The Following Changes
- Nova-3 Gujarati — Nova-3 now supports Gujarati (gu) for both batch and streaming.
- Aura-2 Speed and Pronunciation Controls — Aura-2 TTS voices now support runtime speed and pronunciation control. See Voice Controls for details.
- Improved Aura-2 Pronunciation — Better pronunciation for Spanish dates and the term “Jan” (as a name versus a month) with Aura-2 voices.
- Nova-3 Multilingual Numeral Formatting — Numeral formatting is now applied when using Nova-3 multilingual models and smart_format or numerals is enabled.
- Numeral Formatting for Hebrew and Romanian — Numeral formatting is now applied for Nova-3 Hebrew (he) and Romanian (ro) when smart_format or numerals is enabled.
- Voice Agent: Cartesia Speed Control — The Cartesia speak provider now supports speed control in Voice Agent sessions.
- Voice Agent: Improved Agent Message Injection — Improved support for injecting agent messages into a live session. See Inject Agent for details.
- Voice Agent: Multilingual Flux Language Hints — Multilingual Flux now accepts language hints when used as the STT provider in a Voice Agent session.
- Improved Multilingual Streaming Language Tags — Improves the accuracy of language tag results on /v1/listen streaming requests using multilingual models.
- Improved Numeral Redaction Accuracy — Improved redaction accuracy when using redact=numbers or redact=aggressive_numbers.
- General Improvements — Keeps our software up-to-date.
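Several of the changes above are switched on through query parameters on /v1/listen. As a hedged illustration, the sketch below builds request URLs for two of them. The base address is a hypothetical self-hosted endpoint, build_listen_url is an illustrative helper, and passing model=nova-3 alongside the language code is an assumption about how the request would be shaped, not something this release note specifies.

```python
from urllib.parse import urlencode

# Hypothetical self-hosted API address; replace with your deployment's host and port.
BASE_URL = "http://localhost:8080"


def build_listen_url(**options: str) -> str:
    """Build a /v1/listen URL with the given query parameters, in the order passed."""
    return f"{BASE_URL}/v1/listen?{urlencode(options)}"


# Nova-3 Hebrew with smart formatting, which now applies numeral formatting.
url_hebrew = build_listen_url(model="nova-3", language="he", smart_format="true")

# Numeral redaction using the more aggressive of the two modes named above.
url_redact = build_listen_url(model="nova-3", redact="aggressive_numbers")

print(url_hebrew)
print(url_redact)
```

Building the query string with urlencode keeps parameter values correctly escaped if other options are added later.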
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.