Deepgram Release Notes

Last updated: Dec 23, 2025

  • Dec 16, 2025
    • Parsed from source:
      Dec 16, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Deepgram Expands Nova-3 with 10 New Languages and Multilingual Keyterm Prompting

    Nova-3 expands with 10 new monolingual languages and a major Multilingual upgrade, boosting enterprise ASR accuracy and real-time performance across diverse scripts and tones. New Multilingual Keyterm Prompting lets you inject up to 500 tokens to boost domain-specific recognition without retraining.

    Overview

    Deepgram is expanding Nova-3 with support for 10 new monolingual languages and a major upgrade to Nova-3 Multilingual. This release strengthens Nova-3’s position as one of the most advanced enterprise ASR models available today, delivering accuracy, adaptability, and linguistic precision across diverse language families, scripts, and speech behaviors.

    Built for Global Speech Diversity

    This update brings Nova-3 into regions that challenge traditional ASR systems: languages with tonal variation, morphological complexity, and multi-script writing systems. Nova-3 handles these differences natively, preserving low latency and enterprise-grade accuracy across both batch and streaming modes.

    Nova-3 now supports 10 new monolingual languages across Southern Europe, the Baltics, and Southeast Asia, along with a major upgrade to multilingual accuracy through Keyterm Prompting.

    10 New Languages Now Live in Nova-3

    Earlier Nova-3 expansions focused on widely spoken European and Asian languages. This update represents the next phase, expanding into languages with distinct phonetic structures, scripts, and grammatical systems.

    Southern and Eastern Europe

    • Greek (el)
      Characterized by inflectional morphology and variable word stress. Nova-3 improves modeling of vowel alternations and compound forms.

    • Romanian (ro)
      A Romance language with Slavic influence and strong case inflection. Nova-3 delivers better handling of endings, stress patterns, and mid-word vowel shifts.

    • Slovak (sk)
      Complex consonant clusters and rich case systems make Slovak challenging for general ASR. Nova-3 improves recognition of grammatical gender and declension patterns.

    • Catalan (ca)
      A Romance language sharing features with both Spanish and French, marked by vowel reduction and multiple dialects. Nova-3 strengthens recognition in conversational and broadcast speech.

    Northern and Baltic Europe

    • Lithuanian (lt)
      A Baltic language with free stress and pitch accent. Nova-3 improves accuracy for rich morphology and long compounds.

    • Latvian (lv)
      Features vowel length contrast and consonant palatalization. Nova-3 increases clarity and keyword recall at varied speaking speeds.

    • Estonian (et)
      Combines vowel harmony with a three-length quantity system. Nova-3 improves segmentation and prosodic modeling in real-time scenarios.

    • Flemish (nl-BE)
      The Belgian variant of Dutch with regional phonetic shifts. Nova-3 enhances accuracy for colloquial and broadcast environments.

    • Swiss German (de-CH)
      A regional variant with extensive dialectal diversity. Nova-3 adapts more effectively to high-variance speech patterns.

    Southeast Asia

    • Malay (ms)
      Combines Austronesian roots with English and Arabic loanwords. Nova-3 improves accuracy in multilingual settings and conversational audio.

    Benchmarking: Accuracy Gains Across Languages

    Nova-3 continues to deliver measurable accuracy improvements over Nova-2, reducing Word Error Rate (WER) across both batch and streaming transcription. These gains hold across languages that vary widely in morphology, phonetics, and script complexity.

    A clear trend continues to emerge: streaming transcription often achieves the strongest relative WER reductions, reinforcing Nova-3’s suitability for real-time applications such as voice agents, live captioning, and AI telephony systems.

    Word Error Rate (WER) – Relative Improvement (10 New Nova-3 Languages)

    Key Highlights

    • All ten languages show accuracy gains in either batch or streaming modes, with many improving in both.
    • Malay, Romanian, and Slovak show some of the largest relative WER reductions, with improvements exceeding 20 percent in several cases.
    • Streaming models outperform batch in roughly half of the languages, supporting Nova-3’s strength in conversational and low-latency workflows.
    • Languages with complex morphology or less-standardized orthography such as Lithuanian, Latvian, and Slovak show robust gains, indicating improved handling of case systems, inflection, and compound formation.
    • Swiss German and Flemish deliver strong improvements despite dialectal variation, demonstrating Nova-3’s adaptability across regional speech patterns.

    New: Multilingual Keyterm Prompting

    Nova-3 Multilingual now supports Multilingual Keyterm Prompting, allowing developers to pass up to 500 tokens (about 100 words) to improve recognition of brand names, technical terminology, and domain-specific vocabulary across multilingual audio.

    Nova-3 can now prioritize these terms across all supported languages in a single request. This is especially valuable for global enterprises in finance, healthcare, retail, and customer support.

    No retraining is required. Nova-3 adapts instantly when you provide a list of key terms.

    Why It Matters

    Nova-3 continues to evolve as a unified speech recognition foundation for global products and workflows. Instead of applying one pattern to every language, Nova-3 adapts to each language’s structure, whether it involves tones, inflections, or non-Latin alphabets.

    For developers and enterprise teams, this means:

    • Consistent performance across diverse global markets
    • Improved recognition in both conversational and formal speech
    • Lower latency and fewer transcription errors in multilingual environments
    • Flexible customization with Keyterm Prompting for domain-specific accuracy

    Getting Started

    Switching to any of the newly supported languages is simple. Update your API request with the appropriate language code:

    Supported language codes:

    el, lt, lv, ms, sk, ca, et, nl-BE, de-CH, ro
    

    To use multilingual Keyterm Prompting, pass your list of key terms through the keyterms parameter in your Nova-3 Multilingual request.
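
    For illustration, here is a minimal sketch of both request types, modeled on the curl example used elsewhere in these notes. The endpoint, model, and language parameters are standard; language=multi selects Nova-3 Multilingual, and the key terms shown (“Deepgram”, “Nova-3”) are placeholders. Check the Keyterm Prompting documentation for the exact parameter name and token limits.

    # Pre-recorded transcription in one of the new monolingual languages (Greek)
    curl \
      --request POST \
      --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
      --header 'Content-Type: audio/wav' \
      --data-binary @youraudio.wav \
      --url 'https://api.deepgram.com/v1/listen?model=nova-3&language=el'

    # Nova-3 Multilingual with key terms (confirm the parameter name in the keyterm docs)
    curl \
      --request POST \
      --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
      --header 'Content-Type: audio/wav' \
      --data-binary @youraudio.wav \
      --url 'https://api.deepgram.com/v1/listen?model=nova-3&language=multi&keyterm=Deepgram&keyterm=Nova-3'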

    Looking Ahead

    With 10 new languages and multilingual Keyterm Prompting now live, Nova-3 continues its progress toward full global coverage. Accuracy, adaptability, and real-time reliability continue to improve across language families, scripts, and acoustic environments.

    The goal is clear: voice AI that works everywhere, for everyone. Accurate in fast speech, resilient in noisy environments, and adaptable to local dialects and cultural context.

    Unlock Enterprise-Grade Voice AI Today

    Sign up free and unlock $200 in credits, enough to power over 750 hours of transcription or 200 hours of speech-to-text across Nova-3’s growing language suite. Explore details on our Models & Languages Overview page and experience Nova-3’s world-class adaptability for yourself.

    Original source Report a problem
  • Dec 15, 2025
    • Parsed from source:
      Dec 15, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Aura-2 (TTS) Now Speaks Dutch, French, German, Italian, and Japanese

    Deepgram expands Aura-2 TTS with five new languages, delivering business‑grade, low‑latency voice across Dutch, French, German, Italian, and Japanese. The unified global TTS infra promises accurate pronunciation and real‑time performance for multilingual workflows.

    Deepgram expands its high-precision text-to-speech API to bring natural, business-ready voice infrastructure to global markets.

    Text-to-speech has become a cornerstone of voice AI, powering everything from scheduling assistants to customer support agents and multilingual automation. Aura-2 delivers the natural and realistic speech users expect, but it is also engineered for the rigorous demands of business use cases. Beyond human-like intonation, it prioritizes clarity, precision, and responsiveness. These qualities are essential for real-time voice agents where accuracy and speed are just as critical as voice quality.

    Today, we are excited to announce that Aura-2 now supports five additional languages:
    🇳🇱 Dutch 🇫🇷 French 🇩🇪 German 🇮🇹 Italian 🇯🇵 Japanese
    These join our existing English and Spanish models to provide a robust infrastructure for global voice applications.

    This expansion helps developers deliver consistent, multilingual voice experiences via our API without sacrificing naturalness, pronunciation accuracy, or low-latency performance.

    Why These Languages Matter for TTS

    Each of the new Aura-2 languages presents unique phonological and prosodic challenges that make high-quality TTS difficult, and getting these details right is critical for business applications where clarity prevents costly errors.
    Here is why these languages are impactful additions:

    🇳🇱 Dutch: Vowel richness and compound-heavy words
    Dutch has long vowels, diphthongs, and ultra-long compound nouns. Natural TTS must handle stress placement and smooth glides between complex vowel forms while keeping numeric data clear.

    🇫🇷 French: Liaison, elision, and continuous flow
    French uses fluid connected speech (liaisons) and dropped sounds (elisions). This requires a TTS engine to master subtle transitions without sounding choppy, especially when reading strings of numbers like contact information.

    🇩🇪 German: Precision with long compounds and consonant clusters
    German’s long words, clustered consonants, and consistent stress rules demand a voice model that enunciates clearly without sounding robotic. This is vital for complex scheduling and time-based data.

    🇮🇹 Italian: Open vowels and musical intonation
    Italian TTS must preserve melody and vowel openness without exaggeration. Aura-2 maintains the natural rhythm of Italian speech even when delivering transactional updates involving currency.

    🇯🇵 Japanese: Politeness markers, pitch accent, and mixed scripts
    Japanese blends kanji, kana, and loanwords while relying heavily on pitch accent. Aura-2 ensures smooth phrasing and consistent tone. This is also a strong example of structured-speech correctness, where the model must seamlessly switch between Japanese script and alphanumeric codes.

    Unified Infrastructure for Global Scale

    Aura-2 continues to evolve as a unified voice synthesis infrastructure for global products and workflows. Instead of applying a single prosodic pattern to every language, Aura-2 adapts to each language’s unique phonology, whether it involves pitch accents, liaisons, or compound stress rules.
    For developers and enterprise teams, this means:

    • Consistent performance across diverse global markets via a single API.
    • High-precision pronunciation for structured data like IDs, currency, and times.
    • Sub-200ms latency that ensures fluid, real-time conversational flow.
    • High reliability under streaming loads with stable performance even during high-volume concurrency.
    • Simplified infrastructure that eliminates the need to stitch together different vendors for different languages.

    Getting Started

    Switching to any of the newly supported languages is simple. Update your API request with the appropriate language code.

    New language codes:
    nl, fr, de, it, ja

    For a complete list of available voices and to hear audio samples for each region, refer to the TTS Voices and Languages documentation. You can also visit the Deepgram Playground to input your own text and test performance in real-time.
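
    For instance, once you have chosen a voice from that list, a minimal request looks like the sketch below. The model value is a placeholder for a real Aura-2 voice ID, and the Dutch sample text is only an example.

    # Synthesized audio is saved to output.mp3 (adjust if you request another encoding)
    curl \
      --request POST \
      --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{"text": "Goedemiddag, waarmee kan ik u helpen?"}' \
      --output output.mp3 \
      --url 'https://api.deepgram.com/v1/speak?model=YOUR_AURA_2_VOICE_ID'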

    Looking Ahead

    With five new languages now live, Aura-2 continues its progress toward full global coverage. Accuracy, adaptability, and real-time reliability continue to improve across language families and acoustic environments.
    The goal is clear: voice AI that works everywhere, for everyone. We are building text-to-speech that sounds natural, responds instantly, and works globally, regardless of the complexity of the business use case.

    Unlock Enterprise-Grade Voice AI Today

    Sign up free and unlock $200 in credits, enough to generate over 13 million characters of synthesis. Explore details on our TTS Voices and Languages page and hear Aura-2’s natural, high-precision performance for yourself.

    Original source Report a problem
  • Dec 12, 2025
    • Parsed from source:
      Dec 12, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Aura-2 TTS Language Expansion

    Aura-2 (Text-to-Speech) expansion

    Deepgram has expanded Aura-2 (Text-to-Speech) to support the following languages:

    • Dutch
    • German
    • French
    • Italian
    • Japanese

    Additionally, new voices have been added to the Spanish (es) model.

    The expanded voice catalog spans genders, age groups, and speaking styles, supporting a wide range of enterprise use cases including customer service, healthcare, sales, interviews, and IVR.

    You can explore all available voices, including featured voices, in the Voices & Languages section of our documentation and try them live in the Deepgram Playground.

    Original source Report a problem
  • Dec 11, 2025
    • Parsed from source:
      Dec 11, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    PHI Redaction Now Available for Batch and Streaming Speech-to-Text

    PHI redaction arrives for batch and streaming transcription, letting you redact protected health information with the phi parameter. It supports multiple entity types and languages and can be combined with other redactions. See the redaction docs for details.

    PHI redaction now available

    We’re excited to announce that PHI (Protected Health Information) redaction is now available for both batch (pre-recorded) and streaming speech-to-text.
    redact=phi
    You can now redact protected health information by setting redact=phi, which redacts the following entity types: condition, drug, injury, blood_type, medical_process, and statistics.

    Key features

    • Batch support: Available for all pre-recorded audio transcription
    • Streaming support: Available for real-time streaming transcription
    • Language support: Follows existing redaction language support (all languages for hosted batch, English only for streaming)
    • Combine with other redaction options: Use multiple redaction parameters together (e.g., redact=phi&redact=pci)

    Example usage

    curl \
      --request POST \
      --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
      --header 'Content-Type: audio/wav' \
      --data-binary @youraudio.wav \
      --url 'https://api.deepgram.com/v1/listen?redact=phi'
    

    For detailed information, see our Redaction documentation and supported entity types.
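
    For example, to combine PHI redaction with PCI redaction as noted above, repeat the redact parameter:

    curl \
      --request POST \
      --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
      --header 'Content-Type: audio/wav' \
      --data-binary @youraudio.wav \
      --url 'https://api.deepgram.com/v1/listen?redact=phi&redact=pci'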

    Original source Report a problem
  • Dec 10, 2025
    • Parsed from source:
      Dec 10, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Container Images Release

    Deepgram Self-Hosted December 2025 release expands Nova-3 to 31 languages and adds multilingual keyterm prompting up to 500 tokens. It also improves entity formatting and general performance, backed by updated container images and new minimum NVIDIA driver support.

    Deepgram Self-Hosted December 2025 Release (251210)

    Container Images (release 251210):

    • quay.io/deepgram/self-hosted-api:release-251210
      Equivalent image to: quay.io/deepgram/self-hosted-api:1.172.2
    • quay.io/deepgram/self-hosted-engine:release-251210
      Equivalent image to: quay.io/deepgram/self-hosted-engine:3.104.10
      Minimum required NVIDIA driver version: >=570.172.08
    • quay.io/deepgram/self-hosted-license-proxy:release-251210
      Equivalent image to: quay.io/deepgram/self-hosted-license-proxy:1.9.2
    • quay.io/deepgram/self-hosted-billing:release-251210
      Equivalent image to: quay.io/deepgram/self-hosted-billing:1.12.1
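
    To fetch the updated images, pull them with your container runtime before restarting the deployment; a sketch using Docker (Podman works the same way):

    docker pull quay.io/deepgram/self-hosted-api:release-251210
    docker pull quay.io/deepgram/self-hosted-engine:release-251210
    docker pull quay.io/deepgram/self-hosted-license-proxy:release-251210
    docker pull quay.io/deepgram/self-hosted-billing:release-251210

    If you deploy with Docker Compose or Helm, pointing your existing configuration at the new tags accomplishes the same thing.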

    This Release Contains The Following Changes:

    • Expands Nova-3 with 10 New Languages — Building on the 11-language expansion from the 251118 release, Nova-3 now supports 31 total languages. This release adds 10 additional languages, bringing improved accuracy and contextual understanding across:
      • Southern and Eastern Europe: Greek (el), Romanian (ro), Slovak (sk), Catalan (ca)
      • Northern and Baltic Europe: Lithuanian (lt), Latvian (lv), Estonian (et), Flemish (nl-BE), Swiss German (de-CH)
      • Southeast Asia: Malay (ms)
        Learn more in our announcement blogs: 10 new languages and previous 11-language expansion.
    • Adds Multilingual Keyterm Prompting for Nova-3 Multi — Nova-3 multilingual now supports multilingual keyterm prompting, allowing you to pass up to 500 tokens (~100 words) to boost recognition of brand names, industry jargon, proper nouns, and other mission-critical vocabulary across multilingual audio.
      This feature requires loading a newer version of the Nova-3 multilingual model. If you attempt to use keyterm prompting with an older version of the Nova-3 multilingual model, you will receive an error: Bad Request: The selected Nova-3 model does not support keyterm prompting. Contact Deepgram support for assistance with updating your model version.
      Learn more in the keyterm prompting documentation.
    • Improves Entity Formatting — Improves formatting for several entity types, including URLs and numeric entities that contain the word “thousand”.
    • Includes General Improvements — Keeps our software up-to-date.
    Original source Report a problem
  • Dec 10, 2025
    • Parsed from source:
      Dec 10, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Deepgram EU Endpoint Is Now Generally Available

    Deepgram announces EU Endpoint generally available, enabling in-EU data residency, lower latency, and full feature parity for most APIs. Production-ready EU infrastructure supports compliant, local processing with easy migration from the global endpoint.

    Earlier this year, we introduced our EU-hosted API as part of Deepgram’s broader global infrastructure expansion. Today, we are excited to share that the Deepgram EU Endpoint is officially Generally Available, making it simple for teams to run their voice AI workloads entirely within the European Union.

    This release provides:

    • Full EU data residency for compliance-driven workloads
    • Lower latency for applications serving users across Europe
    • Access to the full suite of Deepgram speech models
    • No additional pricing or activation requirements

    The endpoint is live and ready for production use.

    What GA Means for Your Applications

    Teams operating globally and serving European customers increasingly need both real-time performance and strict data locality guarantees. With general availability, the EU endpoint provides the same reliability, scale, and feature parity as our standard API (excluding Whisper models for now), and keeps all processing fully contained within the EU.

    Whether you are building contact center analytics, live voice agents, multilingual transcription, or compliant AI workflows, your voice data now remains fully processed inside EU infrastructure.

    Supported APIs

    The EU endpoint supports all major Deepgram APIs:

    • Speech-to-Text: /v1/listen and /v2/listen (excluding Whisper models)
    • Text-to-Speech: /v1/speak
    • Voice Agent: /v1/agent/converse
    • Text Intelligence: /v1/read

    This ensures full compatibility with the applications you run today.

    How to Use the EU Endpoint

    Migrating requires only one change: replace api.deepgram.com with api.eu.deepgram.com.

    Your existing API keys and SDK integrations work automatically; no new credentials are required. For detailed SDK examples and configuration steps, visit our documentation.
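
    For example, a pre-recorded transcription request against the EU endpoint is identical to a global one apart from the hostname (a minimal sketch):

    curl \
      --request POST \
      --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
      --header 'Content-Type: audio/wav' \
      --data-binary @youraudio.wav \
      --url 'https://api.eu.deepgram.com/v1/listen?model=nova-3'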

    Built for Compliance, Performance, and Scale

    Voice AI workloads across Europe continue to grow in every industry including finance, public services, retail, and telecommunications. Teams need infrastructure that aligns with regulatory expectations while maintaining the speed needed for real-time experiences.

    The EU endpoint provides:

    • In-region inference for reduced latency
    • Processing that remains fully inside the EU legal boundary
    • Easy migration from the global endpoint
    • Consistent model performance and API behavior

    This GA milestone builds on Deepgram’s ongoing global expansion which includes Dedicated single-tenant deployments and multi-region hosting options for enterprise environments.

    Start Building in the EU Today

    If your applications require EU data residency, lower latency, or regional infrastructure alignment, the Deepgram EU endpoint is ready for production workloads today.

    Update your base URL and begin building. There is no waitlist, no activation step, and no changes to billing or authentication.

    Use the EU endpoint: api.eu.deepgram.com
    Learn more: Configuring Custom Endpoints

    If you would like guidance on regional deployment strategy, compliance requirements, or migration planning, our team is here to help.

    Original source Report a problem
  • Dec 3, 2025
    • Parsed from source:
      Dec 3, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    EU Endpoint Now Generally Available

    Deepgram launches EU region GA with api.eu.deepgram.com for EU data processing. Supports Speech-to-Text, Text-to-Speech, Voice Agent, and Text Intelligence via dedicated endpoints; switch is as simple as updating the base URL. Existing API keys work, with full config docs available.

    The Deepgram EU endpoint

    The Deepgram EU endpoint (api.eu.deepgram.com) is now generally available for customers requiring data processing within the European Union.

    Supported APIs

    • Speech-to-Text: /v1/listen and /v2/listen (excluding Whisper models)
    • Text-to-Speech: /v1/speak
    • Voice Agent: /v1/agent/converse
    • Text Intelligence: /v1/read

    Configuration

    To use the EU endpoint, simply replace api.deepgram.com with api.eu.deepgram.com in your SDK or API requests. Your existing API keys and tokens will work with the EU endpoint.

    For detailed configuration instructions and SDK examples, see our Configuring Custom Endpoints documentation.

    Original source Report a problem
  • Dec 2, 2025
    • Parsed from source:
      Dec 2, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    5 Use Cases for AI Voice Agents for You and Your Business Right Now

    Nova-3 STT, real-time reasoning, and Aura-2 TTS unite in a single low-latency voice agent pipeline that can finish tasks in live customer conversations. A production guide with five practical use cases plus build steps and ROI KPIs for rapid shipping.

    Modern speech recognition, real-time reasoning, and natural speech synthesis now work together in one streaming loop, so voice agents can actually finish tasks, not just chat. Here's how you can take advantage of this new technology.

    The fastest path from ‘hello’ to a handled ticket: AI voice agents that listen, think, and act, all on your stack.
    Customers don’t want to press 3 for billing. Field techs can’t tap through menus with a wrench in hand. And teams don’t have time to glue together brittle speech-to-text/LLM/text-to-speech chains that fall apart under noise, accents, or interruptions.
    The gap has finally closed: modern speech recognition, real-time reasoning, and natural speech synthesis now work together in one streaming loop, so voice agents can actually finish tasks, not just chat.
    AI voice agents are now good enough to handle real customer conversations live, hands-free, and interruption-friendly. This is because speech-to-text, tool-using reasoning, and natural speech synthesis operate as a single, low-latency pipeline.
    This guide gives you five production-grade use cases and patterns you can ship today with a voice-agent stack (STT ↔ LLM/tool calls ↔ TTS), each tied to concrete business outcomes and a 3-step “how to build it” recipe.

    Here’s what you’ll learn in this guide for each use case:

    • Why voice (not just chat) is the right interface: hands-free workflows, barge-in, noisy channels, faster task completion, and multilingual reach.
    • Minimal architecture for each pattern: real-time STT, LLM/tool calls, and TTS (bring your own LLM if you like) plus simple hooks into CRMs, calendars, POS, ticketing, EMR, and billing.
    • Latency and UX guardrails that feel human: partials, quick confirms, read-backs, clarifying questions, and graceful barge-ins.
    • Safety and reliability basics: redaction where appropriate, human handoff, retries/backoff, observability, and KPIs that prove ROI.
    • Exactly how to try one now: open a Playground preset (or curl), place a live call, add one tool (e.g., CRM/Calendar/POS), and measure.

    How to use this guide:

    • Skim the five use-case cards and pick the one that matches your highest-volume call type.
    • Copy the 3-step build and the Try it now snippet.
    • Integrate one tool function (calendar, CRM, POS, ticketing).
    • Track the KPIs we list for that pattern; iterate on prompts and vocab.

    How We Chose These Use Cases
    We prioritized use cases that create immediate, measurable impact and showcase what modern voice agents do best: complete tasks in real time under messy, human conditions (noise, accents, interruptions) while calling your tools (CRM, POS, ticketing, calendars).

    Methodology: Reach × Urgency × Differentiation × Fit
    We scored each use case on four criteria:

    • Reach: Common in real-world ops: high call volumes, common workflows, or routine tasks. Voice agents should handle the most calls, not the rare edge cases.
    • Urgency: Business pressure to solve today: missed calls, long hold times, poor CX, lost revenue. If it’s costing you now, it’s worth automating now.
    • Differentiation: Voice ≫ chat or UI: fast hands-free actions, interruptibility, multilingual, noisy channel. Voice makes sense when tapping or typing fails.
    • Stack Fit: Clean tool/API surface, low data risk, straightforward compliance/handoff. These use cases shine with the right infra and STT/LLM/TTS orchestration.

    What to Expect from Each Use Case
    Every use case in this guide follows a consistent, engineering-focused format so you can understand, build, and measure it in under 5 minutes.
    Here’s the structure we use:

    1. Example Scenario: A concrete user moment you’ve likely seen before, e.g., “Reset my router,” or “I want to book a follow-up visit.”
    2. Why It’s Useful: The business outcome: shorter wait times, increased order volume, better customer retention, reduced support load, or pipeline acceleration.
    3. Why Voice AI Is Needed: What makes this use case better with voice than chat or app UI:
      • Real-time natural language understanding (NLU)
      • Interruptions and corrections (barge-in, repair)
      • Accents and code-switching
      • Contextual reasoning + tool use
      • Multilingual support
    4. How to Build It (3 Steps): Minimal architecture using a Voice Agent API stack, such as:
      • Stream audio to Nova-3 for STT (with barge-in and partials).
      • Use an LLM (Deepgram default or BYO) to reason, call tools, and output actions.
      • Send responses to Aura-2 for low-latency TTS delivery.
        We’ll also highlight where you can plug in your own models, CRMs, POS, or calendar systems.
    5. Try It Now: A link to a Playground preset, curl command, or downloadable repo that shows the use case in action with real or mock data.

    Use Case 1: Customer Support Triage & Self-Service
    Example Scenario
    A customer submits a ticket through a voice call and says, “My internet is down.” An agent verifies the account, runs scripted diagnostics, attempts a modem reset, confirms link health, and offers human escalation with the ticket prefilled.

    Why it’s Useful (Business Outcome)
    The agent deflects Tier-1 tickets and repetitive flows, reduces average handle time (AHT) via faster verification and guided steps, and improves first contact resolution (FCR)/customer satisfaction score (CSAT) by confirming actions and outcomes before the call ends. This gives consistent troubleshooting every time.

    Why Voice AI Interface is Needed

    • Barge-in and turn-taking: Real callers interrupt (“I already tried that”) without breaking flow.
    • Real-time NLU: Quickly extracts account, device, and error cues from messy speech.
    • Tool use: The agent invokes getAccount, runDiagnostics, openTicket via function calling to close the loop.
    • Robustness: Handles accents/noise better than Dual-Tone Multi-Frequency signaling (DTMF) trees, and most agents support multilingual follow-ups.

    How to Build (3 Steps)

    1. Open a session and configure: Connect to the WebSocket and immediately send a Settings message to define audio in/out, agent behavior, and providers.
    2. Add function calls: Build function calls like getAccount, runDiagnostics, openTicket so the agent can fetch customer data, follow KB steps, and file or escalate. Use the Voice Agent FunctionCallRequest/Response flow.
    3. Pick the Model Stack: Use a Deepgram-managed LLM or bring your own (OpenAI/Azure/Bedrock-style providers are supported in settings). Respond with Aura-2 TTS.
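
    To make step 1 concrete, here is a rough, illustrative Settings payload to send as the first message after the WebSocket connects. Only the elements named in this guide (audio in/out, agent.think.provider, agent.speak.provider, and the three function names) come from the text above; every other field name and value is an assumption, so follow the Voice Agent API documentation for the exact schema and provider options.

    {
      "type": "Settings",
      "audio": {
        "input": { "encoding": "linear16", "sample_rate": 16000 },
        "output": { "encoding": "linear16", "sample_rate": 24000 }
      },
      "agent": {
        "listen": { "provider": { "type": "deepgram", "model": "nova-3" } },
        "think": {
          "provider": { "type": "open_ai", "model": "gpt-4o-mini" },
          "prompt": "You are a support triage agent. Verify the account, run guided diagnostics, and open or escalate a ticket.",
          "functions": [
            { "name": "getAccount" },
            { "name": "runDiagnostics" },
            { "name": "openTicket" }
          ]
        },
        "speak": { "provider": { "type": "deepgram", "model": "aura-2-thalia-en" } }
      }
    }

    In a real session, each function entry would also carry a description and a parameters schema (a fuller sketch appears under Use Case 3 below), and the prompt would encode your own triage policy.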

    Use Case 2: Drive-Thru Ordering (QSR)
    Example Scenario
    A customer pulls up to the speaker and says, “Can I get a double cheeseburger… actually make it a meal.” The agent parses the change mid-utterance, confirms options (size, drink), suggests an upsell (“Would you like apple pie?”), reads back the total, and pushes the ticket to POS.

    Why it’s Useful (Business Outcome)
    Drive-thru lanes rely heavily on throughput, accuracy, and upsell discipline. A voice agent delivers consistent scripts during surges, protects margins with timely cross-sells, and covers late-night shifts without staffing gaps, raising cars/hour, improving order accuracy, and lifting average order value.

    Why Voice AI Interface is Needed

    • Low-latency call-and-response keeps customers from waiting in silence, with the system responding within a few hundred milliseconds.
    • Barge-in/turn-taking to handle mid-sentence changes and noise at the lane.
    • Tool use (Function Calling) to add/modify items and send to POS reliably.
    • Major QSR pilots show clear industry momentum and hard-won lessons learned, indicating real benefits when latency and accuracy are handled well.

    How to Build (3 Steps)

    1. Prompt with menu schema and upsell rules: Initialize the session over WebSocket and send a Settings message that loads a compact menu JSON (SKUs, options, pricing) and simple upsell policy (“If entree=X and no dessert, suggest Y”).
    2. Expose POS tools: Register functions the agent can call during the conversation: addLineItem(sku, qty, options), finalizeOrder(), sendToPOS(orderId). Implement via the FunctionCallRequest/Response flow.
    3. Choose model stack and speaking voice: Use default Deepgram LLM (OpenAI) + Aura-2 or bring your own LLM/TTS by changing agent.think.provider/agent.speak.provider in the Settings message.

    Use Case 3: Appointment Booking & Rescheduling (Healthcare/Services)
    Example Scenario
    A caller says, “I need a dermatology appointment next Thursday after 3 PM.” The agent verifies name/date-of-birth, checks provider availability that matches the constraint, books the first acceptable slot, captures insurance details, and sends an SMS confirmation with date, time, clinic address, and prep notes.

    Why it’s Useful (Business Outcome)
    Booking and rescheduling consume a large share of inbound volume. Automating the happy path reduces abandoned calls, improves booking conversion, and captures structured data correctly the first time.
    Proactive confirmations and reminders reduce no-shows and the manual back-and-forth that inflates average handle time.

    Why Voice AI Interface is Needed

    • Constraint solving in real time: “next Thursday after 3 PM, not Dr. Lee, and telehealth only.”
    • Function calls to calendar/EMR: query availability, enforce scheduling rules, and write back results.
    • Barge-in and turn-taking for clarifications (“Actually make it Friday”) without losing context.
    • Multilingual + accents handling to widen access and reduce errors in names/IDs.

    How to Build (3 Steps)

    1. System prompt with slotting policy: Define business rules in the prompt: required identifiers (full name, DOB), allowed visit types (new/return, in-person/telehealth), slot granularity (e.g., 15 min), and confirmation protocol (read-back + SMS). Include examples for date constraints (“this Friday”, “the 2nd Tuesday in October”).
    2. Tools: checkAvailability, bookSlot, sendSMS: Expose three functions the agent can call: checkAvailability(providerId?, specialty, from, to, constraints) → list of slots; bookSlot(slotId, patientId, visitType, location) → {appointmentId, start, provider}; sendSMS(patientPhone, message) → {ok:true}. Implement via FunctionCall Request/Response.
    3. Multilingual Nova-3 + Aura-2 for confirmations: Use nova-3 for real-time transcripts (with partials and barge-in) and Aura-2 to read back the appointment details clearly. Enable a second language if needed.
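
    To make step 2 concrete, here is an illustrative declaration for the checkAvailability tool as it might appear in the Settings message’s function list. The function name and arguments come from step 2 above; the JSON-Schema-style parameters block is an assumption about the exact format, so check the Voice Agent function calling documentation for the fields the API expects.

    {
      "name": "checkAvailability",
      "description": "Look up open appointment slots that match the caller's constraints.",
      "parameters": {
        "type": "object",
        "properties": {
          "providerId": { "type": "string", "description": "Optional provider to filter by" },
          "specialty": { "type": "string", "description": "For example, dermatology" },
          "from": { "type": "string", "description": "Start of the search window (ISO 8601)" },
          "to": { "type": "string", "description": "End of the search window (ISO 8601)" },
          "constraints": { "type": "string", "description": "Free-form constraints, such as 'after 3 PM, telehealth only'" }
        },
        "required": ["specialty", "from", "to"]
      }
    }

    At runtime the agent emits a FunctionCallRequest naming this tool, your backend returns the matching slots, and bookSlot and sendSMS follow the same pattern.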

    Use Case 4: Interactive Voice Response (IVR) Replacement for SMB
    Example Scenario
    A caller says, “I need last month’s invoice,” or “What are your Saturday hours?” The agent understands the request, answers FAQs directly (hours, address, pricing), or routes to the right queue (Billing vs. Tech Support) with context attached to the transfer.

    Why it’s Useful (Business Outcome)
    Classic DTMF trees, which require users to press numbers for options, often lead to abandoned phone calls and misrouted inquiries.
    A conversational IVR removes keypad friction, cuts time-to-answer, raises containment (self-service success), and reduces abandon rate, especially after hours when staffing is thin.
    When handoff is required, the agent passes a clean summary so humans start with context, not “hello.”

    Why Voice AI Interface is Needed

    • Turn-taking + barge-in for fluid redirection (“Actually, tech support please”), without dead air.
    • Function calls to fetch authoritative answers from your FAQ/CRM, then decide whether to resolve or route.
    • Single, real-time voice-to-voice stack (STT ↔ LLM ↔ TTS) designed for enterprise responsiveness and control.

    How to Build (3 Steps)

    1. Prompt with intents + fallback policy: Define core intents (Billing, Tech Support, Sales, Hours/Location, Pricing, Invoices) and a fallback policy: after two low-confidence tries, say “I’ll connect you to a specialist” and escalate with the collected context (name, account/email, summary). Include a short answer style guide: concise, confirm understanding, offer next action.
    2. Tools: getFAQAnswer, routeToQueue: Expose two functions the agent can call at runtime: getFAQAnswer(question|topic) → {answer, source} (pull from CMS/knowledge base/CRM); routeToQueue(queue, context) → {transferId} (SIP/PSTN/CCaaS handoff). Implement with FunctionCallRequest/Response.
    3. Choose default or BYO LLM; use Aura-2 neutral voice: Keep a neutral, friendly TTS voice for brand consistency. If you bring your own LLM, keep strict tool schemas and cap response length.

    Use Case 5: Lead Qualification and Routing (RevOps)
    Example Scenario
    A prospect calls the inbound demo line: “We’re exploring AI voice for our support team.” The agent captures name, company, role, and email; clarifies the use case and timeline; scores the lead (e.g., MEDDIC/BANT); creates the record in a customer relationship management tool; and books an account executive for the earliest qualifying slot.
    The agent then emails a calendar invite and a one-paragraph summary.

    Why it’s Useful (Business Outcome)
    You get a 24/7 SDR that never misses a lead, asks the same crisp discovery questions every time, and pushes complete records into your CRM. This improves qualified rate, shortens speed-to-meeting, and increases pipeline value with cleaner data and fewer handoffs.

    Why Voice AI Interface is Needed

    • Rapid back-and-forth with turn prediction so the dialog feels human.
    • Function calls to write leads, compute a score, and schedule across calendars.
    • Accents/noise robustness and multilingual follow-ups for global inbound.
    • Tone and trust: natural TTS for a professional greeting, concise confirmations, and read-backs that build trust before committing a meeting.

    How to Build (3 Steps)

    1. Prompt with MEDDIC/BANT rules: Seed the agent with the discovery playbook: budget/authority/need/timeline, 3–4 mandatory questions, and a stop rule once qualification is achieved. Add a style guide: keep answers <10s, confirm facts, use an email-summary template, and avoid salesy language.
    2. Tools: createLead, scoreLead, bookMeeting: createLead(name, company, email, phone, useCase) → {leadId}; scoreLead(leadId, meddicJson) → {score, band}; bookMeeting(leadId, slotISO, duration, region, slotPref) → {eventId, start, calendarUrl}. Return concise JSON so the agent can reason and decide next steps.
    3. Model and voice: Use the default OpenAI model via Deepgram for reasoning or swap in your own LLM; pick Aura-2-Odysseus-en for a professional tone. Keep temperature at 0.2 for tight phrasing.

    Conclusion: From Demo to Production (Checklist)
    You’ve got five patterns, code templates, and KPI targets. Here’s a pragmatic, staged checklist to take a Voice Agent from “cool demo” to reliable production.

    P0: Must-Haves Before Real Traffic
    Latency SLOs

    • p95: mic → first partial ≤ 250 ms; partial → TTS start ≤ 600 ms; end-to-end round trip ≤ 1.0 s.
    • Alert on breaches; surface per-stage timings in logs.

    Error budgets and retries

    • Define a monthly error budget (e.g., ≤ 0.5% failed interactions).
    • Tool calls: timeouts, exponential backoff, and idempotency keys on create/update.

    Graceful handoff to human

    • One-shot clarify; if confidence stays low or user asks: transfer with summary + transcript + tool context.
    • Track handoff rate and post-handoff resolution.

    Rate limits and protection

    • Apply per-caller and per-IP rate limits; circuit-breaker on upstreams.
    • Backpressure: pause listening or respond with partial acks when tools are slow.

    P1: Make It Observable, Adaptable, and Testable
    Analytics events

    • Emit events for: turn times, barge-ins, tool-call outcomes, handoffs, KPI snapshots.
    • Tie each session to trace IDs for call→tool→TTS correlation.

    Canned fallback prompts

    • Short, brand-safe replies for low-confidence cases.
    • Include “clarify once → handoff” policy.

    A/B Prompt Sets

    • Version prompts (A/B/C) with guardrails; rotate weekly.
    • Track containment, AHT, CSAT deltas per variant.

    Vocabulary and Glossary Updates

    • Maintain a domain glossary (SKUs, acronyms, provider names).
    • Refresh monthly; validate with domain-term accuracy audits.

    P2: Scale and Expand
    Multilingual rollout

    • Detect language → mirror caller; ensure policy translations are approved.
    • Add bilingual read-backs for safety-critical content (addresses, payments).

    Channel expansion

    • PSTN (SIP/CCaaS), WebRTC, mobile SDK.
    • Normalize session analytics across channels; keep turn/latency SLOs identical.

    Operational runbooks

    • Incident playbooks (latency spike, tool outage), on-call rotations, rollback of prompts/functions, and weekly KPI reviews.

    Enterprise or regulated industry?
    Talk to a voice AI expert for deployment options (network isolation, redaction policies, on-prem/virtual private cloud).

    Deepgram Voice Agent API brings real-time STT + LLM/tooling + TTS into one pipeline so your agents can listen, think, and act, with the speed and control production teams require.

    Original source Report a problem
  • Dec 1, 2025
    • Parsed from source:
      Dec 1, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Deepgram Brings Best-in-Class Voice AI Models to Amazon Connect & Lex

    Deepgram launches voice AI model integration with Amazon Connect and Lex, bringing ultra-low latency, high-accuracy speech-to-text and TTS into existing workflows. Real-time transcription, quality monitoring, and seamless AWS integration empower enterprise voice agents.

    We’re excited to announce the launch of Deepgram’s voice AI model integration with Amazon Connect and Amazon Lex, bringing Deepgram’s enterprise-grade speech intelligence into the workflows millions of businesses already rely on.

    For years, builders working with Amazon Connect and Amazon Lex have faced a frustrating gap: while these services deliver some of the most powerful customer experience tooling on the market, they’ve lacked access to truly best-in-class, low-latency speech recognition. Today, that changes.

    Where Teams Get Stuck

    Amazon Connect and Lex are industry standards for good reason. But if you wanted truly great speech recognition or natural-sounding TTS, you had two options: settle for what was there, or build something custom yourself. Most teams picked option one and just dealt with it.

    Which meant spending way too much time fighting infrastructure issues instead of actually building good customer experiences.

    What We Built

    Now Deepgram's models—both speech-to-text and text-to-speech—work natively in Lex. You get:

    • Ultra-low latency for natural, real-time conversations
    • State-of-the-art accuracy across noisy environments and diverse accents
    • Scalability to support the largest enterprise deployments
    • Seamless integration with existing Connect and Lex workflows, no hacks, no heavy lifting

    Real-time transcription, quality monitoring, automated workflows, whatever you're building, you can now do it with models that actually perform. And your voice agents will sound human, not robotic.

    Why This Matters

    Your customer service bots can finally respond instantly and accurately. Your compliance team can analyze calls while they're happening. Your infrastructure can grow without constant firefighting.

    But really, it comes down to this: you can finally build the voice experiences you've been trying to build for years. The ones your customers expect. The ones that actually work.

    Early Feedback

    Early adopters in industries from financial services to healthcare are already testing the integration, and the feedback is clear.

    One global contact center lead told us:

    We’ve spent years working around latency and transcription quality issues. Deepgram in Lex is the first time we can actually build the kind of customer interactions we’ve been envisioning.

    What’s Next

    We’ll be at AWS re:Invent December 1-5 in Las Vegas showing this off. Come find us to walk through real-world use cases and best practices for getting started.

    And we’re just getting started. Every team using Amazon Connect and Lex should have access to the best voice AI out there, without having to jump through hoops to get it.

    Ready to Get Started?

    If you’re ready for speech recognition and TTS that actually works the way you need it to inside Amazon Connect and Lex, this is it. Whether you’re building intelligent voice agents, transcribing calls for analytics, or powering automated workflows, Deepgram gives you the speed, accuracy, and reliability you need.

    Learn more about our AWS partnership and technical implementation on our AWS partner page.

    Original source Report a problem
  • Dec 1, 2025
    • Parsed from source:
      Dec 1, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Deepgram

    Deepgram Brings Real-Time Speech Intelligence to Amazon SageMaker

    Deepgram unveils native real-time streaming STT, TTS, and Voice Agent API for Amazon SageMaker, delivering sub-second latency and scalable voice AI inside AWS with no custom pipelines. Faster deployments for contact centers, analytics, and compliant workflows.

    The Problem Every Developer Knows

    Ask any engineer who's tried to build real-time speech applications on AWS, and they'll tell you the same story. SageMaker is brilliant for batch ML workloads, but it never supported native streaming for speech. So teams got creative—and by creative, we mean they cobbled together complex architectures with Lambda functions, Kinesis streams, and custom pipelines just to handle audio in real-time.

    The result was predictable: higher latency, operational headaches, and solutions that were brittle at scale. For industries where every millisecond counts—think call centers handling thousands of concurrent conversations, or trading floors where voice commands trigger million-dollar transactions—these workarounds were deal-breakers.

    How We Fixed It

    Our integration does exactly what you'd hope: it makes real-time speech processing work like any other SageMaker endpoint. No custom pipelines. No Lambda gymnastics. Just clean, streaming STT, TTS, and Voice Agent API integrations that scale with your infrastructure and play nicely with your existing ML workflows.

    The real-time integration with SageMaker gives you everything you'd expect from enterprise-grade speech AI: support for HTTP/2 or WebSockets, sub-second latency, automatic scaling, and the security and compliance benefits that come with staying inside the AWS ecosystem. More importantly, it means your teams can focus on building great voice experiences instead of wrestling with infrastructure.

    Watch the Workflow in Practice

    To show what this unlocks in a real workflow, here is a short demo of a pharmacy voice agent built on Deepgram and running on SageMaker. In the video, the agent handles an end-to-end customer inquiry: authenticating a caller with a Member ID, pulling the correct order, identifying the medication, checking refill availability, and giving a precise pickup time. Each step is powered by real-time streaming STT, TTS, and agent logic running natively on SageMaker, so the interaction feels natural and responsive while retrieving accurate, structured data from backend systems.

    How the Demo Works

    The diagram illustrates the workflow behind the pharmacy demo. Audio from the user is streamed into Deepgram STT through the new SageMaker BiDirectional Streaming API for transcription, which is then passed to an LLM hosted on Amazon Bedrock along with the relevant pharmacy data. The model generates a structured text response, which is returned to Deepgram TTS through the new SageMaker BiDirectional Streaming API to synthesize natural-sounding speech. Pipecat provides the orchestration layer that manages each step of the pipeline, making it easy to coordinate audio streaming, model calls, and database lookups inside an AWS VPC. The result is a fully synchronous, low-latency voice interaction that feels like speaking with a real assistant while keeping every component inside your AWS environment.

    What This Means for You

    The applications are pretty much everywhere you'd expect:

    Contact centers can finally do real-time sentiment analysis and live agent coaching without the infrastructure complexity. Conversational AI applications get more responsive and can handle the kind of natural, flowing conversations that users actually want to have. Analytics teams can process voice data as it comes in rather than waiting for batch jobs to complete.

    And compliance teams, who often have the most stringent requirements around data handling, get all the benefits of AWS's security model without having to worry about data leaving their VPC for external speech processing.

    Go Deeper: AWS Engineering Walkthrough

    If you want to explore how SageMaker’s new bidirectional streaming works under the hood, the AWS team has published an in-depth engineering walkthrough. It covers the runtime architecture, WebSocket flow, container requirements, and how to deploy Deepgram models using the new streaming APIs. Read the full guide on the AWS Machine Learning Blog.

    What Comes Next

    This launch represents months of joint engineering work with AWS, and we're not slowing down. We'll be at re:Invent demonstrating the real-time implementation of Deepgram on SageMaker. We're also planning a series of technical deep-dives for teams who want to understand exactly how to architect these solutions.

    The broader goal here isn't just solving a technical problem—though we've definitely done that. It's about removing the barriers that have kept voice AI on the sidelines for too many enterprise applications. When building real-time speech processing is as straightforward as deploying any other ML model, we think you'll be surprised by what teams start building.

    Ready to Get Started?

    If you've been waiting for native streaming Voice AI in SageMaker, your wait is over. The integration is live, and our team is ready to help you implement it. Whether you're processing customer calls, building voice assistants, or analyzing meeting recordings in real-time, we've built this to handle your use case.

    For developers who want early access to our SageMaker SDK for Python and JavaScript, you can request entry to the beta program here.

    Ready to transform your voice AI capabilities? Connect with our team to explore how Deepgram's SageMaker integration can accelerate your journey from speech data to actionable insights—instantly.

    Learn more about our AWS partnership and technical implementation on our AWS partner page.

    Original source Report a problem
