Deepgram Release Notes

Last updated: Feb 15, 2026

  • Feb 13, 2026
    • Date parsed from source:
      Feb 13, 2026
    • First seen by Releasebot:
      Feb 15, 2026

    Deepgram

    Voice Is Now a First-Class Citizen in OpenClaw

    DeepClaw Hosted introduces a cloud-hosted OpenClaw you can talk to on a real phone, with its own number, memory, and tools. Call or text anytime and your agent runs in the cloud, accessible from your pocket. An experimental launch from Deepgram Labs.

    What is this?

    DeepClaw bridges Deepgram's Voice Agent API with OpenClaw. It turns OpenClaw into something you can talk to — over a real phone call, on any phone, anywhere.
    When we launched DeepClaw last week as a skill, it still required manual configuration from you: your own Twilio/Telnyx number and API keys.
    DeepClaw Hosted takes that further. We run the infrastructure, we provide the keys. You call a number. We spin up an isolated OpenClaw instance just for you, assign you a personal phone number, and connect it all together. Your agent, your memory, your tools — running in the cloud, reachable from your pocket.

    What you get

    • A personal phone number — call or text your OpenClaw anytime
    • Full agent capabilities — tools, memory, web search, code execution
    • Cross-channel memory — your call history and texts live in the same instance
    • Proactive callbacks — set a reminder and your agent calls you back

    What it sounds like

    Deepgram Flux handles speech-to-text with semantic turn detection: it knows when you're actually done talking, not just when you pause. Aura-2 handles text-to-speech with a 90 ms time to first byte (TTFB). The result is a conversation that feels like a conversation.

    The fine print

    This is an experiment from Deepgram Labs. It's free. It's also provided as-is — no SLA, no guarantees, no warranty. Use it, break it, tell us what you think and how it could be better. We're building in public and we want to see what happens when OpenClaw has a dial tone.

    Try it

    Call 910-MOLTBOT (910-665-8268).
    That's the whole onboarding flow.

  • Feb 13, 2026
    • Date parsed from source:
      Feb 13, 2026
    • First seen by Releasebot:
      Feb 15, 2026

    Deepgram

    Nova-3 Multilingual Speech-To-Text: Improving Multilingual Accuracy at Production Scale

    Nova-3 Multilingual delivers a retrained model with lower WER in both batch and streaming, plus major gains in code-switching. No API changes are required; the updated production model is live across all supported languages, boosting real-world multilingual transcription accuracy.

    Nova-3 Multilingual Update

    Nova-3 Multilingual now delivers lower WER, including a ~34% relative reduction in batch mean WER and a ~21% relative reduction in streaming mean WER, with strong gains in code-switching. No API changes required.

    We’ve released an updated Nova-3 Multilingual speech-to-text model, delivering accuracy improvements across all supported languages, including a ~34% relative reduction in batch mean WER and a ~21% relative reduction in streaming mean WER, with the largest gains in code-switching scenarios.

    This update focuses on improving real-world multilingual speech recognition, especially for inputs that mix languages within a single utterance or conversation.

    Key improvements include:

    • Lower Word Error Rate (WER) across both batch and streaming inference
    • Significantly improved handling of code-switching, reducing word drops when languages are mixed
    • No API or configuration changes required - the updated model is live now

    Why This Update Matters

    Speech recognition in the real world is messy. People switch languages mid-sentence, mix vocabulary, speak with varied accents, and move fluidly between contexts. Solving for this complexity is one of the core challenges in multilingual speech-to-text and automatic speech recognition (ASR).

    Multilingual speech recognition becomes significantly more complex when languages are mixed within the same conversation, or even the same sentence. Consider a bilingual English/Spanish speaker saying:

    “I was charged twice, pero solo hice una compra.”
    (Translation: I was charged twice, but I only made one purchase.)

    In situations like this, models must correctly recognize words as speakers switch languages mid-sentence. Historically, these transitions have been challenging for multilingual systems.

    Improving performance in these scenarios requires retraining and evaluation across datasets that include both monolingual and mixed-language audio.
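
    To make this concrete, here is a minimal sketch of transcribing a mixed-language recording like the one above. It assumes the batch REST endpoint shown in Deepgram's curl examples elsewhere in these notes, Python's requests library, and the language=multi parameter from Deepgram's multilingual docs; the file name and response parsing are illustrative.

    import requests

    # Minimal sketch: batch-transcribe mixed English/Spanish audio with the
    # Nova-3 multilingual model. "mixed_en_es.wav" is a placeholder file.
    with open("mixed_en_es.wav", "rb") as audio:
        response = requests.post(
            "https://api.deepgram.com/v1/listen",
            params={"model": "nova-3", "language": "multi"},
            headers={
                "Authorization": "Token YOUR_DEEPGRAM_API_KEY",
                "Content-Type": "audio/wav",
            },
            data=audio,
        )

    response.raise_for_status()
    result = response.json()
    print(result["results"]["channels"][0]["alternatives"][0]["transcript"])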

    Supported Languages

    Nova-3 Multilingual supports the following languages: English, Spanish, French, German, Hindi, Italian, Japanese, Dutch, Russian, and Portuguese. For more information about supported languages and model capabilities, visit our Models & Languages documentation page.

    What Went Into This Update

    This release reflects a retrained Nova-3 Multilingual model evaluated across a diverse set of multilingual benchmarks.

    We made advances in:

    • Curriculum: the order and types of data we show the model, so that it gets appropriate exposure to multilingual and code-switching data during training.

    • Data Curation: how we filter and select data, so that the model trains on accurately labeled data, especially with respect to code-switching.

    • Overall Word Error Rate (WER) Improvement
      Note: Mean WER is an unweighted average across datasets. Aggregate WER is weighted by total word count across datasets.

    • Code-Switching Word Error Rate (WER) by Language
      The charts below show Word Error Rate (WER) by language on code-switching datasets, comparing the previous and current Nova-3 Multilingual models across both batch and streaming modes.

    • Multilingual Keyterm Prompting
      Nova-3 Multilingual supports Keyterm Prompting. This allows developers to guide transcription toward domain-specific terminology, brand names, product names, and keywords, without retraining models or managing custom vocabularies.

    Keyterm Prompting is applied dynamically at inference time, making customization fast and flexible across languages.

    This capability is especially valuable for:

    • Call centers and customer support systems
    • Voice agents and IVR applications
    • Industry-specific analytics and transcription workflows

    What This Means for Builders

    For developers and enterprises building multilingual speech-to-text voice experiences, these improvements translate into:

    • Fewer transcription errors
    • Reduced manual correction
    • More reliable downstream analytics
    • Stronger performance in mixed-language real-world audio

    Build Globally with Deepgram and Unlock Enterprise-Grade Voice AI Today

    The updated model is live now and serves as the default Nova-3 Multilingual production model — no API or configuration changes required. Sign up free and unlock $200 in credits, enough to power over 750 hours of transcription or 200 hours of speech-to-text across Nova-3’s growing language suite. Explore details on our Models & Languages Overview page and experience Nova-3’s world-class adaptability for yourself.

  • Feb 12, 2026
    • Date parsed from source:
      Feb 12, 2026
    • First seen by Releasebot:
      Feb 15, 2026

    Deepgram

    We’re Tripling Default Concurrency to Power the Voice AI Economy

    Deepgram raises production-grade concurrency with new default limits across Voice Agent API, Streaming STT, and TTS. Growth Plan gains up to 4.5x, with guaranteed floors from day one, reducing 429 errors as you scale enterprise voice AI in production.

    Rate limits often stall demos and early scale, creating artificial bottlenecks unrelated to product quality. Deepgram is lifting those limits, investing in infrastructure to remove scaling ceilings and support the Voice AI ecosystem trusted by 1,300+ organizations.

    Rate limit errors have a way of appearing at the worst possible moment: during a demo, right as your customer starts ramping up, or when your agent traffic finally starts scaling.

    Nothing kills momentum like a 429 error when your voice agent should be handling 20 concurrent calls but your infrastructure is capped at 15. You’ve built something that works, your users want it, and then you hit a ceiling that has nothing to do with your code.

    Today we’re raising that ceiling. With 1,300+ organizations powered by Deepgram, this infrastructure investment is part of our commitment to scaling the platform for the Voice AI economy.

    What’s New: Production-Grade Concurrency from Day One

    We're tripling default concurrency limits across Voice Agent API, Streaming STT, and TTS. Growth Plan customers get up to 4.5x.

    New default concurrency limits:

    API Product | Pay as You Go | Growth Plan
    Voice Agent API (connections) | 45 (was 15) | 60 (was 15)
    Streaming STT (streams) | 150 (was 50) | 225 (was 50)
    WSS TTS (streams) | 45 (was 15) | 60 (was 15)

    These changes apply automatically today — no action needed on your end.

    Why This Matters for Voice AI Teams

    Here’s what the increase means for teams building on Deepgram. As Voice AI moves from pilot to production across enterprise teams, the infrastructure underneath has to keep up. These new defaults are part of Deepgram’s broader platform investments, the same foundation powering teams from startups to household name enterprises using AI at scale.

    Built for Teams Serving Multiple Customers

    If you’re building a conversational AI platform serving thousands of customers, or a meeting intelligence product processing high volumes for enterprise clients, the math just got 3x better.

    Built to scale: With 45 WSS streams, 10 clients can now burst to 4–5 streams each, providing headroom for multi-turn conversations without a single customer’s traffic spike impacting other tenants. That’s the difference between one customer's spike taking everyone else down and a reliable production system.

    Voice Agent stack: If you’re running STT, TTS, and Voice Agent API together, the new limits give all the room you need to scale. You can now run 45+ concurrent agents with headroom for traffic spikes.

    What this means in practice:

    • Fewer HTTP 429 errors during integration and production scaling
    • More reliable user experience across your customer base and regions, no service failures during heavy growth or demand spikes
    • Scale without filing support tickets
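
    On the 429 point above, the usual client-side companion is a retry with exponential backoff. The sketch below reuses the batch endpoint from Deepgram's curl examples elsewhere in these notes; the retry policy itself is an illustrative assumption, not Deepgram guidance.

    import time

    import requests

    # Illustrative backoff loop for concurrency (HTTP 429) responses.
    def transcribe_with_backoff(audio_path, api_key, max_retries=5):
        for attempt in range(max_retries):
            with open(audio_path, "rb") as audio:
                response = requests.post(
                    "https://api.deepgram.com/v1/listen",
                    params={"model": "nova-3"},
                    headers={"Authorization": f"Token {api_key}",
                             "Content-Type": "audio/wav"},
                    data=audio,
                )
            if response.status_code != 429:
                response.raise_for_status()
                return response.json()
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
        raise RuntimeError("still rate-limited after retries")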

    Use cases that scale faster:

    • Conversational AI platforms scaling agents across multiple customers and regions
    • Meeting intelligence products processing high volumes for enterprise clients and multiple languages
    • Contact center analytics teams serving hundreds of locations or multiple regions
    • Healthcare, legal, and financial teams running high-throughput, multi-tenant workloads

    Transparent Upgrade Paths

    We publish our concurrency defaults by payment plan. You know exactly what you get, with no surprises and no support tickets to figure out your limits. As you consume more, concurrency automatically scales with your growth as you move to higher plans, with Enterprise offering the highest concurrency support. Additional capacity beyond your plan is available if you need it. Reach out to us for details.

    Guaranteed Capacity from Day One

    Some vendors market “unlimited” concurrency but implement dynamic scaling: 10% ramp-up periods every 60 seconds when you exceed 70% utilization. During a traffic spike, that’s a 25-minute wait to scale from 100 to 1,000 streams, assuming perfect ramp-up conditions. Your application waits while their infrastructure catches up, which essentially means your customers pay the price when you're trying to deliver sub-second response times.
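
    The arithmetic behind that 25-minute figure, as a quick sketch (assuming one 10% ramp-up per 60-second interval):

    import math

    # 100 streams growing 10% per 60-second interval: how many intervals
    # until capacity reaches 1,000? Solve 100 * 1.1**n >= 1000 for n.
    intervals = math.ceil(math.log(1000 / 100) / math.log(1.1))
    print(intervals)  # 25 -> about 25 minutes at one interval per minute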

    Other vendors start concurrency so low that builders are forced into spend-based tier advancement and manual approvals just to reach production-grade limits.

    Deepgram starts with high, guaranteed floors, all available immediately. The infrastructure is pre-provisioned for scale, so you can move from prototype to production without waiting for permission.

    Your critical voice infrastructure keeps up with your growth.

    A Few Things to Know

    These are guaranteed floors, with transparent limits. You know exactly what capacity you have from day one.

    This is a permanent platform enhancement, built into your plan going forward.

    Some deployments may take longer. Regional (EU) or self-hosted deployments may not reflect these changes immediately. Questions about your deployment? Contact your account team for specifics.

    On a contract? Your account team will walk you through how these changes apply to your plan.

    Getting Started

    If You’re Already Building

    • Review updated API Rate Limits documentation
    • Test scaling scenarios with your new limits

    If You’re Evaluating for Your Team

    • Review Pricing page for updated plan-specific defaults
    • Reach out to us if you need concurrency above new defaults
    • On Pay as You Go? Growth Plan gets you up to 4.5x concurrency. Upgrade here.

    Try It Now

    Get your API key: Sign up for a Deepgram account and get $200 in free credits.

    Your infrastructure scales with your success. We’ve increased default concurrency so you can focus on building voice AI that works, without waiting for permission to grow. That’s what building for the Voice AI economy looks like in practice.

  • Feb 12, 2026
    • Date parsed from source:
      Feb 12, 2026
    • First seen by Releasebot:
      Feb 15, 2026

    Deepgram

    Nova-3 Speech-to-Text Expands Global Language Coverage with Hebrew, Persian, and Urdu

    Nova-3 adds production-grade Hebrew, Persian, and Urdu speech-to-text, expanding RTL language coverage for global call centers, media transcription, and analytics. Keyterm Prompting steers domain terms without retraining while keeping API consistency. Test it in Deepgram Playground and via the Nova-3 API.

    Deepgram Nova-3 adds Hebrew, Persian, and Urdu

    Deepgram now supports Hebrew, Persian, and Urdu speech-to-text on Nova-3. Production-grade monolingual models with streaming, batch, and Keyterm Prompting.

    Deepgram continues to expand Nova-3 speech-to-text globally with the addition of three monolingual right-to-left (RTL) languages – Hebrew, Persian, and Urdu – now available in production. Designed for call centers, voice agents, media transcription, and analytics, this release delivers Nova-3-level accuracy, Keyterm Prompting, and production-grade performance for developers building voice applications that support RTL languages across the Middle East and South Asia.

    Built for Global Voice Applications

    This launch reinforces Nova-3 as an enterprise-grade speech-to-text platform built for global scale, supporting a wider range of scripts and writing systems used in real-world production environments. With Arabic already available, the addition of Hebrew, Persian, and Urdu extends Nova-3’s support for right-to-left languages while preserving consistent performance and behavior across the platform.

    As voice products scale globally, teams need to add new languages without adding integration complexity. The addition of Hebrew, Persian, and Urdu to Nova-3 eliminates the need to manage multiple vendors or stitch together separate models, making it easier to expand into new markets and serve customers across the Middle East and South Asia. Together, these capabilities continue to establish Nova-3 as the foundation for global voice applications operating across multiple languages, regions, and deployment requirements.

    Three New Languages Live on Nova-3

    Hebrew Speech-to-Text (he)

    Hebrew is spoken by more than 10 million people worldwide and serves as the primary language of business, media, and digital communication in Israel. Modern Hebrew blends ancient and biblical linguistic roots with contemporary vocabulary and fast-paced conversational speech patterns. Hebrew powers call centers, voice agents, and enterprise systems.

    Nova-3 Hebrew brings monolingual speech-to-text support for real-world conversational Hebrew, enabling accurate transcription across production voice environments.

    Persian Speech-to-Text (fa)

    Persian (Farsi) is spoken by an estimated 130 million people worldwide, extending from Iran to Afghanistan, Tajikistan, and Persian-speaking communities in the diaspora. Written in a modified Arabic script, Persian is known for rich poetic and contextual expression. Persian is used across media, enterprise systems, and digital platforms throughout the region.

    Nova-3 Persian delivers monolingual speech-to-text for modern conversational Persian across voice-driven applications.

    Urdu Speech-to-Text (ur)

    Urdu is spoken by more than 230 million people worldwide and is one of South Asia’s most widely used languages across commerce, media, and customer service environments. Written in the Nastaliq script and heavily influenced by Persian and Arabic vocabulary, Urdu reflects a unique blend of linguistic traditions used across regional and diaspora markets.

    Nova-3 Urdu provides monolingual speech-to-text support for conversational Urdu at production scale.

    The chart below highlights initial word error rates (WER) for Hebrew, Persian, and Urdu across both streaming and batch transcription workflows on Nova-3:

    Keyterm Prompting Across Right-to-Left Languages

    Hebrew, Persian, and Urdu on Nova-3 all support Keyterm Prompting, allowing developers to guide transcription toward domain-specific terminology, brand names, product names, and keywords, without retraining models or managing custom vocabularies.

    Keyterm Prompting is applied dynamically at inference time, making customization fast and flexible across languages.

    This capability is especially valuable for:

    • Call centers and customer support systems
    • Voice agents and IVR applications
    • Industry-specific analytics and transcription workflows

    Built for Developers and Enterprises

    Hebrew, Persian, and Urdu are available through the same Nova-3 API developers already use today, with support for both real-time streaming and batch transcription. All three languages can be tested directly in the Deepgram Playground before deploying to production.

    Switching languages is simple. Just specify the appropriate language code in your request:

    curl --request POST \
      --header "Authorization: Token YOUR_DEEPGRAM_API_KEY" \
      --header "Content-Type: audio/wav" \
      --data-binary @youraudio.wav \
      "https://api.deepgram.com/v1/listen?model=nova-3&language=he"
    

    Supported language codes: he, fa, ur
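
    If you would rather start from code than curl, here is a minimal Python equivalent that also pulls the transcript out of the response. It assumes the requests library and Deepgram's standard pre-recorded response shape; the file name is a placeholder.

    import requests

    # Same request as the curl example above, in Python.
    with open("youraudio.wav", "rb") as audio:
        response = requests.post(
            "https://api.deepgram.com/v1/listen",
            params={"model": "nova-3", "language": "he"},  # or "fa", "ur"
            headers={
                "Authorization": "Token YOUR_DEEPGRAM_API_KEY",
                "Content-Type": "audio/wav",
            },
            data=audio,
        )
    response.raise_for_status()
    print(response.json()["results"]["channels"][0]["alternatives"][0]["transcript"])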

    Build Globally with Deepgram and Unlock Enterprise-Grade Voice AI Today

    Sign up free and unlock $200 in credits, enough to power over 750 hours of transcription or 200 hours of speech-to-text across Nova-3’s growing language suite. Explore details on our Models & Languages Overview page and experience Nova-3’s world-class adaptability for yourself.

  • Feb 12, 2026
    • Date parsed from source:
      Feb 12, 2026
    • First seen by Releasebot:
      Feb 13, 2026

    Deepgram

    February 12, 2026

    Deepgram Self-Hosted February 2026 Release bundles updated container images for API, engine, license proxy, and billing under release-260212 and equivalent tags. It requires a minimum NVIDIA driver version of 570.172.08 and includes general improvements to keep the software up to date.

    Container Images Release

    Deepgram Self-Hosted February 2026 Release (260212)

    Container Images (release 260212)

    • quay.io/deepgram/self-hosted-api:release-260212
      • Equivalent image to: quay.io/deepgram/self-hosted-api:1.177.3
    • quay.io/deepgram/self-hosted-engine:release-260212
      • Equivalent image to: quay.io/deepgram/self-hosted-engine:3.107.0-1
    • Minimum required NVIDIA driver version:
      • >= 570.172.08

    • quay.io/deepgram/self-hosted-license-proxy:release-260212
      • Equivalent image to: quay.io/deepgram/self-hosted-license-proxy:1.9.2
    • quay.io/deepgram/self-hosted-billing:release-260212
      • Equivalent image to: quay.io/deepgram/self-hosted-billing:1.12.1

    This Release Contains The Following Changes

    General Improvements — Keeps our software up-to-date.

  • Feb 11, 2026
    • Date parsed from source:
      Feb 11, 2026
    • First seen by Releasebot:
      Feb 11, 2026

    Deepgram

    February 11, 2026

    New Default Concurrency Limits

    We’re increasing default concurrency limits by up to 3x for Streaming Speech-to-Text, Text-to-Speech, and Voice Agent for Pay as You Go, Growth, and Enterprise plans.

    For full details on the rate limits for your plan, see the API Rate Limits documentation.

  • Feb 5, 2026
    • Date parsed from source:
      Feb 5, 2026
    • First seen by Releasebot:
      Feb 15, 2026

    Deepgram

    OpenClaw/MoltBot/ClawdBot: Rebrand All You Want—We’ll Still Let You Call It with the Deepgram Voice Agent API

    OpenClaw gets a voice boost with deepclaw, a zero-code, open-source integration that lets your AI assistant talk over the phone using Deepgram. It ships a guided setup, a tiny Python server, and a GitHub repo: drop in a single skill file and come through loud and clear.

    Your clawdbot needs to talk to you—on the phone. This mind-bender of an integration combines Deepgram + OpenClaw to give your bot a voice. Today we’re launching deepclaw, with a GitHub repo and a step-by-step guide below.

    Today we're releasing deepclaw, an open-source integration that lets you call your OpenClaw AI assistant over the phone using Deepgram's Voice Agent API.

    Why we built this

    As OpenClaw took the world by storm, we knew it could benefit from the most natural interface: voice. Our Voice Agent API combines Flux speech-to-text, Aura-2 text-to-speech, and intelligent turn-taking into a single, streamlined solution.

    What makes Deepgram different

    The key differentiator is Flux's native turn detection. Instead of waiting for silence (voice activity detection, or VAD), Flux understands when you're done talking. This means fewer awkward interruptions and faster responses. Ramble to your OpenClaw; it won’t interrupt!

    Dead simple setup

    We built deepclaw so your OpenClaw can set it up for you. Drop in one skill file, tell your OpenClaw "I want to call you on the phone," and it walks you through everything—Deepgram account, Twilio number, configuration, all of it.

    No code to write. No complex integration. Just conversation.

    Open source

    deepclaw is fully open source. The entire voice agent server is ~400 lines of Python. Fork it, modify it, self-host it.

    Here's the GitHub repo!

    What’s Next

    We’re exploring ways to make setup even faster and easier, so even non-technical users can get the most out of it. We also noticed higher-than-desired latency from OpenClaw itself, so we’re working to drive latency down as much as possible and make it feel like a natural conversation.

    Get started

    • Copy the skill to ~/.openclaw/skills/deepclaw-voice/
    • Tell your OpenClaw: "I want to call you on the phone"
    • Follow the prompts
    • Call your new number

    Voice AI should be fast, affordable, and natural. That's what we're building.

  • Feb 5, 2026
    • Date parsed from source:
      Feb 5, 2026
    • First seen by Releasebot:
      Feb 5, 2026

    Deepgram

    February 5, 2026

    Nova-3 Multilingual Update boosts accuracy across languages with strong gains in code-switching, delivering lower WER in both batch and streaming. This release enhances real-world multilingual speech recognition while keeping APIs and configs unchanged.

    Nova-3 Multilingual Model Update

    🌍 Nova-3 Multilingual Improvements

    We’ve released an updated Nova-3 multilingual model, delivering accuracy improvements across supported languages, with the largest gains in code-switching scenarios.

    This update focuses on improving real-world multilingual speech recognition, especially for inputs that mix languages within a single utterance or conversation.

    Key improvements include:

    • Lower Word Error Rate (WER) across both batch and streaming inference for all languages supported by the multilingual model
    • Significantly improved code-switching handling, reducing word drops when languages are mixed

    These improvements help developers build more reliable, natural multilingual voice experiences without changing APIs or configuration.

    Learn more about Nova-3 Multilingual on the Models and Language Overview page.

  • Feb 3, 2026
    • Date parsed from source:
      Feb 3, 2026
    • First seen by Releasebot:
      Feb 4, 2026

    Deepgram

    February 3, 2026

    Nova-3 now adds Hebrew, Farsi, and Urdu support, delivering enhanced speech-to-text for these languages. The new monolingual models expand language coverage and enable inclusive voice experiences for Hebrew (he), Farsi (fa), and Urdu (ur).

    Nova-3 Model Update

    🌐 Nova-3 Adds Support for Hebrew, Farsi, and Urdu

    We’re excited to announce the release of new Nova-3 monolingual models for Hebrew, Farsi, and Urdu! These additions bring industry-leading speech-to-text capabilities for users of these languages.

    Nova-3 now supports the following new languages and language codes:

    • Hebrew (he)
    • Persian (Farsi) (fa)
    • Urdu (ur)

    This release empowers developers and businesses to build more inclusive voice experiences for communities speaking Hebrew, Farsi, and Urdu across the globe.

    Learn more about Nova-3 on the Models and Language Overview page.

  • Jan 29, 2026
    • Date parsed from source:
      Jan 29, 2026
    • First seen by Releasebot:
      Jan 31, 2026

    Deepgram

    Deepgram Expands Nova-3 with Arabic Speech-to-Text

    Deepgram unveils Nova-3 Arabic speech-to-text with production-grade accuracy across 17 dialects for real spoken Arabic. Available in cloud or self-hosted modes with Keyterm Prompting and RTL support, delivering best-in-class WER across Gulf, MSA, Egyptian, and Levantine dialects.

    Deepgram introduces Arabic speech-to-text on Nova-3, built for real spoken Arabic. Production-grade accuracy across 17 regional variants, with cloud or self-hosted deployment.

    Deepgram has introduced a state-of-the-art monolingual Arabic speech-to-text model on Nova-3, built for how Arabic is actually spoken and used in production. The model supports broad Arabic dialect coverage across Arabic-speaking regions, including the Middle East, the Gulf, and North Africa. In benchmarking on conversational Arabic, Nova-3 Arabic delivers best-in-class accuracy, achieving up to ~40% lower word error rates compared to competing speech-to-text systems.

    This release further establishes Nova-3 as an enterprise-grade speech-to-text model, delivering high accuracy, Keyterm Prompting, and production-grade performance for developers building Arabic-language voice applications globally. Arabic is also the first right-to-left language supported on Nova-3, extending support to additional scripts and writing systems used in real-world production environments.

    Designed for Arabic Speech in Real-World Applications

    Nova-3 Arabic is purpose-built for production use cases where Arabic speech-to-text must perform reliably at scale, including:

    • Call centers
    • Customer support
    • Voice agents and IVR systems
    • Conversational and speech analytics

    The model is optimized for spoken Arabic as it appears in real production systems, across regions, dialects, and deployment environments, with support for:

    • Natural spoken Arabic, as used in everyday customer interactions
    • Dialectal recognition across Arabic-speaking regions
    • Practical readability over textbook-perfect spelling
    • High-volume real-time and batch workflows

    While Nova-3 supports Modern Standard Arabic, it is optimized for spoken Arabic used in conversation, returning clean and readable transcripts without diacritics (short vowel and grammatical markers) that are immediately usable in production systems.

    Arabic Speech-to-Text Coverage Across 17 Regional Variants

    Spoken Arabic varies significantly by region, with meaningful differences in pronunciation, vocabulary, and speech patterns across countries. In real production environments, audio is often dialect-heavy. Nova-3 Arabic supports 17 Arabic language variants across major regional dialect groups:

    Pan-Arab / Modern Standard Arabic (MSA)

    • Generic Arabic (ar)

    Gulf Arabic (الخليج)

    • United Arab Emirates (ar-AE)
    • Saudi Arabia (ar-SA)
    • Qatar (ar-QA)
    • Kuwait (ar-KW)

    Levantine Arabic (بلاد الشام)

    • Syria (ar-SY)
    • Lebanon (ar-LB)
    • Palestine (ar-PS)
    • Jordan (ar-JO)

    Egyptian / Nile Arabic (وادي النيل)

    • Egypt (ar-EG)
    • Sudan (ar-SD)

    Maghrebi Arabic (المغرب العربي)

    • Morocco (ar-MA)
    • Algeria (ar-DZ)
    • Tunisia (ar-TN)

    Mesopotamian Arabic (العراق)

    • Iraq (ar-IQ)

    Peripheral Arabic Dialects

    • Chad (ar-TD)
    • Iran (ar-IR)

    Keyterm Prompting Across Dialects

    Nova-3 Arabic benefits from Keyterm Prompting across all the various dialects, allowing developers to guide transcription toward domain-specific terminology, jargon, brand names, and keywords. This improves recognition without retraining models or managing custom vocabularies. Key terms are applied dynamically at inference time, making customization fast and flexible.

    This capability is especially valuable for:

    • Call centers and customer support systems
    • Voice agents and IVR applications
    • Industry-specific analytics and transcription workflows

    Nova-3 Arabic Speech-to-Text Outperforms Competitors

    To evaluate real-world performance, Nova-3 Arabic was benchmarked against other leading speech-to-text systems on conversational Arabic speech across multiple regional dialects.

    Nova-3 Arabic achieves the lowest word error rates (WER) across Gulf, MSA, Egyptian, and Levantine dialects, outperforming all major competitors.

    Key takeaways:

    • Nova-3 provides best-in-class accuracy on dialect-heavy conversational Arabic.
    • Nova-3 Arabic maintains consistently low WER across regions, including Gulf, Egyptian, Levantine, and North African Arabic, outperforming competitors most clearly in these dialect-heavy regions.
    • Nova-3 Arabic delivers best-in-class accuracy, achieving up to ~40% lower word error rates compared to competing speech-to-text systems.

    *Arabic has multiple valid written forms for the same spoken words. To ensure a fair comparison, we normalized common spelling variants (such as diacritics, alef forms, ta marbuta, and number formats) so models are compared on what they heard, not on stylistic writing differences.

    Deployment Modes: Cloud API or Self-Hosted

    Nova-3 Arabic is available in two deployment modes for increased flexibility based on use case:

    Cloud API (Deepgram-Hosted)

    • Fastest way to get started
    • Same Nova-3 speech-to-text cloud API customers already use
    • Ideal for most production workloads

    Self-Hosted (Customer-Operated)

    • Run Nova-3 Arabic STT in your own environment
    • Audio never leaves your infrastructure
    • Designed for strict data residency, privacy, security, or latency requirements

    Built for Developers and Enterprises

    All supported Arabic variants are available through the same API developers already use today. You can test all supported languages directly in the Deepgram Playground before deploying to production.

    Switching to any of the newly supported languages is simple. Update your API request with the appropriate language code:

    curl --request POST \
      --header "Authorization: Token YOUR_DEEPGRAM_API_KEY" \
      --header "Content-Type: audio/wav" \
      --data-binary @youraudio.wav \
      "https://api.deepgram.com/v1/listen?model=nova-3&language=ar"

    Supported language codes:
    ar, ar-AE, ar-SA, ar-QA, ar-KW, ar-SY, ar-LB, ar-PS, ar-JO, ar-EG, ar-SD, ar-MA, ar-DZ, ar-TN, ar-IQ, ar-TD, ar-IR
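
    As an illustrative sketch of how an application might pick among these variants at request time, here is a hypothetical helper that maps a caller's country to a dialect code and falls back to generic ar. The mapping strategy and helper name are assumptions, not Deepgram guidance.

    import requests

    # Hypothetical mapping from caller country to a Nova-3 Arabic variant;
    # generic "ar" (pan-Arab / MSA) is the fallback.
    DIALECT_BY_COUNTRY = {
        "AE": "ar-AE", "SA": "ar-SA", "QA": "ar-QA", "KW": "ar-KW",
        "SY": "ar-SY", "LB": "ar-LB", "PS": "ar-PS", "JO": "ar-JO",
        "EG": "ar-EG", "SD": "ar-SD", "MA": "ar-MA", "DZ": "ar-DZ",
        "TN": "ar-TN", "IQ": "ar-IQ", "TD": "ar-TD", "IR": "ar-IR",
    }

    def transcribe_arabic(audio_path, api_key, country=None):
        language = DIALECT_BY_COUNTRY.get(country or "", "ar")
        with open(audio_path, "rb") as audio:
            response = requests.post(
                "https://api.deepgram.com/v1/listen",
                params={"model": "nova-3", "language": language},
                headers={"Authorization": f"Token {api_key}",
                         "Content-Type": "audio/wav"},
                data=audio,
            )
        response.raise_for_status()
        return response.json()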

    Build Arabic Voice Applications with Nova-3

    Arabic is a critical language for global voice applications, spanning diverse regions, dialects, and deployment requirements. With the addition of Arabic speech-to-text, Nova-3 continues to serve as a production-grade foundation for global voice AI, built to support how people actually speak.

    Sign up free and unlock $200 in credits, enough to power over 750 hours of transcription or 200 hours of speech-to-text across Nova-3's growing language suite. Explore details on our Models & Languages Overview page and experience Nova-3's world-class adaptability for yourself.
