AssemblyAI Release Notes
45 release notes curated from 1 source by the Releasebot Team. Last updated: May 1, 2026
- Apr 29, 2026
- Date parsed from source: Apr 29, 2026
- First seen by Releasebot: May 1, 2026
Introducing the Voice Agent API
AssemblyAI launches the Voice Agent API, a complete voice agent pipeline over a single WebSocket with built-in speech understanding, LLM reasoning, and voice generation. It adds server-side turn detection, live configuration, tool calling, session resumption, and simple all-in pricing.
The Voice Agent API is now available — a complete voice agent pipeline built on AssemblyAI's own models, delivered through a single WebSocket. Stream audio in, get audio back, and pay one all-in rate of $4.50/hr that covers speech understanding, LLM reasoning, and voice generation.
The API runs on Universal-3 Pro Streaming, the same speech model that already powers production voice stacks — accurate on names, account numbers, domain terminology, and accented speech across six languages. Turn detection runs server-side with configurable thresholds, so the agent knows the difference between a thinking pause and an end-of-turn, and interruptions stop the agent immediately. Listening that actually works is the foundation; everything downstream gets better when the transcription and turn-taking are right.
The developer experience is designed to get out of the way. No SDK to install, no framework to learn — the entire API surface is JSON over WebSocket and most teams ship a working agent the same afternoon they start. Live configuration lets you update system prompts, tools, or turn detection mid-conversation with no reconnect. Tool calling with JSON Schema lets the agent take real actions through your custom functions. Session resumption restores full context if a WebSocket drops within 30 seconds.
How to use it
- Open a WebSocket connection to the Voice Agent API endpoint and stream audio in; receive audio and event messages back as JSON
- Configure agent behavior at session start or mid-conversation — system prompt, tools, turn detection thresholds — via standard JSON message types
- Register custom functions with JSON Schema for tool calling; reconnect within 30 seconds with session resumption to preserve context on dropped connections
- Single billing line at $4.50/hr covering STT, LLM, and TTS — measured in audio hours, no separate metering for each pipeline stage
- Available now to all customers; works end-to-end with Claude Code for scaffolding integrations directly from your terminal when using our AssemblyAI Docs MCP
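To make the single billing line concrete, here is a minimal cost sketch based on the $4.50 per audio hour rate quoted above. The function name estimate_cost is hypothetical, and proportional per-second billing is an assumption; the release note does not say how partial hours are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Flat all-in rate from the release note: $4.50 per audio hour.
RATE_PER_HOUR = Decimal("4.50")


def estimate_cost(audio_seconds):
    """Estimate the charge for a session, rounded to the nearest cent.

    Assumes simple proportional billing by audio duration, which the
    release note does not confirm.
    """
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    return (hours * RATE_PER_HOUR).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_cost(12 * 60))  # a 12-minute call
    print(estimate_cost(3600))     # one full audio hour
```

Decimal is used instead of float so that cent rounding stays exact when many sessions are summed.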
The Voice Agent API is invisible infrastructure for production voice products — accurate listening, natural turn-taking, and a developer surface small enough to read in 10 minutes. Your customers should feel like you built it for them, not like they're using a platform.
Original source
- Apr 25, 2026
- Date parsed from source: Apr 25, 2026
- First seen by Releasebot: Apr 27, 2026
PII Redaction: Return Unredacted Transcripts in the Same Request
AssemblyAI adds a new PII Redaction option that returns both redacted and unredacted transcript data in one request. The opt-in update includes original text, words, and utterances alongside the redacted output, reducing extra API calls for compliance workflows.
You can now retrieve both the redacted and unredacted versions of a transcript in a single PII Redaction request. Set the new redact_pii_return_unredacted flag to true in your POST /v2/transcript body, and the response will include the original text, words, and utterances alongside the redacted output, with no second API call required.
The new fields are purely additive. text, words, and utterances stay fully redacted as before, and three new top-level fields (unredacted_text, unredacted_words, and unredacted_utterances) are returned alongside them with the original PII intact. The unredacted word and utterance arrays mirror the exact shape of their redacted counterparts (text, start, end, confidence, speaker, channel).
This is an opt-in convenience for workflows that need both versions in the same place: for example, a UI that toggles between redacted-first and unredacted views, or a dual pipeline that stores compliance-grade redacted output for sharing while preserving the original in a trusted environment. It removes the need for previously brittle workarounds like sending two API requests, doing client-side redaction via Entity Detection, or post-hoc LLM-based redaction.
How to use it
Add redact_pii_return_unredacted: true alongside the existing PII parameters in your transcription request:
{
  "audio_url": "YOUR_AUDIO_URL",
  "redact_pii": true,
  "redact_pii_return_unredacted": true,
  "redact_pii_policies": ["person_name", "phone_number", "email_address"],
  "redact_pii_sub": "entity_name"
}
- Requires redact_pii: true; sending redact_pii_return_unredacted: true on its own returns HTTP 400
- Defaults to false; when off or absent, responses are unchanged and the three unredacted_* fields are not returned
- Works with all existing PII params, including redact_pii_policies, redact_pii_sub, and redact_pii_audio
- Available now on Pre-recorded transcription, with SDK support live across Python and JavaScript
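The request and response shapes described above can be sketched in Python as follows. This is an illustration only: the helper names build_pii_request and split_transcript are hypothetical, the sample response is hand-written rather than real API output, and no network call is made.

```python
import json


def build_pii_request(audio_url, policies, return_unredacted=False, sub="entity_name"):
    """Build a JSON-serializable body for POST /v2/transcript with PII redaction on."""
    body = {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_sub": sub,
    }
    if return_unredacted:
        # Per the release note, this flag is only valid together with
        # redact_pii: true; sent on its own it returns HTTP 400.
        body["redact_pii_return_unredacted"] = True
    return body


def split_transcript(response):
    """Separate the redacted fields from the opt-in unredacted_* fields."""
    keys = ("text", "words", "utterances")
    redacted = {k: response.get(k) for k in keys}
    original = {
        k: response["unredacted_" + k] for k in keys if "unredacted_" + k in response
    }
    return redacted, original


if __name__ == "__main__":
    body = build_pii_request("YOUR_AUDIO_URL", ["person_name"], return_unredacted=True)
    print(json.dumps(body, indent=2))

    # Hand-written stand-in for a response, for illustration only.
    sample = {"text": "Hi, this is [PERSON_NAME].", "unredacted_text": "Hi, this is Dana."}
    redacted, original = split_transcript(sample)
    print(redacted["text"])
    print(original["text"])
```

Keeping the two views in separate dictionaries makes it harder to pass the unredacted text to a component that should only ever see the redacted output.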
AssemblyAI's PII Redaction automatically detects and removes sensitive information from both transcripts and audio — giving you compliant, production-ready output without extra processing steps.
Original source
- Apr 19, 2026
- Date parsed from source: Apr 19, 2026
- First seen by Releasebot: Apr 20, 2026
Claude Opus 4.7 Now Available on LLM Gateway
AssemblyAI adds Claude Opus 4.7 support in LLM Gateway for stronger reasoning, coding, and multi-step tasks.
Claude Opus 4.7 is now available through LLM Gateway. Opus 4.7 is Anthropic's most intelligent model yet — the latest in the Claude family, pushing the frontier on reasoning, coding, and complex multi-step tasks.
To use it, update the model parameter in your LLM Gateway request:
"model": "claude-opus-4-7"
Available now for all LLM Gateway users.
Original source
- Apr 2, 2026
- Date parsed from source: Apr 2, 2026
- First seen by Releasebot: Apr 2, 2026
Universal-2 Language Improvements: Hebrew & Swedish
AssemblyAI improves Universal-2 transcription accuracy for Hebrew and Swedish with automatic live updates for all users.
Universal-2 transcription accuracy has improved significantly for Hebrew and Swedish, with word error rates reduced by 37% and 47% respectively. No changes to your integration required — the improvements are live automatically for all users.
AssemblyAI's Universal speech model delivers industry-leading accuracy across dozens of languages, with continuous improvements rolling out automatically.
Original source
- Mar 25, 2026
- Date parsed from source: Mar 25, 2026
- First seen by Releasebot: Apr 2, 2026
LLM Gateway: Automatic Model Fallbacks
AssemblyAI adds automatic model fallbacks to LLM Gateway, bringing built-in resilience against model failures without changing integrations. The public beta lets users retry with fallback models or the same model after 500ms, with configurable fallback depth and retry behavior.
LLM Gateway now supports automatic model fallbacks, giving your application resilience against model failures without changing your integration. If a model returns a server error, the Gateway will automatically retry with a fallback — or retry the same model after 500ms by default.
This is available now in Public Beta for all LLM Gateway users.
How to use it
Add a fallbacks array and optional fallback_config to your request. All fields from the original request are copied over to the fallback automatically; you only need to specify what you want to override.
Simple fallback: fall back to a different model, inheriting all original parameters:
{
  "model": "kimi-k2.5",
  "messages": [
    { "role": "user", "content": "Summarize this meeting: ..." }
  ],
  "temperature": 0.2,
  "fallbacks": [
    { "model": "claude-sonnet-4-6" }
  ]
}
Advanced fallback: override specific parameters when falling back (e.g., different prompt or temperature for a different model's behavior):
{
  "model": "kimi-k2.5",
  "messages": [
    { "role": "user", "content": "Summarize this meeting: ..." }
  ],
  "temperature": 0.2,
  "fallbacks": [
    {
      "model": "claude-sonnet-4-6",
      "messages": [
        { "role": "user", "content": "Summarize this meeting concisely, key info only: ..." }
      ],
      "temperature": 0.3
    }
  ]
}
Fallback config options:
"fallback_config": { "depth": 1, "retry": true }
By default, if no fallbacks are set, the API will automatically retry a failed request after 500ms. For more control, set fallback_config.retry to false and implement your own exponential backoff.
AssemblyAI's LLM Gateway gives you a single API to access leading models from every major provider, with built-in resilience, load balancing, and cost tracking.
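The inheritance rule described above (fallback entries override only the fields they name) can be pictured with a small local simulation. This is a sketch of the described semantics, not the Gateway's implementation; effective_requests and backoff_delays are hypothetical helper names, and the backoff schedule is one possible choice for clients that set fallback_config.retry to false.

```python
import copy


def effective_requests(request):
    """Return the primary request followed by each fallback request,
    with fallback entries overriding fields inherited from the original."""
    base = {
        k: v for k, v in request.items() if k not in ("fallbacks", "fallback_config")
    }
    resolved = [copy.deepcopy(base)]
    for override in request.get("fallbacks", []):
        merged = copy.deepcopy(base)
        merged.update(copy.deepcopy(override))
        resolved.append(merged)
    return resolved


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


if __name__ == "__main__":
    request = {
        "model": "kimi-k2.5",
        "messages": [{"role": "user", "content": "Summarize this meeting: ..."}],
        "temperature": 0.2,
        "fallbacks": [{"model": "claude-sonnet-4-6", "temperature": 0.3}],
    }
    for candidate in effective_requests(request):
        print(candidate["model"], candidate["temperature"])
    print(backoff_delays(4))
```

Starting the backoff at 0.5 seconds mirrors the 500ms default retry mentioned above; the cap and the doubling factor are arbitrary illustration values.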
Original source
- Mar 25, 2026
- Date parsed from source: Mar 25, 2026
- First seen by Releasebot: Mar 26, 2026
Introducing Medical Mode: Purpose-built accuracy for medical terminology
AssemblyAI adds Medical Mode for Streaming Speech-to-Text, boosting accuracy for medical terminology like medications, procedures, conditions, and dosages. The new add-on is available now across Universal Streaming models and works with no pipeline changes.
Medical Mode is a new add-on for AssemblyAI's Streaming Speech-to-Text that improves transcription accuracy for medical terminology, including medication names, procedures, conditions, and dosages. Available now on Universal-3 Pro Streaming, Universal Streaming English, and Universal Streaming Multilingual.
What it does
Medical Mode applies a correction pass optimized for medical entity recognition, targeting terms that general-purpose ASR frequently gets wrong. It works alongside the base model's noise handling, accent robustness, and latency characteristics — no tradeoffs.
Why it exists
General-purpose ASR can achieve strong overall accuracy on clinical audio while still consistently misrecognizing medical terminology. Because most healthcare AI pipelines feed transcripts directly into LLMs for structured output generation — SOAP notes, discharge summaries, referral letters — transcription errors on medical entities propagate rather than attenuate. Medical Mode intercepts those errors before they enter the pipeline.
How to enable it
Set the domain connection parameter to "medical-v1". No other changes to your existing pipeline are required.
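As a rough sketch of what setting the connection parameter might look like, the snippet below builds a streaming WebSocket URL with domain set to medical-v1. Several things here are assumptions: that connection parameters are passed as query-string values, that the default streaming host uses the /v3/ws path, and the helper name medical_streaming_url. Check the Medical Mode docs for the authoritative form.

```python
from urllib.parse import urlencode

# Assumed default streaming endpoint; the path is not stated in this note.
STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def medical_streaming_url(**params):
    """Build a connection URL with Medical Mode enabled.

    Extra keyword arguments are passed through as additional connection
    parameters (assumed to be query-string values).
    """
    query = dict(params)
    query["domain"] = "medical-v1"
    return STREAMING_URL + "?" + urlencode(query)


if __name__ == "__main__":
    print(medical_streaming_url())
    print(medical_streaming_url(speech_model="universal-3-pro"))
```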
Availability & pricing
- Available now on Universal-3 Pro Streaming, Universal Streaming English, and Universal Streaming Multilingual
- Supports English, Spanish, German, and French
- Billed as a separate add-on; see the pricing page for details
- HIPAA BAA, SOC 2 Type 2, ISO 27001:2022, PCI DSS v4.0 included
Resources
- Learn more
- Medical Mode docs
- Playground: test with your own audio
- Mar 17, 2026
- Date parsed from source: Mar 17, 2026
- First seen by Releasebot: Mar 25, 2026
AssemblyAI Skill for AI Coding Agents
AssemblyAI adds a new Skill for AI coding agents, giving Claude Code, Cursor, Codex, and other tools accurate, up-to-date guidance on its APIs, SDKs, and integrations. It helps agents build with current transcription, streaming, LLM Gateway, and voice agent workflows correctly.
The AssemblyAI Skill is now available for AI coding agents — giving Claude Code, Cursor, Codex, and other vibe-coding tools accurate, up-to-date knowledge of AssemblyAI's APIs, SDKs, and integrations out of the box.
LLM training data goes stale fast. Without the skill, coding agents default to deprecated AssemblyAI patterns: the old LeMUR API instead of the LLM Gateway, wrong auth headers, discontinued SDK usage, and no awareness of newer features like Universal-3 Pro Streaming or the voice agent framework integrations. The AssemblyAI Skill corrects all of that — and covers the full current API surface, from pre-recorded transcription to real-time streaming to LLM Gateway workflows.
In evals, agents using the skill scored 17/17 on correctness across transcription, voice agent, and LLM Gateway scenarios. Without it: 7/17. The biggest gains are in voice agent integrations and LLM Gateway usage, where agents otherwise have no training data for framework-specific patterns.
How to use it
- Install via Claude Code: cp -r assemblyai ~/.claude/skills/ for personal use, or cp -r assemblyai .claude/skills/ at the project level
- For Codex, copy the folder and reference assemblyai/SKILL.md in your AGENTS.md
- Cursor and Windsurf: add the assemblyai/ directory as project-level documentation
- Available now — free, open source, no API key required
AssemblyAI is the leading speech AI platform for developers — built for production with best-in-class accuracy, real-time streaming, and a full suite of audio intelligence features. The AssemblyAI Skill makes sure your coding agent builds with all of it correctly, every time.
Original source
- Mar 11, 2026
- Date parsed from source: Mar 11, 2026
- First seen by Releasebot: Mar 12, 2026
LLM Gateway Now Available in the EU
AssemblyAI announces EU availability of LLM Gateway and Speech Understanding with full data residency. EU customers can run LLM inference while prompts and responses stay in the region, with Claude and Gemini models supported. A unified API for LLM and audio ships now with enterprise reliability and transparent pricing. EU endpoint available now.
LLM Gateway and Speech Understanding in the EU
LLM Gateway and Speech Understanding are now available in the EU. Customers can now run LLM inference with full EU data residency, opening the door for teams with strict data governance requirements—including those migrating from LeMUR.
EU regional availability means your prompts and responses never leave the European Union. This is especially valuable for healthcare, finance, and enterprise customers with compliance requirements. Currently, Claude and Gemini regional models are supported in the EU.
How to use it:
- Update your request URL to the EU endpoint: https://llm-gateway.eu.assemblyai.com/v1/chat/completions
- Available now for all customers; no beta access required
- See the Cloud Endpoints & Data Residency docs for full details
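Switching regions amounts to changing the request URL, which the following sketch illustrates by preparing (but not sending) a request. The EU URL is taken from the note above; the default URL, the helper name build_gateway_request, and the choice of model are assumptions, and authentication is deliberately left out, so add the header described in the AssemblyAI docs before sending.

```python
import json
import urllib.request

EU_ENDPOINT = "https://llm-gateway.eu.assemblyai.com/v1/chat/completions"
# Assumption: the default endpoint is the same URL without the "eu" label.
DEFAULT_ENDPOINT = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def build_gateway_request(body, eu=False):
    """Prepare (without sending) a POST request for the chosen region."""
    url = EU_ENDPOINT if eu else DEFAULT_ENDPOINT
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "model": "claude-sonnet-4-6",  # placeholder; confirm EU model availability
        "messages": [{"role": "user", "content": "Summarize this meeting: ..."}],
    }
    req = build_gateway_request(body, eu=True)
    print(req.get_method(), req.full_url)
```

Keeping the region as a single boolean or configuration value means the rest of the integration stays identical between the two deployments.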
LLM Gateway gives you a single, unified API to run LLM inference and audio intelligence together — with enterprise-grade reliability, transparent pricing, and now the data residency controls your team requires.
Original source
- Mar 3, 2026
- Date parsed from source: Mar 3, 2026
- First seen by Releasebot: Mar 4, 2026
Universal-3-Pro Now Available for Streaming
Universal-3-Pro is now available for real-time streaming, delivering best-in-class accuracy with real-time speaker labeling and native code switching for multilingual transcription. Build real-time voice apps with LLM-style accuracy and low-latency streaming via AssemblyAI.
Universal-3-Pro streaming
Universal-3-Pro is now available for real-time streaming — bringing our most accurate speech model to live transcription for the first time. Developers building voice agents, live captioning tools, and real-time analytics pipelines can now combine Universal-3-Pro's state-of-the-art accuracy with the low latency of AssemblyAI's streaming API.
Universal-3-Pro streaming delivers three key capabilities that set it apart: best-in-class word error rates across streaming ASR benchmarks, real-time speaker labels to identify who is speaking at each turn, and superior entity detection for names, places, organizations, and specialized terminology — all in real time, not just in batch. And with built-in code switching, Universal-3-Pro handles multilingual audio natively, accurately transcribing speakers who move between languages mid-conversation.
Whether you're building voice agents that need to route conversations by speaker, transcription tools that must catch rare entities accurately, or global applications serving multilingual users, Universal-3-Pro for streaming gives you LLM-style accuracy at real-time speeds.
How to use it:
- Set "speech_model": "universal-3-pro" in your WebSocket connection parameters
- Code switching is enabled automatically; no additional configuration needed
- Available now via the streaming endpoint for all users
- Read the full documentation
AssemblyAI's Universal-Streaming API is the fastest way to build real-time voice applications — and with Universal-3-Pro, it's now the most accurate too.
Original source
- Feb 26, 2026
- Date parsed from source: Feb 26, 2026
- First seen by Releasebot: Mar 3, 2026
Share Your Playground Transcripts
AssemblyAI Playground gains a share button that generates a live 90‑day link to your transcript output for easy collaboration. Share results in Slack or with teammates and clients without copy‑paste or exports. It’s the fastest no‑code way to test transcription and audio intelligence models with instant sharing.
The AssemblyAI Playground now has a share button. One click generates a shareable link to your transcript output that stays live for 90 days.
Whether you're dropping results into a Slack thread, looping in a teammate for a quick review, or showing a client what the output actually looks like before they integrate — you no longer need to copy-paste text or export anything. Just hit share and send the link.
The AssemblyAI Playground is the fastest way to test our transcription and audio intelligence models without writing a single line of code. Try different models, toggle features, and now share what you see instantly.
Original source
- Feb 19, 2026
- Date parsed from source: Feb 19, 2026
- First seen by Releasebot: Feb 20, 2026
Claude Sonnet 4.6 now supported on LLM Gateway
Claude Sonnet 4.6 release
Claude Sonnet 4.6 is now available through LLM Gateway. Sonnet 4.6 is Anthropic's most capable Sonnet model yet, with frontier performance across coding, agents, and professional work at scale. With this model, every line of code, every agent task, and every spreadsheet can be powered by near-Opus intelligence at Sonnet pricing.
To use it, update the model parameter to claude-sonnet-4-6 in your LLM Gateway requests.
For more information, check out our docs here.
Original source
- Feb 9, 2026
- Date parsed from source: Feb 9, 2026
- First seen by Releasebot: Feb 20, 2026
Claude Opus 4.5 and 4.6 now supported on LLM Gateway
Claude's Opus Models via LLM Gateway
Claude's most capable models are now available through LLM Gateway. Opus 4.5 and Opus 4.6 bring significant improvements in reasoning, coding, and instruction-following.
To use it, update the model parameter to claude-opus-4-5-20250929 or claude-opus-4-6 in your LLM Gateway requests.
For more information, check out our docs here.
Original source
- Feb 3, 2026
- Date parsed from source: Feb 3, 2026
- First seen by Releasebot: Feb 9, 2026
Universal-3-Pro: Our Promptable Speech-to-Text Model
Universal-3-Pro launches as the most capable Voice AI model yet, delivering LLM-style control over transcription. Users can steer outputs with natural-language prompts and keyterms across six languages, with verbatim options and non-speech tagging. Available now via the /v2/transcript API.
Universal-3-Pro release
We've released Universal-3-Pro, our most powerful Voice AI model yet—designed to give you LLM-style control over transcription output for the first time.
Unlike traditional ASR models that limit you to basic keyterm prompting or fixed output styles, Universal-3-Pro lets you progressively layer instructions to steer transcription behavior. Need verbatim output with filler words? Medical terminology with accurate dosages? Speaker labels by role? Code-switching between English and Spanish? You can design one robust prompt and apply it consistently across thousands of calls, getting workflow-ready outputs instead of brittle workarounds.
Out of the box, Universal-3-Pro outperforms all ASR models on accuracy, especially for entities and rare words. But the real power is in the prompting: natural language prompts up to 1,500 words for context and style, keyterms prompting for up to 1,000 specialized terms, built-in code switching across 6 languages, verbatim transcription controls for disfluencies and stutters, and audio tags for non-speech events like laughter, music, and beeps.
How to use it:
- Set "speech_models": ["universal-3-pro", "universal"] with "language_detection": true for automatic routing and 99-language coverage
- Use prompt for natural language instructions and keyterms_prompt for boosting rare words (up to 1,000 terms, 6 words each)
- Available now via the /v2/transcript endpoint
- Read the full documentation
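The parameters listed above can be assembled into a request body with a few client-side checks for the stated limits (prompts up to 1,500 words, up to 1,000 keyterms of at most 6 words each). This is a sketch under assumptions: build_transcript_request is a hypothetical helper, the API may count words differently than a whitespace split, and the release note does not say whether prompt and keyterms_prompt can be combined in one request.

```python
MAX_PROMPT_WORDS = 1500
MAX_KEYTERMS = 1000
MAX_WORDS_PER_KEYTERM = 6


def build_transcript_request(audio_url, prompt=None, keyterms=None):
    """Build a body for the /v2/transcript endpoint using Universal-3-Pro
    with fallback routing to "universal" and automatic language detection."""
    body = {
        "audio_url": audio_url,
        "speech_models": ["universal-3-pro", "universal"],
        "language_detection": True,
    }
    if prompt is not None:
        if len(prompt.split()) > MAX_PROMPT_WORDS:
            raise ValueError("prompt exceeds 1,500 words")
        body["prompt"] = prompt
    if keyterms:
        if len(keyterms) > MAX_KEYTERMS:
            raise ValueError("more than 1,000 keyterms")
        too_long = [t for t in keyterms if len(t.split()) > MAX_WORDS_PER_KEYTERM]
        if too_long:
            raise ValueError("keyterms longer than 6 words: %r" % too_long)
        body["keyterms_prompt"] = list(keyterms)
    return body


if __name__ == "__main__":
    print(build_transcript_request("YOUR_AUDIO_URL", keyterms=["atorvastatin", "SOAP note"]))
```

Validating locally gives a clear error before upload instead of relying on however the API reports an over-limit request.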
Universal-3-Pro represents a fundamental shift in what's possible with speech-to-text: true controllability that rivals human transcription quality, with the consistency and scale of an API.
Original source
- Jan 28, 2026
- Date parsed from source: Jan 28, 2026
- First seen by Releasebot: Jan 29, 2026
Improved Speaker Diarization for Short Audio
Speaker diarization is now more accurate for audio files under 2 minutes, with a 19% improvement in speaker count prediction and 6% improvement in cpWER. No changes required—this improvement is live for all users automatically.
Original source
- Jan 20, 2026
- Date parsed from source: Jan 20, 2026
- First seen by Releasebot: Jan 29, 2026
Global Edge Routing & Data Zone Endpoints for Streaming Speech-to-Text
New streaming endpoints give you control over latency and data residency. Edge Routing cuts latency by routing to the nearest region, while Data Zone Routing keeps audio data inside the US or EU. Update your WebSocket URL to switch, with the default endpoint unchanged.
Streaming endpoints overview
We've launched new streaming endpoints that give you control over latency optimization and data residency. Choose the endpoint that best fits your application's requirements—whether that's achieving the lowest possible latency or ensuring your audio data stays within a specific geographic region.
Edge Routing (streaming.edge.assemblyai.com) automatically routes requests to the nearest available region, minimizing latency for real-time transcription. With infrastructure in Oregon, Virginia, and Ireland, this endpoint delivers our best-in-class streaming performance regardless of where your users are located.
Data Zone Routing (streaming.us.assemblyai.com and streaming.eu.assemblyai.com) guarantees your data never leaves the specified region. This is designed for organizations with strict data residency and governance requirements: your audio and transcription data will remain entirely within the US or EU, respectively.
How to use it:
- wss://streaming.edge.assemblyai.com/v3/ws — Lowest latency
- wss://streaming.us.assemblyai.com/v3/ws — US data residency
- wss://streaming.eu.assemblyai.com/v3/ws — EU data residency
The default endpoint (streaming.assemblyai.com) remains unchanged.
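One way to encode the endpoint choice in application configuration is sketched below. The three regional URLs come from the list above; the /v3/ws path on the default host and the helper name pick_endpoint are assumptions. In this sketch a data residency requirement always takes priority over the latency preference.

```python
STREAMING_ENDPOINTS = {
    # Path on the default host is assumed to match the new endpoints.
    "default": "wss://streaming.assemblyai.com/v3/ws",
    "edge": "wss://streaming.edge.assemblyai.com/v3/ws",
    "us": "wss://streaming.us.assemblyai.com/v3/ws",
    "eu": "wss://streaming.eu.assemblyai.com/v3/ws",
}


def pick_endpoint(data_zone=None, lowest_latency=False):
    """Choose a streaming URL: residency first, then latency, else default."""
    if data_zone is not None:
        if data_zone not in ("us", "eu"):
            raise ValueError("data_zone must be 'us' or 'eu'")
        return STREAMING_ENDPOINTS[data_zone]
    if lowest_latency:
        return STREAMING_ENDPOINTS["edge"]
    return STREAMING_ENDPOINTS["default"]


if __name__ == "__main__":
    print(pick_endpoint())
    print(pick_endpoint(lowest_latency=True))
    print(pick_endpoint(data_zone="eu", lowest_latency=True))
```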
Original source
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.