Hume Release Notes
36 release notes curated from 31 sources by the Releasebot Team. Last updated: May 16, 2026
- May 2026
- No date parsed from source.
- First seen by Releasebot:May 16, 2026
TTS API bug fixes
Hume fixes a bug that caused duplicate interleaved TTS audio and distorted output.
Fixed a bug where duplicate interleaved audio was included in TTS audio output.
This resolves an issue where audio chunks could be duplicated and interleaved, resulting in distorted output.
Original source
- May 15, 2026
- Date parsed from source:May 15, 2026
- First seen by Releasebot:May 16, 2026
May 15, 2026
Hume adds an experimental temperature parameter to its TTS API for more varied or consistent speech generation.
TTS API additions
Added an experimental temperature parameter to TTS endpoints. Controls sampling temperature for speech generation. Higher values increase variation; lower values increase consistency.
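The note does not say how the temperature is applied inside the model. As a general illustration of what a sampling temperature does, the sketch below divides a set of made-up scores by the temperature before a softmax. The function name and the example scores are hypothetical and are not part of Hume's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate outputs (illustrative only).
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.5)   # sharper: top choice dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: more variation when sampling
```

With the lower temperature the top candidate receives most of the probability mass, which matches the "more consistent" behavior described above; the higher temperature spreads probability across candidates.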
Original source
- Apr 10, 2026
- Date parsed from source:Apr 10, 2026
- First seen by Releasebot:Apr 11, 2026
April 10, 2026
Hume adds configurable turn detection and interruption settings to EVI configs, giving users finer control over turn-taking, speech detection, and interruption behavior on a per-config basis.
EVI API additions
Added configurable turn detection and interruption settings to EVI configs. You can now control how EVI handles turn-taking and interruptions on a per-config basis.
- turn_detection.end_of_turn_silence_ms: How long EVI waits after speech ends before committing a turn (500-3000ms, default 800ms).
- turn_detection.speech_detection_threshold: Sensitivity of voice activity detection (0.0-1.0, default 0.5).
- turn_detection.prefix_padding_ms: Audio padding before detected speech (default 300ms).
- interruption.min_interruption_ms: Minimum speech duration before EVI can be interrupted (50-2000ms, default 800ms).
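A minimal sketch of checking these settings client-side before sending them, using the ranges and defaults listed above. The nested dictionary shape is inferred from the dotted field names and is an assumption, as are the class name and the non-negative check on prefix_padding_ms (the note gives no range for it).

```python
from dataclasses import dataclass

@dataclass
class TurnSettings:
    # Defaults taken from the release note above.
    end_of_turn_silence_ms: int = 800
    speech_detection_threshold: float = 0.5
    prefix_padding_ms: int = 300
    min_interruption_ms: int = 800

    def validate(self):
        if not 500 <= self.end_of_turn_silence_ms <= 3000:
            raise ValueError("end_of_turn_silence_ms must be 500-3000")
        if not 0.0 <= self.speech_detection_threshold <= 1.0:
            raise ValueError("speech_detection_threshold must be 0.0-1.0")
        if self.prefix_padding_ms < 0:
            # Assumption: the note lists no range, so only reject negatives.
            raise ValueError("prefix_padding_ms must be non-negative")
        if not 50 <= self.min_interruption_ms <= 2000:
            raise ValueError("min_interruption_ms must be 50-2000")

    def to_config_fragment(self):
        """Return the settings as a nested dict (assumed shape)."""
        self.validate()
        return {
            "turn_detection": {
                "end_of_turn_silence_ms": self.end_of_turn_silence_ms,
                "speech_detection_threshold": self.speech_detection_threshold,
                "prefix_padding_ms": self.prefix_padding_ms,
            },
            "interruption": {
                "min_interruption_ms": self.min_interruption_ms,
            },
        }

fragment = TurnSettings(end_of_turn_silence_ms=1200).to_config_fragment()
```

Validating locally only catches out-of-range values early; the API remains the authority on what it accepts.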
- Mar 10, 2026
- Date parsed from source:Mar 10, 2026
- First seen by Releasebot:Mar 10, 2026
Opensourcing TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization
Hume releases TADA (Text-Acoustic Dual Alignment), an approach to fast, low-hallucination LLM-based TTS. It is open-sourced with 1B and 3B models and the full audio tokenizer, enabling on-device deployment and real-time voice with one-to-one text-audio mapping. Available on HuggingFace and GitHub for research and development.
Approach
The future of voice AI hinges on sounding natural, fast, expressive, and free of quirks like hallucinated words or skipped content. Today's LLM-based TTS systems are forced to choose between speed, quality, and reliability because of a fundamental mismatch between how text and audio are represented inside language models.
TADA (Text-Acoustic Dual Alignment) resolves that mismatch with a novel tokenization schema that synchronizes text and speech one-to-one. The result: the fastest LLM-based TTS system available, with competitive voice quality, virtually zero content hallucinations, and a footprint light enough for on-device deployment.
Hume AI is open-sourcing TADA to accelerate progress toward efficient, reliable voice generation. Code and pre-trained models are available now.
For input audio, an encoder paired with an aligner extracts acoustic features from the audio segment corresponding to each text token. For output audio, the LLM's final hidden state serves as a conditioning vector for a flow-matching head, which generates acoustic features that are then decoded into audio and fed back into the model.
Since each LLM step corresponds to exactly one text token and one acoustic representation, TADA generates speech faster and with less computational effort. And because the architecture enforces a strict one-to-one mapping between text and audio, the model cannot skip or hallucinate content by construction.
Evaluation
Hallucination Rate
Samples with CER > 0.15, signaling skipped words, inserted content, or unintelligible speech, out of 1,088 samples:
- FireRedTTS-2: 41
- Higgs Audio v2: 24
- VibeVoice 1.5B: 17
- TADA-3B: 0
- TADA-1B: 0
Speed
TADA generates speech at a real-time factor (RTF) of 0.09, more than 5x faster than similar-grade LLM-based TTS systems. This is possible because TADA operates at just 2–3 frames (tokens) per second of audio, compared to 12.5–75 tokens per second in other approaches.
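To make the RTF figure concrete, the sketch below uses the common definition of real-time factor as generation time divided by audio duration. The post does not spell out its own definition, so that reading is an assumption, and the function names are hypothetical.

```python
def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds

def generation_time(audio_seconds, rtf):
    """Time needed to generate a clip of the given length at a given RTF."""
    return audio_seconds * rtf

# At the reported RTF of 0.09, one minute of audio takes about 5.4 seconds.
seconds_needed = generation_time(60.0, 0.09)
```

An RTF below 1.0 means audio is produced faster than it plays back.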
Hallucination
Our model was trained on large-scale, in-the-wild data, without post-training, and achieves the same reliability as models trained on smaller curated datasets. We measured hallucination rate by flagging any sample with a character error rate (CER) above 0.15, a threshold that captures unintelligible speech, skipped text, and inserted content. In the 1000+ test samples from LibriTTSR, TADA produced zero hallucinations.
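The flagging rule can be sketched as follows, assuming CER is the character-level edit distance between the reference text and a transcript of the generated audio, divided by the reference length. The transcription step and any text normalization used in the actual evaluation are not described in the post and are left out; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def character_error_rate(reference, hypothesis):
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)

def count_hallucinations(pairs, threshold=0.15):
    """Count (reference, transcript) pairs whose CER exceeds the threshold."""
    return sum(1 for ref, hyp in pairs
               if character_error_rate(ref, hyp) > threshold)

# Made-up pairs: the second transcript drops a word and gets flagged.
pairs = [("hello world", "hello world"), ("hello world", "hello")]
flagged = count_hallucinations(pairs)
```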
Voice Quality
Results on SEED-TTS-EVAL and LIBRITTSR-EVAL show that TADA achieves reliability comparable to Index-TTS — one of the few systems with similarly low hallucination rates — while being trained on a larger, less-curated dataset.
In human evaluation on expressive, long-form speech (EARS dataset), TADA scored 4.18/5.0 on speaker similarity and 3.78/5.0 on naturalness, placing second overall — ahead of several systems trained on significantly more data.
Potential Applications
On-device deployment. TADA is lightweight enough to run on mobile phones and edge devices without requiring cloud inference. For device manufacturers and app developers building voice interfaces, this means lower latency, better privacy, and no API dependency.
Long-form and conversational speech. TADA's synchronous tokenization is dramatically more context-efficient than existing approaches. Where a conventional system exhausts a 2048-token context window in about 70 seconds of audio, TADA can accommodate roughly 700 seconds in the same budget. This opens the door to long-form narration, extended dialogue, and multi-turn voice interactions.
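The context figures above imply a rough per-second token cost. The sketch below does that arithmetic, assuming the whole 2048-token window is spent on the audio being generated; the function name is hypothetical.

```python
CONTEXT_TOKENS = 2048

def implied_token_rate(context_tokens, audio_seconds):
    """Tokens consumed per second of audio if the window is filled exactly."""
    return context_tokens / audio_seconds

conventional_rate = implied_token_rate(CONTEXT_TOKENS, 70)   # about 29.3 tokens/s
tada_rate = implied_token_rate(CONTEXT_TOKENS, 700)          # about 2.9 tokens/s
ratio = conventional_rate / tada_rate                        # about 10x
```

Both results fall inside the ranges quoted in the Speed section: 2–3 tokens per second for TADA and 12.5–75 for other approaches.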
Production reliability. Zero hallucinations in our tests suggests fewer edge cases to catch, fewer customer complaints, and less post-processing overhead in the product. This makes TADA well-suited for deploying voice in regulated or sensitive environments like healthcare, finance, and education.
Limitations and Future Work
Long-form degradation. While the model supports more than 10 minutes of context, we noticed occasional cases of speaker drift during long generations. Our online rejection sampling strategy reduces this significantly, but it's not fully resolved. We suggest resetting the context as an intermediate workaround.
The modality gap. When the model generates text alongside speech, language quality drops relative to text-only mode. We introduce Speech Free Guidance (SFG), a technique that blends logits from text-only and text-speech inference modes to help close this gap, but more work is required.
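The post does not give the SFG formula. Purely as an illustration of what blending logits from two inference modes can look like, the sketch below uses a plain linear interpolation; the weighting scheme, the function name, and the example values are all assumptions and should not be read as the actual method.

```python
def blend_logits(text_only, text_speech, weight):
    """Linearly interpolate two logit vectors.

    weight=1.0 returns the text-only logits, weight=0.0 the text-speech logits.
    This is a generic blend, not the published SFG formulation.
    """
    if len(text_only) != len(text_speech):
        raise ValueError("logit vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return [weight * t + (1.0 - weight) * s
            for t, s in zip(text_only, text_speech)]

# Made-up logits over a two-token vocabulary.
blended = blend_logits([1.0, 3.0], [3.0, 1.0], 0.5)
```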
Use-cases. The model is only pre-trained on speech continuation; further fine-tuning is required for assistant scenarios. Get in touch to inquire about Hume's extensive library of fine-tuning data.
Scale. The current release covers English and seven additional languages, so there's clear room to expand. We're training larger models with broader language coverage using Hume AI data.
We're releasing TADA because we believe this architecture opens a productive direction for the field, and we want to accelerate progress. We invite researchers and developers to build on this work — whether that means extending the tokenizer to new modalities, solving the long-context problem, or adapting the framework for new applications.
Get Started
TADA is available now under an open-source license. We're releasing 1B and 3B parameter Llama-based models and the full audio tokenizer and decoder.
1B (English):
huggingface.co/HumeAI/tada-1b
3B (multilingual):
huggingface.co/HumeAI/tada-3b-ml
Demo:
huggingface.co/spaces/HumeAI/tada
GitHub:
github.com/HumeAI/tada
TADA was developed by Trung Dang, Sharath Rao, Ananya Gupta, Christopher Gagne, Panagiotis Tzirakis, Alice Baird, Jakub Piotr Cłapa, Peter Chin, and Alan Cowen at Hume AI.
Hume builds voice AI research infrastructure for frontier labs and AI-first enterprises. If you're working on voice models and need high-quality training data, evaluation systems, or reinforcement learning infrastructure, get in touch at
Original source
- Feb 27, 2026
- Date parsed from source:Feb 27, 2026
- First seen by Releasebot:Feb 28, 2026
- Modified by Releasebot:May 16, 2026
February 27, 2026
Hume adds EVI API support for new LLM models and zero prompt expansion control.
EVI API additions
Added support for new supplemental LLM models: claude-opus-4-6, gpt-5.1, gpt-5.1-priority, gpt-5.2, gpt-5.2-priority.
Added support for zero prompt expansion. You can now set prompt_expansion to ZERO when configuring an external LLM, disabling automatic prompt expansion and giving you full control over the system prompt.
Original source
- December 2025
- No date parsed from source.
- First seen by Releasebot:Dec 23, 2025
EVI API improvements
Added support for OpenAI’s GPT-5, GPT-5-mini, and GPT-5-nano models as supplemental LLM options.
Original source
- December 2025
- No date parsed from source.
- First seen by Releasebot:Dec 23, 2025
EVI API improvements
New supplemental LLMs are now supported for all EVI versions:
- Claude Sonnet 4 (Anthropic)
- Llama 4 Maverick (SambaNova)
- Qwen3 32B (SambaNova)
- DeepSeek R1-Distill (Llama 3.3 70B Instruct, via SambaNova)
- Kimi K2 (Groq)
- Nov 14, 2025
- Date parsed from source:Nov 14, 2025
- First seen by Releasebot:Dec 23, 2025
November 14, 2025
EVI API additions
Added support for a new SESSION_SETTINGS chat event in the EVI chat history API. When you fetch chat events via /v0/evi/chats/:id, the response now includes entries that indicate when system settings were updated and which settings were applied.
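A minimal sketch of picking these entries out of a fetched chat history. It works on an already-parsed response and makes no network call; the `events_page`, `type`, and `timestamp` field names and the sample payload are assumptions for illustration, so check them against the API reference.

```python
def session_settings_events(chat_response):
    """Return only the SESSION_SETTINGS entries from a parsed chat response."""
    events = chat_response.get("events_page", [])  # assumed field name
    return [e for e in events if e.get("type") == "SESSION_SETTINGS"]

# Hypothetical response body, shaped for illustration only.
sample_response = {
    "events_page": [
        {"type": "USER_MESSAGE", "timestamp": 1},
        {"type": "SESSION_SETTINGS", "timestamp": 2},
        {"type": "AGENT_MESSAGE", "timestamp": 3},
    ]
}

settings_updates = session_settings_events(sample_response)
```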
Original source
- Nov 7, 2025
- Date parsed from source:Nov 7, 2025
- First seen by Releasebot:Dec 23, 2025
November 7, 2025
Voice conversion and secure EVI enhancements expand the platform with new endpoints to convert speech to a target voice, manage active chats over WebSocket, and handle tool calls via a webhook. This update signals practical, user-facing API improvements.
TTS API additions
Introduced voice conversion endpoints. Send speech, specify a voice, and receive audio converted to that target voice.
- POST /v0/tts/voice_conversion/json: JSON response with audio and metadata.
- POST /v0/tts/voice_conversion/file: Audio file response.
EVI API additions
Added a control plane API for EVI. Perform secure server-side actions and connect to active chats. See the Control Plane guide.
- POST /v0/evi/chat/:chat_id/send: Send a message to an active chat.
- WSS /v0/evi/chat/:chat_id/connect: Connect to an active chat over WebSocket.
Added a tool call webhook event. Subscribe to tool calls to know when to invoke your tool, then send the tool response back to the chat using the control plane.
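The two control plane routes above can be turned into full URLs with a small helper. The paths come from the note; the `api.hume.ai` host and the helper names are assumptions, and authentication is omitted.

```python
from urllib.parse import quote

API_HOST = "api.hume.ai"  # assumption: the note lists paths only, not a host

def send_message_url(chat_id, host=API_HOST):
    """HTTPS URL for POST /v0/evi/chat/:chat_id/send."""
    return f"https://{host}/v0/evi/chat/{quote(chat_id, safe='')}/send"

def connect_url(chat_id, host=API_HOST):
    """WebSocket URL for WSS /v0/evi/chat/:chat_id/connect."""
    return f"wss://{host}/v0/evi/chat/{quote(chat_id, safe='')}/connect"

# "chat-123" is a placeholder chat ID.
post_url = send_message_url("chat-123")
ws_url = connect_url("chat-123")
```

Percent-encoding the chat ID keeps the path well-formed if an ID ever contains reserved characters.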
Original source
- Oct 24, 2025
- Date parsed from source:Oct 24, 2025
- First seen by Releasebot:Feb 18, 2026
AudioStack × Hume: Professional Audio for Creatives
AudioStack expands its AI audio production suite with Hume’s expressive voices, boosting speed, consistency, and emotional depth for ads, podcasts, and branded content. The integration enables scalable, natural sounding output across markets and languages while reducing costs and raising creative quality.
About AudioStack
AudioStack is an enterprise AI audio production platform trusted by global creative teams at Publicis, Omnicom, iHeartMedia, Dentsu, and more. Their AI-driven production suite empowers agencies, publishers, AdTech platforms, and brands to create broadcast-ready audio content 10 times faster at a fraction of traditional costs—reducing production expenses by up to 80% while scaling effortlessly across markets and languages.
Expanding AudioStack’s Voice Library with Hume
AudioStack offers a comprehensive voice library for audio advertisements, podcasts, and branded content. As they continue to grow their voice offerings, they're integrating Hume's emotionally intelligent voices to meet two core demands of creative teams:
Consistent Stability
Enterprise content generation requires voices that perform reliably across thousands of productions. Hume's voices deliver consistent quality and pronunciation, ensuring brand messaging remains clear and professional, whether creating one ad or thousands of dynamic variations.
Natural Expressiveness
Generic TTS voices often sound flat or robotic—a dealbreaker for agencies creating audio that needs to engage audiences. Hume's voices bring genuine emotional depth, helping audio content feel authentic, engaging, and human.
By adding Hume’s expressive voices to their platform, AudioStack enables creative teams and advertisers to produce high-quality, emotionally resonant audio at scale.
For more information on how empathic AI can enhance your digital solutions, contact Hume AI.
Original source
- Oct 21, 2025
- Date parsed from source:Oct 21, 2025
- First seen by Releasebot:Feb 18, 2026
Creating immersive avatar experiences with Render Foundry
Render Foundry unveils an immersive Babe Ruth simulator built with Hume AI voice cloning, blending Unreal Engine storytelling with authentic, warm dialogue. The experience lets visitors talk with Babe Ruth, syncing likeness with audio for a lifelike museum interaction.
About Render Foundry
Render Foundry specializes in creating immersive experiences, from interactive museum installations to digital twins of entire campuses. Led by Shane Boyce and Josh Harwell, their team combines Unreal Engine expertise with cutting-edge storytelling to blur the line between reality and simulation.
When they set out to create an interactive Babe Ruth experience, they needed a voice that could do the impossible: bring him back to life with authenticity, warmth, and the personality that made him a legend.
Watch the Experience
Using Hume's custom voice cloning technology, Render Foundry created a Babe Ruth simulator that feels emotionally authentic. Hume captured the tonal qualities, cadence, and personality of the baseball icon, allowing visitors to have natural, engaging conversations with one of sports' most beloved figures.
The result is an experience that transcends typical museum exhibits. Visitors not only learn about Babe Ruth, but also connect with him. Render Foundry created something truly special by simulating Babe’s likeness and syncing the audio and the visual, making this experience one-of-a-kind.
Josh Harwell, the Creative Director at Render Foundry, says,
“We’re excited to offer these curated experiences. It’s fun to watch clients interact with our characters as they are brought to life. Whether it’s a historical figure, a mascot, or a brand ambassador, Hume helps us deliver a solution that humanizes the responses of AI.”
Original source
- Oct 14, 2025
- Date parsed from source:Oct 14, 2025
- First seen by Releasebot:Feb 18, 2026
Revelum × Hume: Detecting Voice Fraud in Real-Time
Revelum launches an AI-native security platform that stops deepfake fraud in real time with call risk analysis, live deepfake detection, and precise timestamps. A strategic partnership with Hume AI speeds up resilient detection and responsible AI use, demonstrated by a real-time fraud scenario.
Revelum
Revelum is an AI-native security platform that protects institutions from deepfake impersonations and fraud in real time. Founded by Enrique Barco in 2025, Revelum provides turn-key solutions and developer-friendly APIs to safeguard institutions from emerging threats in the era of AI-driven fraud.
Revelum’s strategic partnership with Hume AI ensures their detection systems can identify even the most advanced synthetic voices, including Hume's own Empathic Voice Interface (EVI).
The Demo: AI-Powered Fraud in Action
In a recent demo, Revelum showcased how attackers use AI voice agents to attempt account takeovers. The scenario: "Jake" calls customer support, claiming an AI assistant accidentally changed and deleted his password. The EVI-powered voice sounds natural, but Revelum's technology instantly flags the deepfake.
In real-time, Revelum’s platform provides:
- A call risk assessment analysis
- Real-time deepfake detection counts
- Precise timestamps of synthetic voice segments
The customer service agent receives an alert and initiates callback verification—stopping the attack immediately.
Hume’s Partnership with Revelum
Our collaboration with Revelum creates a critical feedback loop for responsible AI development:
- Early Access to EVI: Revelum trains their models on Hume's cutting-edge voice technology, ensuring detection capabilities stay ahead of emerging threats before they reach malicious actors.
- Continuous Refinement: As Hume's emotionally intelligent voices become more sophisticated, Revelum's detection algorithms evolve in parallel.
Revelum founder Enrique Barco notes:
"By partnering with Hume, we’re taking a vital step toward building technology that anticipates — not just reacts to — the evolving tactics of bad actors seeking to misuse powerful models. Together, we’re staying one step ahead in ensuring generative AI is used responsibly."
Original source
- Oct 3, 2025
- Date parsed from source:Oct 3, 2025
- First seen by Releasebot:Dec 23, 2025
October 3, 2025
Octave 2 upgrades delivered across TTS and EVI: HTTP and WebSocket endpoints now support Octave 2 with timestamps at word and phoneme levels, plus EVI 4-mini enables multilingual TTS via Octave 2 with your chosen LLM.
TTS API improvements
Octave 2 is now available for use in TTS endpoints.
For HTTP endpoints, specify "version": "2" in your request body.
For the WebSocket endpoint, specify version=2 in the query parameters of the handshake request.
Word and phoneme level timestamps are now supported for TTS APIs. See our Timestamps Guide to learn more.
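A short sketch of the two ways to select Octave 2 described above: a "version" field in the HTTP body and a version query parameter on the WebSocket handshake. The `utterances` body shape, the helper names, and the placeholder WebSocket URL are assumptions; only the version field and query parameter come from the note.

```python
from urllib.parse import urlencode

def http_tts_body(text):
    """Request body selecting Octave 2 via "version": "2"."""
    # Assumption: text is passed in an "utterances" list; check the API reference.
    return {"version": "2", "utterances": [{"text": text}]}

def websocket_tts_url(base_url):
    """Append version=2 to the handshake URL's query string."""
    return f"{base_url}?{urlencode({'version': 2})}"

body = http_tts_body("Hello from Octave 2.")
# Placeholder URL; substitute the real streaming endpoint.
ws_url = websocket_tts_url("wss://example.invalid/tts")
```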
EVI API improvements
EVI version 4-mini is now available. This version enables the use of Octave 2 for TTS alongside a supplemental LLM of your choosing, bringing Octave 2’s multilingual capabilities to EVI. Specify "version": "4-mini" in your EVI config version to use it.
Original source
- Oct 1, 2025
- Date parsed from source:Oct 1, 2025
- First seen by Releasebot:Dec 23, 2025
Octave 2: next-generation multilingual voice AI
Octave 2 debuts as a hyperreal voice AI with 11 languages, sub-200ms latency, and new voice conversion plus direct phoneme editing. Faster, cheaper, and available now on our platform and API for early access.
Octave 2
- More deeply understands the emotional tone of speech.
- Extends our text-to-speech system to 11 languages.
- Is 40% faster and more efficient, generating audio in under 200ms.
- Offers new first-of-their-kind features for a speech-language model, including voice conversion and direct phoneme editing.
- Pronounces uncommon words, repeated words, numbers, and symbols more reliably.
- Is half the price of Octave 1.
A speech-language model is a state-of-the-art AI model trained to understand and synthesize both language and speech. Unlike traditional TTS models, it understands how the script informs the tune, rhythm, and timbre of acting, inferring when to whisper secrets, shout triumphantly, or calmly explain a fact. Understanding these aspects of speech also allows it to reproduce the personality, and not just vocal timbre, of any speaker.
With Octave 2, we’ve taken these capabilities a step further.
Hyperrealistic voice AI in 11 languages
Octave 2 extends our next-generation voice AI to 11 languages: Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
Example Japanese-language generation from Octave 2:
「ハハハ、えっと、これがフォーマルなイベントだとは知らなかった。だから、えっと、バットマンのパジャマを着ているんだ。あなたの家ではカジュアルなイベントだと思っていたんだけど。すごく着飾っていない気がする。」
Example Korean-language generation from Octave 2:
"내가 이걸 얼마나 오랫동안 했는지 알아? 그런데 네가 그냥. 삭제했어? 내 인생의 3개월이 순식간에 사라졌어. 네가 다시 확인도 안 해줬으니까!"
Both of these voices were created with instant cloning, each using a 15-second audio recording of a native speaker's voice. When used to generate speech in a different language, Octave 2 predicts the speaker's accent. For instance, this is an English-language sample generated using the Japanese voice.
Octave 2 is becoming proficient in other languages, too; we will be announcing support for at least 20 languages in the coming months.
High quality at low latency
Octave 2 is the fastest and most efficient model of its kind, returning responses in under 200ms.
This was achieved without trading quality for latency. Instead, we deployed Octave 2 on some of the world’s most advanced chips for LLM inference. Working closely with SambaNova, we developed a new inference stack specific to Octave 2’s new speech-language model architecture.
Octave 2 isn’t just fast, it’s also efficient. We’re offering it at half the price of Octave 1. With dedicated deployments, this can be reduced to under a cent per minute of audio. This efficiency allows Octave 2 to power large-scale applications in entertainment, gaming, customer service, and more.
Voice conversion
With Octave 2, we’ve been working on two novel capabilities for a speech-language model: realistic voice conversion and direct phoneme editing.
With voice conversion, Octave 2 can exchange one voice for another while freezing the phonetic qualities and timing of the spoken utterance. This is ideal for use cases that require actors to stand in for other actors, such as dubbing in a new language with the original actor’s voice, or making precise human touch-ups to AI voiceovers.
For instance, when we prompt the model with a source speech sample and a target voice sample, the model generates the converted speech.
[Audio samples: source speech, target voice, converted speech]
Phoneme Editing
We're also exploring a new phoneme editing capability, where minute adjustments can be made to the timing and pronunciation of speech. This enables support for custom pronunciation of names, manipulation of word emphasis, and more.
For instance, take this classic film quote, recreated with voice conversion:
Using phoneme editing, we can alter the pronunciation of words in the original quote. Here's an example:
We've created a new word, "leviaso," out of the phonemes present in the original quote. This kind of granular phoneme replication and editing would have proven difficult, if not impossible, with text input alone.
Voice conversion and phoneme editing will be available soon on our platform.
Build conversational experiences with EVI 4 mini
Finally, we're launching EVI 4 mini, which brings all of the capabilities of Octave 2 to our speech-to-speech API. Now, you can build faster, smoother interactive experiences in 11 languages. For example, we built a translator app using EVI 4 mini with just a few voice samples and a prompt.
EVI 4 mini doesn't yet generate its own language natively, so you'll need to pair it with an external LLM through our API until we launch the full version.
Access Octave 2 and EVI 4 mini
Today, we're rolling out access to Octave 2 on our text-to-speech playground and API, and to EVI 4 mini on our speech-to-speech playground and API.
Soon we'll be releasing more evaluations, more languages, and access to voice conversion and phoneme editing.
In the meantime, we're excited to see what you create!
Original source
- Sep 24, 2025
- Date parsed from source:Sep 24, 2025
- First seen by Releasebot:Dec 23, 2025
Hume AI powers conversational learning with Coconote
Coconote adds voice chat powered by Hume EVI to turn notes into interactive conversations with natural questions, explanations, and quiz prompts. The integration delivers seamless dialogue, contextual note referencing, and emotionally aware responses to boost study engagement.
About Coconote
Coconote is a leading AI note-taker, consistently ranking in the Top 50 Education apps on Apple's App Store. They help transform lectures and meetings into organized notes, transcripts, and study materials, supporting over 100 languages and multiple content formats including audio, video, and PDF conversion.
Coconote is tailor-made for neuro-diverse students and professionals including those with ADHD, ASD, dyslexia, and auditory processing issues.
Enhancing Student Engagement
While traditional note-taking apps require students to manually scroll and search through content, Coconote is creating interactive study experiences through conversational AI.
Coconote’s voice chat feature, powered by Hume's EVI, helps users transform static notes into dynamic conversations. Students can:
- Ask natural questions about their lecture content,
- Receive contextual explanations referencing specific notes, and
- Engage in quiz-style conversations for active learning—all through natural voice interaction.
EVI powers the "Voice Chat" feature on Coconote
For example, a student might ask, "What were the key differences between mitosis and meiosis from today's biology lecture?" and receive an immediate response from EVI that not only references their notes, but also discusses follow-up questions and tangential topics (e.g., cytokinesis).
Technical Integration
EVI's advanced capabilities make it an ideal fit for Coconote. The platform delivers conversational latency for natural dialogue flow, and uses intelligent end-of-turn detection based on vocal cues. Most importantly, EVI provides emotionally aware responses that adapt to students’ attitudes and confidence levels.
From a development perspective, the integration was seamless:
- Clean API integration with existing Coconote infrastructure
- Custom system prompts aligned with Coconote's supportive brand voice
- Chat history capabilities for reviewing past study sessions
The result enables students to have meaningful conversations with their notes while maintaining the technical sophistication needed for educational applications. Users can test the EVI integration in Coconote directly at https://coconote.app/.
For more information on how empathic AI can enhance your digital solutions, please contact Hume AI.
Original source
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.