Hume Release Notes
Last updated: Dec 23, 2025
- December 2025
- No date parsed from source.
- First seen by Releasebot: Dec 23, 2025
EVI API improvements
Added support for OpenAI’s GPT-5, GPT-5-mini, and GPT-5-nano models as supplemental LLM options.
- December 2025
- No date parsed from source.
- First seen by Releasebot: Dec 23, 2025
EVI API improvements
New supplemental LLMs are now supported for all EVI versions:
- Claude Sonnet 4 (Anthropic)
- Llama 4 Maverick (SambaNova)
- Qwen3 32B (SambaNova)
- DeepSeek R1-Distill (Llama 3.3 70B Instruct, via SambaNova)
- Kimi K2 (Groq)
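Any of the models above can be selected as the supplemental LLM in an EVI config. The Python sketch below (using the requests library) shows roughly what that might look like; the field names inside language_model (model_provider, model_resource) and the example resource string are assumptions, not the documented configuration schema.

```python
# Hedged sketch: create an EVI config that uses Kimi K2 (via Groq) as the
# supplemental LLM. Field names and values are illustrative assumptions.
import requests

HUME_API_KEY = "your-hume-api-key"

response = requests.post(
    "https://api.hume.ai/v0/evi/configs",
    headers={"X-Hume-Api-Key": HUME_API_KEY},
    json={
        "name": "kimi-k2-config",
        "language_model": {                 # supplemental LLM selection (assumed fields)
            "model_provider": "GROQ",
            "model_resource": "kimi-k2",
        },
    },
)
response.raise_for_status()
print(response.json())  # created config, including its id
```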
- Nov 14, 2025
- Date parsed from source: Nov 14, 2025
- First seen by Releasebot: Dec 23, 2025
November 14, 2025
EVI API additions
Added support for a new SESSION_SETTINGS chat event in the EVI chat history API. When you fetch chat events via /v0/evi/chats/:id, the response now includes entries that indicate when session settings were updated and which settings were applied.
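For illustration, the sketch below (Python, requests) fetches a chat's events and keeps only the SESSION_SETTINGS entries. The authentication header is the one commonly used with Hume's APIs, but the response field names (events_page, type, timestamp) are assumptions and may not match the actual schema exactly.

```python
# Hedged sketch: list a chat's events and filter for SESSION_SETTINGS updates.
import requests

HUME_API_KEY = "your-hume-api-key"
CHAT_ID = "your-chat-id"

response = requests.get(
    f"https://api.hume.ai/v0/evi/chats/{CHAT_ID}",
    headers={"X-Hume-Api-Key": HUME_API_KEY},
)
response.raise_for_status()

# Assumed response shape: a page of chat events under "events_page".
settings_events = [
    event
    for event in response.json().get("events_page", [])
    if event.get("type") == "SESSION_SETTINGS"
]
for event in settings_events:
    print(event.get("timestamp"), event)  # when settings changed and what was applied
```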
- Nov 7, 2025
- Date parsed from source: Nov 7, 2025
- First seen by Releasebot: Dec 23, 2025
November 7, 2025
Voice conversion and secure EVI enhancements expand the platform with new endpoints to convert speech to a target voice, manage active chats over WebSocket, and handle tool calls via a webhook. This update signals practical, user-facing API improvements.
TTS API additions
Introduced voice conversion endpoints. Send speech, specify a voice, and receive audio converted to that target voice.
- POST /v0/tts/voice_conversion/json: JSON response with audio and metadata.
- POST /v0/tts/voice_conversion/file: Audio file response.
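A voice conversion request might look roughly like the sketch below (Python, requests). The body fields shown (base64 audio, a voice selected by name) are assumptions rather than the documented request schema; check the API reference for the exact shape.

```python
# Hedged sketch: convert recorded speech to a target voice, JSON response.
import base64
import requests

HUME_API_KEY = "your-hume-api-key"

with open("source_speech.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.hume.ai/v0/tts/voice_conversion/json",
    headers={"X-Hume-Api-Key": HUME_API_KEY},
    json={
        "audio": audio_b64,                    # speech to convert (assumed field)
        "voice": {"name": "My Target Voice"},  # target voice (assumed field)
    },
)
response.raise_for_status()
result = response.json()  # converted audio plus metadata
```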
EVI API additions
Added a control plane API for EVI. Perform secure server-side actions and connect to active chats. See the Control Plane guide.
- POST /v0/evi/chat/:chat_id/send: Send a message to an active chat.
- WSS /v0/evi/chat/:chat_id/connect: Connect to an active chat over WebSocket.
Added a tool call webhook event. Subscribe to tool calls to know when to invoke your tool, then send the tool response back to the chat using the control plane.
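Putting the webhook and the control plane together, a handler could run your tool when a tool call event arrives and post the result back into the live chat. The Flask sketch below is illustrative only: the webhook payload fields (chat_id, tool_call_id, parameters) and the body of the send request are assumptions, not the documented schema.

```python
# Hedged sketch: handle a tool call webhook and return the result to the chat
# via the control plane send endpoint.
import requests
from flask import Flask, request

app = Flask(__name__)
HUME_API_KEY = "your-hume-api-key"

def run_my_tool(parameters: dict) -> str:
    """Placeholder for your actual tool logic."""
    return "tool result"

@app.post("/webhooks/hume/tool-call")
def handle_tool_call():
    event = request.get_json(force=True)
    chat_id = event["chat_id"]                         # assumed payload field
    result = run_my_tool(event.get("parameters", {}))

    # Send the tool response back into the active chat (body shape assumed).
    requests.post(
        f"https://api.hume.ai/v0/evi/chat/{chat_id}/send",
        headers={"X-Hume-Api-Key": HUME_API_KEY},
        json={
            "type": "tool_response",
            "tool_call_id": event.get("tool_call_id"),
            "content": result,
        },
    ).raise_for_status()

    return "", 204
```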
- Oct 24, 2025
- Date parsed from source: Oct 24, 2025
- First seen by Releasebot: Feb 18, 2026
AudioStack × Hume: Professional Audio for Creatives
AudioStack expands its AI audio production suite with Hume’s expressive voices, boosting speed, consistency, and emotional depth for ads, podcasts, and branded content. The integration enables scalable, natural-sounding output across markets and languages while reducing costs and raising creative quality.
About AudioStack
AudioStack is an enterprise AI audio production platform trusted by global creative teams at Publicis, Omnicom, iHeartMedia, Dentsu, and more. Their AI-driven production suite empowers agencies, publishers, AdTech platforms, and brands to create broadcast-ready audio content 10 times faster at a fraction of traditional costs—reducing production expenses by up to 80% while scaling effortlessly across markets and languages.
Expanding AudioStack’s Voice Library with Hume
AudioStack offers a comprehensive voice library for audio advertisements, podcasts, and branded content. As they continue to grow their voice offerings, they're integrating Hume's emotionally intelligent voices to meet two core demands of creative teams:
Consistent Stability
Enterprise content generation requires voices that perform reliably across thousands of productions. Hume's voices deliver consistent quality and pronunciation, ensuring brand messaging remains clear and professional, whether creating one ad or thousands of dynamic variations.
Natural Expressiveness
Generic TTS voices often sound flat or robotic—a dealbreaker for agencies creating audio that needs to engage audiences. Hume's voices bring genuine emotional depth, helping audio content feel authentic, engaging, and human.
By adding Hume’s expressive voices to their platform, AudioStack enables creative teams and advertisers to produce high-quality, emotionally resonant audio at scale.
For more information on how empathic AI can enhance your digital solutions, contact Hume AI.
- Oct 21, 2025
- Date parsed from source: Oct 21, 2025
- First seen by Releasebot: Feb 18, 2026
Creating immersive avatar experiences with Render Foundry
Render Foundry unveils an immersive Babe Ruth simulator built with Hume AI voice cloning, blending Unreal Engine storytelling with authentic, warm dialogue. The experience lets visitors talk with Babe Ruth, syncing likeness with audio for a lifelike museum interaction.
About Render Foundry
Render Foundry specializes in creating immersive experiences, from interactive museum installations to digital twins of entire campuses. Led by Shane Boyce and Josh Harwell, their team combines Unreal Engine expertise with cutting-edge storytelling to blur the line between reality and simulation.
When they set out to create an interactive Babe Ruth experience, they needed a voice that could do the impossible: bring him back to life with authenticity, warmth, and the personality that made him a legend.
Watch the Experience
Using Hume's custom voice cloning technology, Render Foundry created a Babe Ruth simulator that feels emotionally authentic. Hume captured the tonal qualities, cadence, and personality of the baseball icon, allowing visitors to have natural, engaging conversations with one of sports' most beloved figures.
The result is an experience that transcends typical museum exhibits. Visitors not only learn about Babe Ruth, but also connect with him. Render Foundry created something truly special by simulating Babe’s likeness and syncing the audio with the visuals, making this experience one of a kind.
Josh Harwell, the Creative Director at Render Foundry, says,
“We’re excited to offer these curated experiences. It’s fun to watch clients interact with our characters as they are brought to life. Whether it’s a historical figure, a mascot, or a brand ambassador, Hume helps us deliver a solution that humanizes the responses of AI.”
For more information on how empathic AI can enhance your digital solutions, contact Hume AI.
- Oct 14, 2025
- Date parsed from source: Oct 14, 2025
- First seen by Releasebot: Feb 18, 2026
Revelum × Hume: Detecting Voice Fraud in Real-Time
Revelum launches an AI-native security platform that stops deepfake fraud in real time with call risk analysis, live deepfake detection, and precise timestamps. A strategic partnership with Hume AI speeds up resilient detection and responsible AI use, demonstrated by a real-time fraud scenario.
Revelum
Revelum is an AI-native security platform that protects institutions from deepfake impersonations and fraud in real time. Founded by Enrique Barco in 2025, Revelum provides turn-key solutions and developer-friendly APIs to safeguard institutions from emerging threats in the era of AI-driven fraud.
Revelum’s strategic partnership with Hume AI ensures their detection systems can identify even the most advanced synthetic voices, including Hume's own Empathic Voice Interface (EVI).
The Demo: AI-Powered Fraud in Action
In a recent demo, Revelum showcased how attackers use AI voice agents to attempt account takeovers. The scenario: "Jake" calls customer support, claiming an AI assistant accidentally changed and deleted his password. The EVI-powered voice sounds natural, but Revelum's technology instantly flags the deepfake.
In real-time, Revelum’s platform provides:
- A call risk assessment analysis
- Real-time deepfake detection counts
- Precise timestamps of synthetic voice segments
The customer service agent receives an alert and initiates callback verification—stopping the attack immediately.
Hume’s Partnership with Revelum
Our collaboration with Revelum creates a critical feedback loop for responsible AI development:
- Early Access to EVI: Revelum trains their models on Hume's cutting-edge voice technology, ensuring detection capabilities stay ahead of emerging threats before they reach malicious actors.
- Continuous Refinement: As Hume's emotionally intelligent voices become more sophisticated, Revelum's detection algorithms evolve in parallel.
Revelum founder Enrique notes:
"By partnering with Hume, we’re taking a vital step toward building technology that anticipates — not just reacts to — the evolving tactics of bad actors seeking to misuse powerful models. Together, we’re staying one step ahead in ensuring generative AI is used responsibly."
For more information on how empathic AI can enhance your digital solutions, contact Hume AI.
- Oct 3, 2025
- Date parsed from source: Oct 3, 2025
- First seen by Releasebot: Dec 23, 2025
October 3, 2025
Octave 2 upgrades delivered across TTS and EVI: HTTP and WebSocket endpoints now support Octave 2 with timestamps at word and phoneme levels, plus EVI 4-mini enables multilingual TTS via Octave 2 with your chosen LLM.
TTS API improvements
Octave 2 is now available for use in TTS endpoints.
For HTTP endpoints, specify "version": "2" in your request body. (reference)
For the WebSocket endpoint, specify version=2 in the query parameters of the handshake request. (reference)
Word and phoneme level timestamps are now supported for TTS APIs. See our Timestamps Guide to learn more.
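For example, opting into Octave 2 over HTTP means adding the version field to the request body, while the WebSocket handshake takes it as a query parameter. The sketch below is illustrative: the endpoint paths and the other body fields (utterances) are assumptions and may differ from the API reference.

```python
# Hedged sketch: request Octave 2 via "version": "2" (HTTP) or version=2 (WebSocket).
import requests

HUME_API_KEY = "your-hume-api-key"

response = requests.post(
    "https://api.hume.ai/v0/tts",                          # assumed HTTP TTS endpoint
    headers={"X-Hume-Api-Key": HUME_API_KEY},
    json={
        "version": "2",                                    # use Octave 2
        "utterances": [{"text": "Hello from Octave 2."}],  # assumed field name
    },
)
response.raise_for_status()

# WebSocket: include version=2 in the handshake URL's query string.
ws_url = (
    "wss://api.hume.ai/v0/tts/stream/input"                # assumed WS endpoint
    f"?api_key={HUME_API_KEY}&version=2"
)
```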
EVI API improvements
EVI version 4-mini is now available. This version enables the use of Octave 2 for TTS alongside a supplemental LLM of your choosing, bringing Octave 2’s multilingual capabilities to EVI. Specify "version": "4-mini" in your EVI config to use it.
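For instance, a config body might pair the 4-mini version with a supplemental LLM (selected as in the earlier config sketch). The fragment below is a hedged illustration; field names other than version are assumptions.

```python
# Hedged sketch: an EVI config body combining version 4-mini (Octave 2 TTS)
# with a supplemental LLM. Field names besides "version" are assumptions.
evi_config = {
    "name": "octave-2-multilingual",
    "version": "4-mini",                 # enables Octave 2 TTS in EVI
    "language_model": {                  # supplemental LLM of your choosing
        "model_provider": "ANTHROPIC",
        "model_resource": "claude-sonnet-4",
    },
}
```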
- Oct 1, 2025
- Date parsed from source: Oct 1, 2025
- First seen by Releasebot: Dec 23, 2025
Octave 2: next-generation multilingual voice AI
Octave 2 debuts as a hyperreal voice AI with 11 languages, sub 200ms latency, and new voice conversion plus direct phoneme editing. Faster, cheaper, and available now on our platform and API for early access.
Octave 2
- More deeply understands the emotional tone of speech.
- Extends our text-to-speech system to 11 languages.
- Is 40% faster and more efficient, generating audio in under 200ms.
- Offers new first-of-their-kind features for a speech-language model, including voice conversion and direct phoneme editing.
- Pronounces uncommon words, repeated words, numbers, and symbols more reliably.
- Is half the price of Octave 1.
A speech-language model is a state-of-the-art AI model trained to understand and synthesize both language and speech. Unlike traditional TTS models, it understands how the script informs the tune, rhythm, and timbre of acting, inferring when to whisper secrets, shout triumphantly, or calmly explain a fact. Understanding these aspects of speech also allows it to reproduce the personality, and not just vocal timbre, of any speaker.
With Octave 2, we’ve taken these capabilities a step further.
Hyperrealistic voice AI in 11 languages
Octave 2 extends our next-generation voice AI to 11 languages: Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
Example Japanese-language generation from Octave 2:
「ハハハ、えっと、これがフォーマルなイベントだとは知らなかった。だから、えっと、バットマンのパジャマを着ているんだ。あなたの家ではカジュアルなイベントだと思っていたんだけど。すごく着飾っていない気がする。」
Example Korean-language generation from Octave 2:
"내가 이걸 얼마나 오랫동안 했는지 알아? 그런데 네가 그냥. 삭제했어? 내 인생의 3개월이 순식간에 사라졌어. 네가 다시 확인도 안 해줬으니까!"
Both of these voices were created with instant cloning, each using a 15-second audio recording of a native speaker's voice. When used to generate speech in a different language, Octave 2 predicts the speaker's accent. For instance, this is an English-language sample generated using the Japanese voice.
Octave 2 is becoming proficient in other languages, too; we will be announcing support for at least 20 languages in the coming months.
High quality at low latency
Octave 2 is the fastest and most efficient model of its kind, returning responses in under 200ms.
This was achieved without trading quality for latency. Instead, we deployed Octave 2 on some of the world’s most advanced chips for LLM inference. Working closely with SambaNova, we developed a new inference stack specific to Octave 2’s new speech-language model architecture.
Octave 2 isn’t just fast, it’s also efficient. We’re offering it at half the price of Octave 1. With dedicated deployments, this can be reduced to under a cent per minute of audio. This efficiency allows Octave 2 to power large-scale applications in entertainment, gaming, customer service, and more.
Voice conversion
With Octave 2, we’ve been working on two novel capabilities for a speech-language model: realistic voice conversion and direct phoneme editing.
With voice conversion, Octave 2 can exchange one voice for another while freezing the phonetic qualities and timing of the spoken utterance. This is ideal for use cases that require actors to stand in for other actors, such as dubbing in a new language with the original actor’s voice, or making precise human touch-ups to AI voiceovers.
For instance, when we prompt the model with the following speech:
And the following target voice:
The model generates the following converted speech:
Phoneme Editing
We're also exploring a new phoneme editing capability, where minute adjustments can be made to the timing and pronunciation of speech. This enables support for custom pronunciation of names, manipulation of word emphasis, and more.
For instance, take this classic film quote, recreated with voice conversion:
Using phoneme editing, we can alter the pronunciation of words in the original quote. Here's an example:
We've created a new word, "leviaso," out of the phonemes present in the original quote. This kind of granular phoneme replication and editing would have proven difficult, if not impossible, with text input alone.
Voice conversion and phoneme editing will be available soon on our platform.
Build conversational experiences with EVI 4 mini
Finally, we're launching EVI 4 mini, which brings all of the capabilities of Octave 2 to our speech-to-speech API. Now, you can build faster, smoother interactive experiences in 11 languages. For example, we built a translator app using EVI 4 mini with just a few voice samples and a prompt.
EVI 4 mini doesn't yet generate its own language natively, so you'll need to pair it with an external LLM through our API until we launch the full version.
Access Octave 2 and EVI 4 mini
Today, we're rolling out access to Octave 2 on our text-to-speech playground and API, and to EVI 4 mini on our speech-to-speech playground and API.
Soon we'll be releasing more evaluations, more languages, and access to voice conversion and phoneme editing.
In the meantime, we're excited to see what you create!
- Sep 24, 2025
- Date parsed from source: Sep 24, 2025
- First seen by Releasebot: Dec 23, 2025
Hume AI powers conversational learning with Coconote
Coconote adds voice chat powered by Hume EVI to turn notes into interactive conversations with natural questions, explanations, and quiz prompts. The integration delivers seamless dialogue, contextual note referencing, and emotionally aware responses to boost study engagement.
About Coconote
Coconote is a leading AI note-taker, consistently ranking in the Top 50 Education apps on Apple's App Store. They help transform lectures and meetings into organized notes, transcripts, and study materials, supporting over 100 languages and multiple content formats including audio, video, and PDF conversion.
Coconote is tailor-made for neuro-diverse students and professionals including those with ADHD, ASD, dyslexia, and auditory processing issues.
Enhancing Student Engagement
While traditional note-taking apps require students to manually scroll and search through content, Coconote is creating interactive study experiences through conversational AI.
Coconote’s voice chat feature, powered by Hume's EVI, helps users transform static notes into dynamic conversations. Students can:
- Ask natural questions about their lecture content,
- Receive contextual explanations referencing specific notes, and
- Engage in quiz-style conversations for active learning—all through natural voice interaction.
EVI powers the "Voice Chat" feature on Coconote
For example, a student might ask, "What were the key differences between mitosis and meiosis from today's biology lecture?" and receive an immediate response from EVI that not only references their notes, but also discusses follow-up questions and tangential topics (e.g., cytokinesis).
Technical Integration
EVI's advanced capabilities make it an ideal fit for Coconote. The platform delivers the low latency needed for natural dialogue flow and uses intelligent end-of-turn detection based on vocal cues. Most importantly, EVI provides emotionally aware responses that adapt to students’ attitudes and confidence levels.
From a development perspective, the integration was seamless:
- Clean API integration with existing Coconote infrastructure
- Custom system prompts aligned with Coconote's supportive brand voice
- Chat history capabilities for reviewing past study sessions
The result enables students to have meaningful conversations with their notes while maintaining the technical sophistication needed for educational applications. Users can test the EVI integration in Coconote directly at https://coconote.app/.
For more information on how empathic AI can enhance your digital solutions, please contact Hume AI.