Hume Release Notes

Last updated: Mar 10, 2026

  • Mar 10, 2026
    • Date parsed from source:
      Mar 10, 2026
    • First seen by Releasebot:
      Mar 10, 2026

    Open-Sourcing TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization

    Hume releases TADA (Text-Acoustic Dual Alignment), a fast, low-hallucination LLM-based TTS system. It is open-sourced now with 1B and 3B models and the full audio tokenizer, enabling on-device deployment and real-time voice with one-to-one text-audio mapping. Available on Hugging Face and GitHub for research and development.

    Approach

    The future of voice AI hinges on sounding natural, fast, expressive, and free of quirks like hallucinated words or skipped content. Today's LLM-based TTS systems are forced to choose between speed, quality, and reliability because of a fundamental mismatch between how text and audio are represented inside language models.

    TADA (Text-Acoustic Dual Alignment) resolves that mismatch with a novel tokenization schema that synchronizes text and speech one-to-one. The result: the fastest LLM-based TTS system available, with competitive voice quality, virtually zero content hallucinations, and a footprint light enough for on-device deployment.

    Hume AI is open-sourcing TADA to accelerate progress toward efficient, reliable voice generation. Code and pre-trained models are available now.

    For input audio, an encoder paired with an aligner extracts acoustic features from the audio segment corresponding to each text token. For output audio, the LLM's final hidden state serves as a conditioning vector for a flow-matching head, which generates acoustic features that are then decoded into audio and fed back into the model.

    Since each LLM step corresponds to exactly one text token and one acoustic representation, TADA generates speech faster and with less computational effort. And because the architecture enforces a strict one-to-one mapping between text and audio, the model cannot skip or hallucinate content by construction.
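    The one-to-one schedule described above can be illustrated with a toy decoding loop. This is purely a conceptual sketch, not Hume's implementation: `flow_head`, `decode`, and the dummy hidden states are stand-ins for the real components.

```python
# Conceptual sketch of TADA's synchronous decoding loop (not the real model).
# Each LLM step consumes one text token and emits one acoustic representation,
# so the number of generated audio chunks equals the number of text tokens
# by construction -- the model cannot skip or repeat content.

def flow_head(hidden_state):
    """Stand-in for the flow-matching head: hidden state -> acoustic features."""
    return [h * 0.5 for h in hidden_state]

def decode(acoustic_features):
    """Stand-in for the audio decoder: acoustic features -> one audio chunk."""
    return sum(acoustic_features)

def generate_speech(text_tokens):
    audio_chunks = []
    for token in text_tokens:
        hidden_state = [float(len(token))] * 4  # stand-in for one LLM step
        acoustic = flow_head(hidden_state)      # condition flow head on hidden state
        audio_chunks.append(decode(acoustic))   # decoded audio is fed back in the real model
    return audio_chunks

chunks = generate_speech(["hello", "world", "!"])
assert len(chunks) == 3  # exactly one acoustic chunk per text token
```

    Because the loop advances text and audio in lockstep, a skipped or hallucinated word would require a step with no matching text token, which the schedule simply does not allow.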

    Evaluation

    Hallucination Rate

    Samples with CER > 0.15 (signaling skipped words, inserted content, or unintelligible speech), out of 1,088 samples.

    • FireRedTTS-2: 41
    • Higgs Audio v2: 24
    • VibeVoice 1.5B: 17
    • TADA-3B: 0
    • TADA-1B: 0

    Speed

    TADA generates speech at a real-time factor (RTF) of 0.09, more than 5x faster than similar-grade LLM-based TTS systems. This is possible because TADA operates at just 2–3 frames (tokens) per second of audio, compared to 12.5–75 tokens per second in other approaches.
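    The arithmetic behind that speedup follows directly from the frame rates quoted above:

```python
# LLM decoding steps needed to produce a given amount of audio, using the
# frame rates quoted above (2-3 frames/s for TADA vs. 12.5-75 for others).

def steps_needed(audio_seconds, frames_per_second):
    return audio_seconds * frames_per_second

# For 60 seconds of audio:
tada_steps = steps_needed(60, 3)         # upper end of TADA's 2-3 frames/s
baseline_steps = steps_needed(60, 12.5)  # lower end of conventional systems

assert tada_steps == 180
assert baseline_steps == 750.0
assert baseline_steps / tada_steps > 4   # at least ~4x fewer steps, worst case
```

    Even comparing TADA's slowest rate against the fastest conventional rate, the step count drops by a factor of four or more; against 75-token-per-second systems the gap exceeds 25x.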

    Hallucination

    Our model was trained on large-scale, in-the-wild data, without post-training, and achieves the same reliability as models trained on smaller curated datasets. We measured hallucination rate by flagging any sample with a character error rate (CER) above 0.15, a threshold that captures unintelligible speech, skipped text, and inserted content. On the 1,088 test samples from LibriTTS-R, TADA produced zero hallucinations.
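    The flagging rule is easy to reproduce. The sketch below computes CER as Levenshtein distance over reference length; the exact normalization Hume used is an assumption, but this is the standard definition.

```python
# Hedged sketch of the hallucination flag: compute character error rate
# (edit distance divided by reference length) and flag samples above 0.15.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_hallucination(reference: str, transcript: str, threshold: float = 0.15) -> bool:
    cer = edit_distance(reference, transcript) / max(len(reference), 1)
    return cer > threshold

assert not is_hallucination("the cat sat", "the cat sat")
assert is_hallucination("the cat sat on the mat", "the cat")  # skipped words
```

    A CER threshold of 0.15 is loose enough to tolerate minor ASR transcription noise while still catching skipped or inserted spans of any meaningful length.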

    Voice Quality

    Results on Seed-TTS-eval and LibriTTS-R-eval show that TADA achieves reliability comparable to Index-TTS, one of the few systems with similarly low hallucination rates, while being trained on a larger, less-curated dataset.

    In human evaluation on expressive, long-form speech (EARS dataset), TADA scored 4.18/5.0 on speaker similarity and 3.78/5.0 on naturalness, placing second overall — ahead of several systems trained on significantly more data.

    Potential Applications

    On-device deployment. TADA is lightweight enough to run on mobile phones and edge devices without requiring cloud inference. For device manufacturers and app developers building voice interfaces, this means lower latency, better privacy, and no API dependency.

    Long-form and conversational speech. TADA's synchronous tokenization is dramatically more context-efficient than existing approaches. Where a conventional system exhausts a 2048-token context window in about 70 seconds of audio, TADA can accommodate roughly 700 seconds in the same budget. This opens the door to long-form narration, extended dialogue, and multi-turn voice interactions.
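    The context-efficiency claim can be checked with the figures above: the same 2048-token window covers roughly 10x more audio under TADA's tokenization.

```python
# Context-budget arithmetic from the figures above: ~2048 tokens lasting
# ~70 s implies ~29 tokens per second of audio for a conventional system,
# vs. ~3 tokens/s for TADA at ~700 s per window.

CONTEXT = 2048

conventional_rate = CONTEXT / 70   # tokens consumed per second of audio
tada_rate = CONTEXT / 700

assert round(conventional_rate) == 29
assert round(tada_rate, 1) == 2.9
assert conventional_rate / tada_rate == 10.0  # ~10x more audio per window
```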

    Production reliability. The zero hallucination rate in our tests suggests fewer edge cases to catch, fewer customer complaints, and less post-processing overhead in the product. This makes TADA well-suited for deploying voice in regulated or sensitive environments such as healthcare, finance, and education.

    Limitations and Future Work

    Long-form degradation. While the model supports more than 10 minutes of context, we noticed occasional speaker drift during long generations. Our online rejection sampling strategy reduces this significantly, but it is not fully resolved. We suggest resetting the context as an interim workaround.

    The modality gap. When the model generates text alongside speech, language quality drops relative to text-only mode. We introduce Speech Free Guidance (SFG), a technique that blends logits from text-only and text-speech inference modes to help close this gap, but more work is required.

    Use-cases. The model is pre-trained only on speech continuation; further fine-tuning is required for assistant scenarios. Get in touch to inquire about Hume's extensive library of fine-tuning data.

    Scale. The current release covers English and seven additional languages, so there's clear room to expand. We're training larger models with broader language coverage using Hume AI data.

    We're releasing TADA because we believe this architecture opens a productive direction for the field, and we want to accelerate progress. We invite researchers and developers to build on this work — whether that means extending the tokenizer to new modalities, solving the long-context problem, or adapting the framework for new applications.

    Get Started

    TADA is available now under an open-source license. We're releasing 1B and 3B parameter Llama-based models and the full audio tokenizer and decoder.

    1B (English):

    huggingface.co/HumeAI/tada-1b

    3B (multilingual):

    huggingface.co/HumeAI/tada-3b-ml

    Demo:

    huggingface.co/spaces/HumeAI/tada

    GitHub:

    github.com/HumeAI/tada

    TADA was developed by Trung Dang, Sharath Rao, Ananya Gupta, Christopher Gagne, Panagiotis Tzirakis, Alice Baird, Jakub Piotr Cłapa, Peter Chin, and Alan Cowen at Hume AI.

    Hume builds voice AI research infrastructure for frontier labs and AI-first enterprises. If you're working on voice models and need high-quality training data, evaluation systems, or reinforcement learning infrastructure, get in touch at

    [email protected]

  • Feb 27, 2026
    • Date parsed from source:
      Feb 27, 2026
    • First seen by Releasebot:
      Feb 28, 2026

    February 27, 2026

    EVI API additions

    • Added support for new supplemental LLM models:
      claude-opus-4-6, gpt-5.1, gpt-5.1-priority, gpt-5.2, gpt-5.2-priority.
    • Added support for zero prompt expansion.
      You can now set prompt_expansion to ZERO when configuring an external LLM, disabling automatic prompt expansion and giving you full control over the system prompt.
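    Zero prompt expansion might be set as in the sketch below. The prompt_expansion field and ZERO value come from the note above; the surrounding config structure (name, language_model block, enum values) is an assumption and may not match the actual schema.

```python
# Hedged sketch: an EVI config payload that disables automatic prompt
# expansion when an external LLM is configured. Only prompt_expansion=ZERO
# is documented in the release note; everything else is illustrative.

config = {
    "name": "my-evi-config",
    "language_model": {                 # external / supplemental LLM
        "model_provider": "OPEN_AI",    # assumed enum value
        "model_resource": "gpt-5.2",    # one of the newly supported models
    },
    "prompt_expansion": "ZERO",         # disable automatic prompt expansion
}

assert config["prompt_expansion"] == "ZERO"
```

    With expansion disabled, whatever system prompt you supply is passed through unmodified, which matters when the prompt is tuned character-for-character.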

    TTS API bug fixes

    • Fixed a bug where audio chunks could be duplicated and interleaved in TTS audio output, resulting in distorted audio.
  • December 2025
    • No date parsed from source.
    • First seen by Releasebot:
      Dec 23, 2025

    EVI API improvements

    Added support for OpenAI’s GPT-5, GPT-5-mini, and GPT-5-nano models as supplemental LLM options.

  • December 2025
    • No date parsed from source.
    • First seen by Releasebot:
      Dec 23, 2025

    EVI API improvements

    New supplemental LLMs are now supported for all EVI versions:

    • Claude Sonnet 4 (Anthropic)
    • Llama 4 Maverick (SambaNova)
    • Qwen3 32B (SambaNova)
    • DeepSeek R1-Distill (Llama 3.3 70B Instruct, via SambaNova)
    • Kimi K2 (Groq)
  • Nov 14, 2025
    • Date parsed from source:
      Nov 14, 2025
    • First seen by Releasebot:
      Dec 23, 2025

    November 14, 2025

    EVI API additions

    Added support for a new SESSION_SETTINGS chat event in the EVI chat history API. When you fetch chat events via /v0/evi/chats/:id, the response now includes entries that indicate when system settings were updated and which settings were applied.
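    Consuming the new event type might look like the sketch below. The endpoint path comes from the note above; the response field names ("type" on each event) are an assumption for illustration, not the documented schema.

```python
# Hedged sketch: pulling SESSION_SETTINGS entries out of a chat-events
# response from GET /v0/evi/chats/:id. The sample data and field names
# are illustrative assumptions.

def session_settings_events(events):
    """Return only the events marking a system-settings update."""
    return [e for e in events if e.get("type") == "SESSION_SETTINGS"]

sample_events = [
    {"type": "USER_MESSAGE", "message_text": "Hi"},
    {"type": "SESSION_SETTINGS", "message_text": '{"audio": {"encoding": "linear16"}}'},
    {"type": "AGENT_MESSAGE", "message_text": "Hello!"},
]

updates = session_settings_events(sample_events)
assert len(updates) == 1
```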

  • Nov 7, 2025
    • Date parsed from source:
      Nov 7, 2025
    • First seen by Releasebot:
      Dec 23, 2025

    November 7, 2025

    Voice conversion and secure EVI enhancements expand the platform with new endpoints to convert speech to a target voice, manage active chats over WebSocket, and handle tool calls via a webhook.

    TTS API additions

    Introduced voice conversion endpoints. Send speech, specify a voice, and receive audio converted to that target voice.

    • POST /v0/tts/voice_conversion/json: JSON response with audio and metadata.
    • POST /v0/tts/voice_conversion/file: Audio file response.
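    A request to these endpoints might be assembled as in the sketch below. Only the two paths come from the note above; the base URL usage, the API-key header name, and the body shape (voice selector, base64 audio) are assumptions, not the documented request schema.

```python
# Hedged sketch: building a voice-conversion request. Paths are from the
# release note; header and body field names are illustrative assumptions.

BASE = "https://api.hume.ai"

def voice_conversion_request(voice_id: str, audio_b64: str, as_file: bool = False):
    """Return a dict describing the HTTP request to send (not sent here)."""
    path = "/v0/tts/voice_conversion/file" if as_file else "/v0/tts/voice_conversion/json"
    return {
        "method": "POST",
        "url": BASE + path,
        "headers": {"X-Hume-Api-Key": "<YOUR_API_KEY>"},          # assumed header name
        "json": {"voice": {"id": voice_id}, "audio": audio_b64},  # assumed body shape
    }

req = voice_conversion_request("my-voice", "<base64 audio>")
assert req["url"].endswith("/voice_conversion/json")
```

    The two variants differ only in the response: the json endpoint returns audio plus metadata, the file endpoint returns the audio file directly.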

    EVI API additions

    Added a control plane API for EVI. Perform secure server-side actions and connect to active chats. See the Control Plane guide.

    • POST /v0/evi/chat/:chat_id/send: Send a message to an active chat.
    • WSS /v0/evi/chat/:chat_id/connect: Connect to an active chat over WebSocket.
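    Addressing a specific active chat might look like the sketch below. The two paths come from the note above; the hostnames and any payload shape are assumptions.

```python
# Hedged sketch: building the control-plane URLs for an active EVI chat.
# Paths are from the release note; hostnames are illustrative assumptions.

def control_plane_urls(chat_id: str):
    return {
        "send": f"https://api.hume.ai/v0/evi/chat/{chat_id}/send",      # HTTP POST
        "connect": f"wss://api.hume.ai/v0/evi/chat/{chat_id}/connect",  # WebSocket
    }

urls = control_plane_urls("abc123")
assert urls["send"].endswith("/abc123/send")
assert urls["connect"].startswith("wss://")
```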

    Added a tool call webhook event. Subscribe to tool calls to know when to invoke your tool, then send the tool response back to the chat using the control plane.

  • Oct 24, 2025
    • Date parsed from source:
      Oct 24, 2025
    • First seen by Releasebot:
      Feb 18, 2026

    AudioStack × Hume: Professional Audio for Creatives

    AudioStack expands its AI audio production suite with Hume’s expressive voices, boosting speed, consistency, and emotional depth for ads, podcasts, and branded content. The integration enables scalable, natural-sounding output across markets and languages while reducing costs and raising creative quality.

    About AudioStack

    AudioStack is an enterprise AI audio production platform trusted by global creative teams at Publicis, Omnicom, iHeartMedia, Dentsu, and more. Their AI-driven production suite empowers agencies, publishers, AdTech platforms, and brands to create broadcast-ready audio content 10 times faster at a fraction of traditional costs—reducing production expenses by up to 80% while scaling effortlessly across markets and languages.

    Expanding AudioStack’s Voice Library with Hume

    AudioStack offers a comprehensive voice library for audio advertisements, podcasts, and branded content. As they continue to grow their voice offerings, they're integrating Hume's emotionally intelligent voices to meet two core demands of creative teams:

    1. Consistent Stability
      Enterprise content generation requires voices that perform reliably across thousands of productions. Hume's voices deliver consistent quality and pronunciation, ensuring brand messaging remains clear and professional, whether creating one ad or thousands of dynamic variations.

    2. Natural Expressiveness
      Generic TTS voices often sound flat or robotic—a dealbreaker for agencies creating audio that needs to engage audiences. Hume's voices bring genuine emotional depth, helping audio content feel authentic, engaging, and human.

    By adding Hume’s expressive voices to their platform, AudioStack enables creative teams and advertisers to produce high-quality, emotionally resonant audio at scale.

    For more information on how empathic AI can enhance your digital solutions, contact Hume AI.

  • Oct 21, 2025
    • Date parsed from source:
      Oct 21, 2025
    • First seen by Releasebot:
      Feb 18, 2026

    Creating immersive avatar experiences with Render Foundry

    Render Foundry unveils an immersive Babe Ruth simulator built with Hume AI voice cloning, blending Unreal Engine storytelling with authentic, warm dialogue. The experience lets visitors talk with Babe Ruth, syncing likeness with audio for a lifelike museum interaction.

    About Render Foundry

    Render Foundry specializes in creating immersive experiences, from interactive museum installations to digital twins of entire campuses. Led by Shane Boyce and Josh Harwell, their team combines Unreal Engine expertise with cutting-edge storytelling to blur the line between reality and simulation.

    When they set out to create an interactive Babe Ruth experience, they needed a voice that could do the impossible: bring him back to life with authenticity, warmth, and the personality that made him a legend.

    Watch the Experience

    Using Hume's custom voice cloning technology, Render Foundry created a Babe Ruth simulator that feels emotionally authentic. Hume captured the tonal qualities, cadence, and personality of the baseball icon, allowing visitors to have natural, engaging conversations with one of sports' most beloved figures.

    The result is an experience that transcends typical museum exhibits. Visitors not only learn about Babe Ruth, but also connect with him. Render Foundry created something truly special by simulating Babe’s likeness and syncing the audio and the visual, making this experience one-of-a-kind.

    Josh Harwell, the Creative Director at Render Foundry, says,

    “We’re excited to offer these curated experiences. It’s fun to watch clients interact with our characters as they are brought to life. Whether it’s a historical figure, a mascot, or a brand ambassador, Hume helps us deliver a solution that humanizes the responses of AI.”

    For more information on how empathic AI can enhance your digital solutions, contact Hume AI.

  • Oct 14, 2025
    • Date parsed from source:
      Oct 14, 2025
    • First seen by Releasebot:
      Feb 18, 2026

    Revelum × Hume: Detecting Voice Fraud in Real-Time

    Revelum launches an AI-native security platform that stops deepfake fraud in real time with call risk analysis, live deepfake detection, and precise timestamps. A strategic partnership with Hume AI speeds up resilient detection and responsible AI use, demonstrated by a real-time fraud scenario.

    Revelum

    Revelum is an AI-native security platform that protects institutions from deepfake impersonations and fraud in real time. Founded by Enrique Barco in 2025, Revelum provides turn-key solutions and developer-friendly APIs to safeguard institutions from emerging threats in the era of AI-driven fraud.

    Revelum’s strategic partnership with Hume AI ensures their detection systems can identify even the most advanced synthetic voices, including Hume's own Empathic Voice Interface (EVI).

    The Demo: AI-Powered Fraud in Action

    In a recent demo, Revelum showcased how attackers use AI voice agents to attempt account takeovers. The scenario: "Jake" calls customer support, claiming an AI assistant accidentally changed and deleted his password. The EVI-powered voice sounds natural, but Revelum's technology instantly flags the deepfake.

    In real-time, Revelum’s platform provides:

    • A call risk assessment analysis
    • Real-time deepfake detection counts
    • Precise timestamps of synthetic voice segments

    The customer service agent receives an alert and initiates callback verification—stopping the attack immediately.

    Hume’s Partnership with Revelum

    Our collaboration with Revelum creates a critical feedback loop for responsible AI development:

    • Early Access to EVI: Revelum trains their models on Hume's cutting-edge voice technology, ensuring detection capabilities stay ahead of emerging threats before they reach malicious actors.
    • Continuous Refinement: As Hume's emotionally intelligent voices become more sophisticated, Revelum's detection algorithms evolve in parallel.

    Revelum founder Enrique notes:

    "By partnering with Hume, we’re taking a vital step toward building technology that anticipates — not just reacts to — the evolving tactics of bad actors seeking to misuse powerful models. Together, we’re staying one step ahead in ensuring generative AI is used responsibly."

    For more information on how empathic AI can enhance your digital solutions, contact Hume AI.

  • Oct 3, 2025
    • Date parsed from source:
      Oct 3, 2025
    • First seen by Releasebot:
      Dec 23, 2025

    October 3, 2025

    Octave 2 upgrades delivered across TTS and EVI: HTTP and WebSocket endpoints now support Octave 2 with word- and phoneme-level timestamps, and EVI 4-mini enables multilingual TTS via Octave 2 with your chosen LLM.

    TTS API improvements

    Octave 2 is now available for use in TTS endpoints.

    • For HTTP endpoints, specify "version": "2" in your request body. (reference)
    • For the WebSocket endpoint, specify version=2 in the query parameters of the handshake request. (reference)
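    The two ways of selecting Octave 2 might look like the sketch below. The "version": "2" body field and version=2 query parameter come from the notes above; the endpoint paths and request structure are assumptions.

```python
# Hedged sketch of the two ways to select Octave 2, per the notes above.
# The version field/parameter are documented; paths are assumptions.

# HTTP: include "version": "2" in the request body.
http_request = {
    "method": "POST",
    "url": "https://api.hume.ai/v0/tts",  # assumed path
    "json": {"utterances": [{"text": "Hello!"}], "version": "2"},
}

# WebSocket: pass version=2 as a query parameter on the handshake.
ws_url = "wss://api.hume.ai/v0/tts/stream/input?version=2"  # assumed path

assert http_request["json"]["version"] == "2"
assert "version=2" in ws_url
```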

    Word- and phoneme-level timestamps are now supported for TTS APIs. See our Timestamps Guide to learn more.

    EVI API improvements

    EVI version 4-mini is now available. This version enables the use of Octave 2 for TTS alongside a supplemental LLM of your choosing, bringing Octave 2’s multilingual capabilities to EVI. Specify "version": "4-mini" in your EVI config to use it.
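    Selecting 4-mini might look like the sketch below. Only "version": "4-mini" is documented in the note above; the rest of the config structure (name, language_model block, enum values) is an assumption.

```python
# Hedged sketch: an EVI config selecting version 4-mini, which routes TTS
# through Octave 2 alongside a supplemental LLM. Structure is illustrative.

config = {
    "name": "multilingual-config",
    "version": "4-mini",                 # enables Octave 2 TTS in EVI
    "language_model": {                  # supplemental LLM of your choosing
        "model_provider": "ANTHROPIC",   # assumed enum value
        "model_resource": "claude-sonnet-4-20250514",  # assumed identifier
    },
}

assert config["version"] == "4-mini"
```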

