Hume Release Notes

Last updated: Dec 23, 2025

  • December 2025

    EVI API improvements

    Added support for OpenAI’s GPT-5, GPT-5-mini, and GPT-5-nano models as supplemental LLM options.

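    As an illustration, pointing an EVI config at GPT-5 is a one-field change. A minimal Python sketch, assuming the existing EVI config schema (a language_model object with model_provider and model_resource) and the X-Hume-Api-Key header; the values shown are placeholders:

        import requests

        # Create an EVI config that uses GPT-5 as the supplemental LLM.
        # The language_model field names follow the existing EVI config
        # schema; the provider/resource values here are assumptions.
        resp = requests.post(
            "https://api.hume.ai/v0/evi/configs",
            headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
            json={
                "name": "gpt-5-config",
                "language_model": {
                    "model_provider": "OPEN_AI",
                    "model_resource": "gpt-5",
                },
            },
        )
        resp.raise_for_status()
        print(resp.json()["id"])  # config id to use when starting a chat
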
  • December 2025

    EVI API improvements

    New supplemental LLMs are now supported for all EVI versions:

    • Claude Sonnet 4 (Anthropic)
    • Llama 4 Maverick (SambaNova)
    • Qwen3 32B (SambaNova)
    • DeepSeek R1-Distill (Llama 3.3 70B Instruct, via SambaNova)
    • Kimi K2 (Groq)
  • Nov 14, 2025

    EVI API additions

    Added support for a new SESSION_SETTINGS chat event in the EVI chat history API. When you fetch chat events via /v0/evi/chats/:id, the response now includes entries that indicate when session settings were updated and which settings were applied.

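    A short sketch of pulling those events in Python; the paginated events_page field name and the event shape are assumptions based on the description above:

        import requests

        chat_id = "<CHAT_ID>"
        resp = requests.get(
            f"https://api.hume.ai/v0/evi/chats/{chat_id}",
            headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
        )
        resp.raise_for_status()

        # Pick out the new SESSION_SETTINGS entries; each one records when
        # settings were updated and which settings were applied.
        for event in resp.json().get("events_page", []):  # field name assumed
            if event.get("type") == "SESSION_SETTINGS":
                print(event)
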
  • Nov 7, 2025

    Voice conversion and secure EVI enhancements expand the platform with new endpoints to convert speech to a target voice, manage active chats over WebSocket, and handle tool calls via a webhook.

    TTS API additions

    Introduced voice conversion endpoints. Send speech, specify a target voice, and receive audio converted to that voice.

    • POST /v0/tts/voice_conversion/json: JSON response with audio and metadata.
    • POST /v0/tts/voice_conversion/file: Audio file response.

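    A hedged sketch of calling the JSON variant; only the paths are documented here, so the payload field names (audio, voice_id) are illustrative:

        import base64
        import requests

        with open("input.wav", "rb") as f:
            audio_b64 = base64.b64encode(f.read()).decode()

        # Convert the uploaded speech to the target voice. The payload
        # shape is an assumption; see the TTS API reference for the
        # exact field names.
        resp = requests.post(
            "https://api.hume.ai/v0/tts/voice_conversion/json",
            headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
            json={"audio": audio_b64, "voice_id": "<TARGET_VOICE_ID>"},
        )
        resp.raise_for_status()
        print(resp.json())  # converted audio plus metadata
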
    EVI API additions

    Added a control plane API for EVI. Perform secure server-side actions and connect to active chats. See the Control Plane guide.

    • POST /v0/evi/chat/:chat_id/send: Send a message to an active chat.
    • WSS /v0/evi/chat/:chat_id/connect: Connect to an active chat over WebSocket.

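    For instance, injecting a message into a running chat might look like the following; only the route is documented here, so the assistant_input-style message body is an assumption:

        import requests

        chat_id = "<ACTIVE_CHAT_ID>"
        resp = requests.post(
            f"https://api.hume.ai/v0/evi/chat/{chat_id}/send",
            headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
            # Body shape assumed; see the Control Plane guide for the schema.
            json={"type": "assistant_input", "text": "One moment while I check."},
        )
        resp.raise_for_status()
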
    Added a tool call webhook event. Subscribe to tool calls to know when to invoke your tool, then send the tool response back to the chat using the control plane.

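    A minimal Flask sketch of that loop: receive the webhook event, run your tool, and post a tool_response back through the control plane. The webhook payload fields (chat_id, tool_call_id, parameters) are assumptions:

        from flask import Flask, request
        import requests

        app = Flask(__name__)

        def my_tool(parameters):
            # Stand-in for your actual tool logic.
            return "tool output"

        @app.post("/hume/tool-calls")
        def handle_tool_call():
            event = request.get_json()
            result = my_tool(event["parameters"])  # field names assumed
            # Return the result to the active chat via the control plane.
            requests.post(
                f"https://api.hume.ai/v0/evi/chat/{event['chat_id']}/send",
                headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
                json={
                    "type": "tool_response",
                    "tool_call_id": event["tool_call_id"],
                    "content": result,
                },
            )
            return "", 200
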
  • Oct 3, 2025

    Octave 2 rolls out across TTS and EVI: the HTTP and WebSocket TTS endpoints now support Octave 2 with word- and phoneme-level timestamps, and the new EVI 4-mini brings Octave 2’s multilingual TTS to EVI alongside a supplemental LLM of your choice.

    TTS API improvements

    Octave 2 is now available for use in TTS endpoints.

    • For HTTP endpoints, specify "version": "2" in the request body.
    • For the WebSocket endpoint, specify version=2 in the query parameters of the handshake request.

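    Concretely, in Python; the endpoint paths, the request body beyond version, and the WebSocket auth query parameter are assumptions here:

        import requests

        # HTTP: opt into Octave 2 with "version": "2" in the request body.
        resp = requests.post(
            "https://api.hume.ai/v0/tts",
            headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
            json={
                "version": "2",
                "utterances": [{"text": "Hello from Octave 2."}],
            },
        )

        # WebSocket: opt in with version=2 in the handshake query parameters.
        ws_url = "wss://api.hume.ai/v0/tts/stream?version=2&api_key=<YOUR_API_KEY>"
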
    Word- and phoneme-level timestamps are now supported for TTS APIs. See our Timestamps Guide to learn more.

    EVI API improvements

    EVI version 4-mini is now available. This version enables the use of Octave 2 for TTS alongside a supplemental LLM of your choosing, bringing Octave 2’s multilingual capabilities to EVI. Specify "version": "4-mini" in your EVI config to use it.

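    A minimal sketch of opting in; the field names follow the existing EVI config schema, and the language_model values are illustrative placeholders:

        # EVI config excerpt; "evi_version" selects 4-mini, and the
        # supplemental LLM values below are placeholders.
        config = {
            "name": "multilingual-agent",
            "evi_version": "4-mini",
            "language_model": {
                "model_provider": "ANTHROPIC",
                "model_resource": "claude-sonnet-4",  # illustrative
            },
        }
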
  • Oct 1, 2025

    Octave 2: next-generation multilingual voice AI

    Octave 2 debuts as a hyperrealistic voice AI with 11 languages, sub-200ms latency, and new voice conversion and direct phoneme editing capabilities. It is faster and cheaper than Octave 1, and available now on our platform and API for early access.

    Octave 2

    • More deeply understands the emotional tone of speech.
    • Extends our text-to-speech system to 11 languages.
    • Is 40% faster and more efficient, generating audio in under 200ms.
    • Offers new first-of-their-kind features for a speech-language model, including voice conversion and direct phoneme editing.
    • Pronounces uncommon words, repeated words, numbers, and symbols more reliably.
    • Is half the price of Octave 1.

    A speech-language model is a state-of-the-art AI model trained to understand and synthesize both language and speech. Unlike traditional TTS models, it understands how a script informs the tune, rhythm, and timbre of voice acting, inferring when to whisper secrets, shout triumphantly, or calmly explain a fact. Understanding these aspects of speech also allows it to reproduce the personality, and not just the vocal timbre, of any speaker.

    With Octave 2, we’ve taken these capabilities a step further.

    Hyperrealistic voice AI in 11 languages

    Octave 2 extends our next-generation voice AI to 11 languages: Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.

    Example Japanese-language generation from Octave 2:
    「ハハハ、えっと、これがフォーマルなイベントだとは知らなかった。だから、えっと、バットマンのパジャマを着ているんだ。あなたの家ではカジュアルなイベントだと思っていたんだけど。すごく着飾っていない気がする。」
    (English: "Hahaha, uh, I didn't realize this was a formal event. That's, uh, why I'm wearing Batman pajamas. I thought it was a casual get-together at your place. I feel seriously underdressed.")

    Example Korean-language generation from Octave 2:
    "내가 이걸 얼마나 오랫동안 했는지 알아? 그런데 네가 그냥. 삭제했어? 내 인생의 3개월이 순식간에 사라졌어. 네가 다시 확인도 안 해줬으니까!"
    (English: "Do you know how long I worked on this? And you just. Deleted it? Three months of my life, gone in an instant, because you didn't even double-check!")

    Both of these voices were created with instant cloning, each using a 15-second audio recording of a native speaker’s voice. When used to generate speech in a different language, Octave 2 predicts the speaker’s accent; for instance, an English-language sample generated with the Japanese voice accompanies the original post.

    Octave 2 is becoming proficient in other languages, too; we will be announcing support for at least 20 languages in the coming months.

    High quality at low latency

    Octave 2 is the fastest and most efficient model of its kind, returning responses in under 200ms.

    This was achieved without trading quality for latency. Instead, we deployed Octave 2 on some of the world’s most advanced chips for LLM inference. Working closely with SambaNova, we developed a new inference stack specific to Octave 2’s new speech-language model architecture.

    Octave 2 isn’t just fast; it’s also efficient. We’re offering it at half the price of Octave 1. With dedicated deployments, this can be reduced to under a cent per minute of audio. This efficiency allows Octave 2 to power large-scale applications in entertainment, gaming, customer service, and more.

    Voice conversion

    With Octave 2, we’ve been working on two novel capabilities for a speech-language model: realistic voice conversion and direct phoneme editing.

    With voice conversion, Octave 2 can exchange one voice for another while freezing the phonetic qualities and timing of the spoken utterance. This is ideal for use cases that require actors to stand in for other actors, such as dubbing in a new language with the original actor’s voice, or making precise human touch-ups to AI voiceovers.

    For instance, given a source speech recording and a target voice, the model generates the same utterance converted into the target voice (the audio examples accompany the original post).

    Phoneme editing

    We're also exploring a new phoneme editing capability, where minute adjustments can be made to the timing and pronunciation of speech. This enables support for custom pronunciation of names, manipulation of word emphasis, and more.

    For instance, we took a classic film quote, recreated it with voice conversion, and then used phoneme editing to alter the pronunciation of words in the quote (audio examples accompany the original post).

    We've created a new word, "leviaso," out of the phonemes present in the original quote. This kind of granular phoneme replication and editing would have proven difficult, if not impossible, with text input alone.

    Voice conversion and phoneme editing will be available soon on our platform.

    Build conversational experiences with EVI 4 mini

    Finally, we're launching EVI 4 mini, which brings all of the capabilities of Octave 2 to our speech-to-speech API. Now, you can build faster, smoother interactive experiences in 11 languages. For example, we built a translator app using EVI 4 mini with just a few voice samples and a prompt.

    EVI 4 mini doesn't yet generate its own language natively, so you'll need to pair it with an external LLM through our API until we launch the full version.

    Access Octave 2 and EVI 4 mini

    Today, we're rolling out access to Octave 2 on our text-to-speech playground and API, and to EVI 4 mini on our speech-to-speech playground and API.

    Soon we'll be releasing more evaluations, more languages, and access to voice conversion and phoneme editing.

    In the meantime, we're excited to see what you create!

  • Sep 24, 2025

    Hume AI powers conversational learning with Coconote

    Coconote adds voice chat powered by Hume EVI to turn notes into interactive conversations with natural questions, explanations, and quiz prompts. The integration delivers seamless dialogue, contextual note referencing, and emotionally aware responses to boost study engagement.

    About Coconote

    Coconote is a leading AI note-taker, consistently ranking in the Top 50 Education apps on Apple’s App Store. It transforms lectures and meetings into organized notes, transcripts, and study materials, supporting over 100 languages and multiple content formats, including audio, video, and PDF conversion.

    Coconote is tailor-made for neurodiverse students and professionals, including those with ADHD, ASD, dyslexia, and auditory processing issues.

    Enhancing Student Engagement

    While traditional note-taking apps require students to manually scroll and search through content, Coconote is creating interactive study experiences through conversational AI.

    Coconote’s voice chat feature, powered by Hume's EVI, helps users transform static notes into dynamic conversations through natural voice interaction. Students can:

    • Ask natural questions about their lecture content
    • Receive contextual explanations referencing specific notes
    • Engage in quiz-style conversations for active learning

    EVI powers the "Voice Chat" feature on Coconote

    For example, a student might ask, "What were the key differences between mitosis and meiosis from today's biology lecture?" and receive an immediate response from EVI that not only references their notes, but also discusses follow-up questions and tangential topics (e.g., cytokinesis).

    Technical Integration

    EVI's advanced capabilities make it an ideal fit for Coconote. The platform delivers conversation-grade latency for natural dialogue flow and uses intelligent end-of-turn detection based on vocal cues. Most importantly, EVI provides emotionally aware responses that adapt to students’ attitudes and confidence levels.

    From a development perspective, the integration was seamless:

    • Clean API integration with existing Coconote infrastructure
    • Custom system prompts aligned with Coconote's supportive brand voice
    • Chat history capabilities for reviewing past study sessions

    The result enables students to have meaningful conversations with their notes while maintaining the technical sophistication needed for educational applications. Users can test the EVI integration in Coconote directly at https://coconote.app/.

    For more information on how empathic AI can enhance your digital solutions, please contact Hume AI.

  • Sep 12, 2025

    TTS API improvements

    Responses from /v0/tts/stream/json now include a request_id field for easier tracking and debugging.

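    A sketch of logging the new field, assuming the stream yields newline-delimited JSON chunks and the utterances request body of the TTS API:

        import json
        import requests

        resp = requests.post(
            "https://api.hume.ai/v0/tts/stream/json",
            headers={"X-Hume-Api-Key": "<YOUR_API_KEY>"},
            json={"utterances": [{"text": "Testing request ids."}]},
            stream=True,
        )
        # Each chunk should now carry a request_id for tracking/debugging.
        for line in resp.iter_lines():
            if line:
                print(json.loads(line).get("request_id"))
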
    EVI API improvements

    You can now change the voice within an active session by specifying a voice_id in a session settings message.

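    Over an open EVI WebSocket, the change is a single message. The session_settings type follows EVI's message conventions; the top-level placement of voice_id and the api_key query parameter are assumptions:

        import asyncio, json
        import websockets

        async def change_voice(ws_url: str) -> None:
            async with websockets.connect(ws_url) as ws:
                # Swap the voice mid-session via a session settings message.
                await ws.send(json.dumps({
                    "type": "session_settings",
                    "voice_id": "<NEW_VOICE_ID>",  # placement assumed
                }))

        asyncio.run(change_voice(
            "wss://api.hume.ai/v0/evi/chat?api_key=<YOUR_API_KEY>"
        ))
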
  • Sep 5, 2025

    TTS API improvements

    Introduced a TTS WebSocket endpoint that streams text in and speech out.

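    A rough sketch of the flow; the endpoint path, auth query parameter, and message shapes are all assumptions here:

        import asyncio, json
        import websockets

        async def stream_tts() -> None:
            url = "wss://api.hume.ai/v0/tts/stream?api_key=<YOUR_API_KEY>"
            async with websockets.connect(url) as ws:
                # Stream text in...
                await ws.send(json.dumps({"text": "Stream this as speech."}))
                # ...and receive speech out, chunk by chunk.
                async for message in ws:
                    chunk = json.loads(message)
                    audio_b64 = chunk.get("audio", "")  # base64 audio assumed
                    print(len(audio_b64), "base64 chars of audio")

        asyncio.run(stream_tts())
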
  • Aug 22, 2025

    EVI API improvements

    Fixed a bug where the voice_id query parameter on the /chat endpoint only accepted Voice Library voices. It now supports custom voices as well.

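    With the fix, a custom voice id can go straight into the handshake URL; the api_key query parameter shown for auth is an assumption:

        # voice_id now accepts custom voices as well as Voice Library voices.
        ws_url = (
            "wss://api.hume.ai/v0/evi/chat"
            "?voice_id=<CUSTOM_VOICE_ID>"
            "&api_key=<YOUR_API_KEY>"
        )
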
