Mistral Release Notes

Last updated: Apr 1, 2026

Mistral Products

All Mistral Release Notes (77)

  • Apr 1, 2026
    • Date parsed from source:
      Apr 1, 2026
    • First seen by Releasebot:
      Apr 1, 2026
    Mistral logo

    Mistral Common by Mistral

    v1.11.0

    Mistral Common adds tag v1.11.0 for its public PyPI release.

    Adds tag v1.11.0 for public PyPI release

    Original source Report a problem
  • March 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Mar 27, 2026
    Mistral logo

    Mistral

    Speaking of Voxtral

    Mistral releases Voxtral TTS, its first multilingual text-to-speech model for natural, emotionally expressive voice generation. It supports 9 languages, low-latency streaming, custom voices, and testing in Mistral Studio, with API access now available.

    Today we’re releasing Voxtral TTS, our first text-to-speech model with state-of-the-art performance in multilingual voice generation. The model is lightweight at 4B parameters, making Voxtral-powered agents natural, reliable, and cost-effective at scale.

    Highlights

    • Realistic, emotionally expressive speech in 9 popular languages with support for diverse dialects.
    • Very low latency for time-to-first-audio.
    • Easily adaptable to new voices.
    • Available to test out in Mistral Studio.
    • Enterprise-grade text-to-speech, powering critical voice agent workflows.

    Natural voice generation hinges on the model’s ability not only to recite text but to interpret it accurately. Contextual understanding of tone (neutral, happy, sarcastic, and so on) determines whether the listener hears the output as natural or robotic. Our model excels at both contextual understanding and speaker modeling: capturing how a specific person naturally speaks. Our voice adaptation goes beyond traditional read-speech by capturing a speaker’s personality, including their natural pauses, rhythm, intonation, and emotional range. With its compact size, low cost and latency, and easy adaptability, Voxtral TTS gives enterprises looking to own their voice AI stack full control and customization.

    Audio is the new UX. Create new kinds of interaction for collaboration and understanding that only speech can deliver. Begin now in Mistral Studio with our Mistral Voices in American, British, and French dialects.

    Listen and decide: can you tell the difference?

    Our team speaks dozens of languages across multiple dialects, so we understand the importance of cultural nuance, and we built a model that reflects us. Speech generation builds trust through natural rhythm, emotion, and even humor. That’s why, with voice emulation, we focused on authenticity and emotional expressiveness.

    State-of-the-art performance

    Automated metrics such as word-error-rate and audio quality scores for multilingual text-to-speech systems are unable to measure naturalness of speech. What makes speech natural is extremely nuanced and requires a deep understanding of cultural differences and typical speaking patterns. Hence, comparative human evaluations performed by native speakers are crucial.

    For voice agents, latency and quality are in constant tension. Human evaluations show that Voxtral TTS achieves superior naturalness compared to ElevenLabs Flash v2.5 while maintaining similar Time-to-First-Audio (TTFA). Voxtral also performs at parity with the quality of ElevenLabs v3, successfully supporting emotion-steering for more lifelike interactions.

    We conducted a comparative human evaluation of Voxtral TTS and ElevenLabs v2.5 Flash in a zero-shot custom voice context. Using two recognizable voices in their native dialects for each of the 9 supported languages, 3 annotators performed a side-by-side preference test per pair on naturalness, accent adherence, and acoustic similarity to the original reference. Voxtral TTS widens the quality gap to v2.5 Flash in this zero-shot multilingual custom voice setting, highlighting the instant customizability of Voxtral TTS to any voice.

    Spoken natively

    Trained on a large speech dataset, Voxtral TTS is built for global application. It supports state-of-the-art performance in 9 languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.

    The model was trained to adapt to a custom voice from a reference as short as 3 seconds, capturing not just the voice but also nuances such as subtle accent, inflection, intonation, and even disfluencies similar to those in the reference. We offer preset voice options in the API, but it is simple to extend to your in-house voice library: customize it to the use case, localize it to a language and accent, and keep it neutral or emotive, casual or formal, conversational or robotic.

    The model also demonstrates zero-shot cross-lingual voice adaptation even though it’s not explicitly trained for it. For example, the model can generate English speech with a French voice prompt and English text. The resulting speech sounds natural while adopting the accent of the provided voice prompt (in this example, the generated speech has a natural French-accented English). This makes the model useful for building cascaded speech-to-speech translation systems.

    Built for low-latency streaming

    Latency is critical for voice agent applications. Voxtral TTS achieves a model latency of 70ms for a typical input voice sample of 10 seconds and 500 characters, with a real-time factor (RTF) of ≈9.7x. The model natively generates up to two minutes of audio, and our API handles arbitrarily long generations with smart interleaving.
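As a back-of-envelope check of these figures (assuming RTF is audio duration divided by compute time, the usual convention):

```python
def generation_time(audio_seconds: float, rtf: float = 9.7, ttfa_s: float = 0.070) -> float:
    """Rough wall-clock estimate for synthesizing `audio_seconds` of speech.

    rtf: real-time factor, taken here as seconds of audio produced per
    second of compute; ttfa_s: the ~70 ms time-to-first-audio quoted above.
    """
    return ttfa_s + audio_seconds / rtf

# Two minutes of audio, the model's native maximum, at RTF ~ 9.7
print(round(generation_time(120.0), 2))
```

So even a full two-minute generation should complete in well under fifteen seconds of compute under these assumptions.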

    Voxtral TTS architecture

    The model is a transformer-based, autoregressive, flow-matching model, built on Ministral 3B. It consists of the following components:

    • 3.4B-parameter transformer decoder backbone
    • 390M-parameter flow-matching acoustic transformer
    • 300M-parameter neural audio codec (symmetric encoder-decoder)

    The model takes a voice prompt (5 to 25 seconds) and a text prompt in 9 supported languages. For each audio frame, the transformer backbone predicts a semantic token, then the flow-matching transformer runs 16 function evaluations (NFEs) to produce the acoustic latent.

    We developed an in-house codec that processes audio causally, producing a semantic VQ token (8,192-entry vocabulary) and an acoustic FSQ latent (36 dimensions with 21 levels each) at a 12.5 Hz frame rate.
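Those codec numbers imply a rough bitrate, sketched below under the assumption that each FSQ dimension independently carries log2(21) bits of information (actual storage may round up to whole bits per code):

```python
import math

FRAME_RATE_HZ = 12.5
SEMANTIC_VOCAB = 8192        # VQ codebook size -> log2(8192) = 13 bits/frame
FSQ_DIMS, FSQ_LEVELS = 36, 21

semantic_bits = math.log2(SEMANTIC_VOCAB)            # bits per semantic token
acoustic_bits = FSQ_DIMS * math.log2(FSQ_LEVELS)     # bits per acoustic latent

total_bps = FRAME_RATE_HZ * (semantic_bits + acoustic_bits)
print(round(total_bps))  # total bits per second of audio
```

That is on the order of 2 kbps, illustrating how compact the intermediate representation is relative to raw audio.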

    Powering enterprise voice workflows

    Voxtral TTS closes the loop on audio intelligence, giving enterprise voice pipelines an output layer that passes the human test. It works alongside Voxtral Transcribe for full speech-to-speech, or integrates into any existing speech-to-text and LLM stack, with cross-lingual support.

    Workflows

    Customer Support

    Voice agents that route and resolve queries across channels with natural, brand-appropriate speech. Place Voxtral TTS into existing customer support call systems for automated spoken responses, with output that integrates into existing workflows.

    Test-run the model in Mistral Studio

    Experiment with Voxtral TTS directly in the Mistral Studio playground. Select one of the Mistral voices or record your own.

    Get started with Voxtral TTS

    Voxtral TTS is available now via API at $0.016 per 1k characters.
    Try it now in Mistral Studio or in Le Chat.
    A model with several reference voices is available as open weights on Hugging Face under CC BY NC 4.0 license.
    Explore the model’s documentation.
    Sign up for our upcoming webinar to learn more!
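At the quoted rate, cost scales linearly with input length. A small estimator (the characters-per-second speaking rate in the comment is our assumption, not from the post):

```python
def tts_cost_usd(characters: int, rate_per_1k: float = 0.016) -> float:
    """API cost at the quoted $0.016 per 1,000 characters."""
    return characters / 1000 * rate_per_1k

# A ~500-character paragraph, and an hour-long script
# (~54,000 chars at an assumed ~15 chars/sec speaking rate)
print(tts_cost_usd(500), round(tts_cost_usd(54_000), 3))
```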

    We’re hiring!

    We are building the voice layer for AI, and if this is the kind of problem you want to work on, we'd love to hear from you.

    Original source Report a problem

  • Mar 22, 2026
    • Date parsed from source:
      Mar 22, 2026
    • First seen by Releasebot:
      Mar 27, 2026
    Mistral logo

    Mistral

    March 22

    Mistral releases Voxtral TTS with zero-shot voice cloning, multilingual support, and real-time streaming.

    We released Voxtral TTS (voxtral-tts-2603), our state-of-the-art text-to-speech model with zero-shot voice cloning, multilingual support, and real-time streaming.

    MODEL RELEASED

    Original source Report a problem
  • March 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Mar 17, 2026
    Mistral logo

    Mistral

    Introducing Mistral Small 4

    Mistral releases Small 4, a unified model that combines fast instruction following, deep reasoning, and multimodal chat in one versatile engine. With 119B parameters, a 256k context window, and configurable reasoning, it delivers open-source availability and strong efficiency gains in inference latency and throughput.

    Introducing Mistral Small 4

    Mistral Small 4

    A fast instruct model

    A powerful reasoning engine

    A multimodal assistant

    Today, we are announcing Mistral Small 4. This model is the next major release in the Mistral Small family. Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model. With Small 4, users no longer need to choose between a fast instruct model, a powerful reasoning engine, or a multimodal assistant: one model now delivers all three, with configurable reasoning effort and best-in-class efficiency.

    Mistral Small 4 is released under the Apache 2.0 license, continuing our commitment to open, accessible, and customizable AI.

    A new standard for multimodal, reasoning-optimized models

    Mistral Small 4 is a hybrid model optimized for general chat, coding, agentic tasks, and complex reasoning. Its architecture supports both text and image inputs, making it versatile for a wide range of applications. With Mistral Small 4, we reaffirm our commitment to open-source models and are proud to join the
    NVIDIA Nemotron Coalition
    as a founding member, advancing collaboration and innovation in AI development.

    Key architectural details

    • Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.
    • 119B total parameters, with 6B active parameters per token (8B including embedding and output layers).
    • 256k context window, supporting long-form interactions and document analysis.
    • Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.
    • Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
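The routing described above (4 of 128 experts active per token) can be illustrated with a minimal top-k gating sketch. This is generic MoE routing in plain Python for intuition only, not Mistral's actual router:

```python
import math
import random

def top_k_route(logits, k=4):
    """Select the k highest-scoring experts; softmax over just those gates."""
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    exp_scores = [math.exp(logits[i]) for i in top]
    total = sum(exp_scores)
    return {i: s / total for i, s in zip(top, exp_scores)}

random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # one score per expert
gates = top_k_route(router_logits)                            # 4 experts active per token
print(len(gates), round(sum(gates.values()), 6))
```

Because only the selected experts run, per-token compute tracks the ~6B active parameters rather than the 119B total.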

    Performance highlights

    • 40% reduction in end-to-end completion time (latency-optimized setup).
    • 3x more requests per second (throughput-optimized setup) compared to Mistral Small 3.

    Why Mistral Small 4?

    Unified capabilities

    Mistral Small 4 consolidates the strengths of Magistral (reasoning), Devstral (coding agents), and Mistral Small (instruct) into a single model. Whether you need a chat assistant, a research partner, or a coding agent, Small 4 adapts to your task, no need to switch between specialized models.

    Reasoning on demand

    With the new
    reasoning_effort
    parameter, users can dynamically adjust the model’s behavior:

    • reasoning_effort="none": Fast, lightweight responses for everyday tasks, matching the chat style of Mistral Small 3.2.
    • reasoning_effort="high": Deep, step-by-step reasoning for complex problems, with verbosity equivalent to previous Magistral models.

    Enterprise-grade efficiency

    • Minimum infrastructure: 4x NVIDIA HGX H100, 2x NVIDIA HGX H200, or 1x NVIDIA DGX B200.
    • Recommended setup: 4x NVIDIA HGX H100, 4x NVIDIA HGX H200, or 2x NVIDIA DGX B200 for optimal performance.
    • Mistral Small 4 is fully open source. Fine-tune it for specialized tasks or deploy it out of the box for general-purpose use. Thanks to collaboration with the community, it’s now available on vLLM, llama.cpp, SGLang, Transformers, and more.
    • Delivering advanced open-source AI models requires broad optimization. Through close collaboration with NVIDIA, inference has been optimized for both open-source vLLM and SGLang, ensuring efficient, high-throughput serving across deployment scenarios.
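The reasoning_effort toggle is set per request. A sketch of a chat-completions request body follows; the model name and the `model`/`messages` fields appear elsewhere in these notes, but the exact placement and spelling of the reasoning field in the JSON body is our assumption, not verified API documentation:

```json
{
  "model": "mistral-small-2603",
  "reasoning_effort": "high",
  "messages": [
    {"role": "user", "content": "Prove that the sum of two even numbers is even."}
  ]
}
```

Switching `"high"` to `"none"` would trade step-by-step reasoning for low-latency chat-style replies.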

    Figure: Score vs. Output Length across three benchmarks. Top: accuracy scores (higher is better). Bottom: average output length in thousands of characters (shorter is better).

    Mistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B on all three benchmarks, while generating significantly shorter outputs. On AA LCR, Mistral Small 4 scores 0.72 with just 1.6K characters, whereas Qwen models need 3.5-4x more output (5.8-6.1K) for comparable performance. On LiveCodeBench, Mistral Small 4 outperforms GPT-OSS 120B while producing 20% less output. This efficiency gap matters in practice: shorter outputs mean lower latency, reduced inference costs, and a better user experience.

    For enterprise buyers:

    • Efficiency per token directly impacts cost and scalability. Models that maintain or improve performance as responses grow longer reduce the need for manual intervention, lower operational costs, and ensure consistent quality, even for complex, high-stakes tasks like report generation, customer support, or decision-making workflows. Hybrid reasoning models deliver better value by maximizing accuracy without proportional increases in resource use, making them ideal for large-scale deployments where both performance and cost-efficiency are critical.

    For technical teams and data scientists:

    • Performance per token is a key metric for model selection and optimization. Models that scale efficiently allow teams to deploy solutions for longer, more nuanced tasks (e.g., detailed analytics, multi-step reasoning) without sacrificing accuracy or inflating computational costs. This means fewer trade-offs between quality and resource allocation, enabling more innovative and reliable AI-driven applications. It also simplifies fine-tuning and integration, as the model’s robustness reduces the need for constant adjustments or fallback systems.

    Intended use cases

    Mistral Small 4 is designed for:

    • Developers: Coding automation, codebase exploration, and code agentic workflows.
    • Enterprises: General chat assistants, document understanding, and multimodal analysis.
    • Researchers: Math, research, and complex reasoning tasks.

    Its open-source license and customizable architecture make it ideal for fine-tuning and specialization.

    Availability

    • Mistral API and
      AI Studio

    • Hugging Face Repository

    • Developers can prototype Mistral Small 4 for free on NVIDIA accelerated computing at
      build.nvidia.com
      , and for production deployment, Mistral Small 4 is available day-0 as an NVIDIA NIM, delivering optimized, containerized inference out of the box. It can also be customized with
      NVIDIA NeMo
      for domain-specific fine-tuning.

    • Technical documentation for customers is available on our
      AI Governance Hub

    For enterprise deployments, custom fine-tuning, or on-premises solutions,
    contact our team
    .

    The future of AI is open

    By unifying instruct, reasoning, and multimodal capabilities, Mistral Small 4 simplifies AI integration and empowers users to tackle a wider range of tasks with a single, adaptable tool, bringing the benefits of open source AI to real-world use cases.

    Original source Report a problem
  • Mar 16, 2026
    • Date parsed from source:
      Mar 16, 2026
    • First seen by Releasebot:
      Mar 17, 2026
    Mistral logo

    Mistral

    Leanstral: Open-Source foundation for trustworthy vibe-coding

    Mistral releases Leanstral, the first open-source Lean 4 code agent designed for formal verification with a 6B parameter core. It ships Apache 2.0 licensed weights, zero-setup in Mistral Vibe, a free Labs API, MCP support, and competitive benchmarks against OSS models.

    First open-source code agent for Lean 4.

    AI agents have proven highly capable at code generation. Yet as we push these models into high-stakes domains, ranging from frontier research mathematics to mission-critical software, we encounter a scaling bottleneck: human review. The time and specialized expertise required to verify output manually become the primary impediment to engineering velocity.

    We envision a more helpful generation of coding agents that both carry out their tasks and formally prove their implementations against strict specifications. Instead of debugging machine-generated logic, humans dictate what they want. Today, we are taking the first major step toward that vision.

    Introducing Leanstral

    We release Leanstral, the first open-source code agent designed for Lean 4. Lean 4 is a proof assistant capable of expressing complex mathematical objects such as
    perfectoid spaces
    and software specifications like
    properties of Rust fragments
    . Unlike existing proving systems that act as wrappers around large generalist models or focus on single math problems, Leanstral is designed to be highly efficient (with 6B active parameters) and trained for operating in realistic formal repositories.

    • Open and accessible: We release Leanstral weights under an Apache 2.0 license, in an agent mode within Mistral Vibe, and through a free API endpoint. We will also release a tech report detailing our training approach and a new evaluation suite, FLTEval, to move evaluations beyond their focus on competition math.

    • Efficient and mighty: We use a highly sparse architecture for Leanstral, and optimise it for proof engineering tasks. Leveraging parallel inference with Lean as a perfect verifier, Leanstral is both performant and cost-efficient against existing closed-source competitors.

    • Upgradable via MCP: Leanstral supports arbitrary MCPs through Vibe, and was specifically trained to achieve maximal performance with the frequently used lean-lsp-mcp.

    Evaluation

    To reflect usefulness in realistic proof engineering scenarios, we benchmark Leanstral for completing all formal proofs and correctly defining new mathematical concepts in each PR to the FLT project, instead of isolated mathematical problems. We compare Leanstral against leading coding agents (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5) and open-source models (Qwen3.5 397B-A17B, Kimi-K2.5 1T-A32B, GLM5 744B-A40B).

    Leanstral vs. OSS Models

    Leanstral-120B-A6B demonstrates a significant efficiency advantage over its much larger open-source peers. While models like GLM5-744B-A40B and Kimi-K2.5-1T-A32B struggle to scale, capping their FLTEval scores at approximately 16.6 and 20.1 respectively, Leanstral outperforms them both with just a single pass.

    Even Qwen3.5-397B-A17B, the strongest OSS competitor shown, requires 4 passes to reach a score of 25.4. In contrast, Leanstral achieves a superior score of 26.3 with half that investment (pass@2) and continues to scale linearly, reaching 29.3 at the same cost level.

    Figure: Normalized model cost vs. FLTEval score.

    Leanstral vs. Claude Family

    Leanstral serves as a high-value alternative to the Claude suite, offering competitive performance at a fraction of the price: Leanstral pass@2 reaches a score of 26.3, beating Sonnet by 2.6 points, while costing only $36 to run, compared to Sonnet’s $549. At pass@16, Leanstral reaches a score of 31.9, comfortably beating Sonnet by 8 points. While Claude Opus 4.6 remains the leader in quality, it carries a staggering cost of $1,650, 92 times higher than running Leanstral.

    In our benchmarking, we used Mistral Vibe as the scaffold with no modifications specifically for the evaluation.

    Model | Cost ($) | FLTEval score
    Haiku | 184 | 23.0
    Sonnet | 549 | 23.7
    Opus | 1,650 | 39.6
    Leanstral | 18 | 21.9
    Leanstral pass@2 | 36 | 26.3
    Leanstral pass@4 | 72 | 29.3
    Leanstral pass@8 | 145 | 31.0
    Leanstral pass@16 | 290 | 31.9
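The cost multiples quoted in this section can be checked directly against the table (a trivial sanity check; nothing beyond the table is assumed):

```python
cost_usd = {"Haiku": 184, "Sonnet": 549, "Opus": 1650,
            "Leanstral": 18, "Leanstral pass@2": 36}

# Opus vs. a single Leanstral pass ("92 times higher" in the text)
opus_multiple = cost_usd["Opus"] / cost_usd["Leanstral"]
# Sonnet vs. the pass@2 setting that already beats it on score
sonnet_multiple = cost_usd["Sonnet"] / cost_usd["Leanstral pass@2"]
print(round(opus_multiple), round(sonnet_multiple, 2))
```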

    Case studies

    Answering Stack Exchange posts about changes in the newest Lean version

    When breaking changes hit a new Lean release, migrating code can be a massive headache. We fed Leanstral
    a real-world question from the Proof Assistants Stack Exchange
    about a script that mysteriously stopped compiling in Lean 4.29.0-rc6 (which we did not train with due to its recency). The culprit was a rewrite (
    rw
    ) tactic that suddenly failed to match patterns involving a simple type alias, initially written as
    def T2 := List Bool
    .

    Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.

    The fix it proposed was simple: just swap
    def
    for
    abbrev
    . Because
    abbrev
    creates a transparent alias that is immediately definitionally equal to the original type, the
    rw
    tactic could once again perfectly match the pattern
    (L2 n).length
    in the proof. Leanstral completes the job and explains the rationale to the user perfectly.

    Reasoning about programs

    We copied over definitions in Rocq from
    https://www.cs.princeton.edu/courses/archive/fall10/cos441/sf/Imp.html
    and asked Leanstral to convert to Lean. It did so successfully, even implementing custom notation. Example snippet:

    inductive ceval : com → state → state → Prop where
    | E_Skip (st : state) : ceval .CSkip st st
    | E_Ass (st : state) (a1 : aexp) (n : Nat) (l : ident) (h : aeval a1 st = n) : ceval (.CAss l a1) st (update st l n)
    | E_Seq (c1 c2 : com) (st st' st'' : state) (h1 : ceval c1 st st') (h2 : ceval c2 st' st'') : ceval (.CSeq c1 c2) st st''
    | E_IfTrue (st st' : state) (b1 : bexp) (c1 c2 : com) (h : beval b1 st = true) (h1 : ceval c1 st st') : ceval (.CIf b1 c1 c2) st st'
    | E_IfFalse (st st' : state) (b1 : bexp) (c1 c2 : com) (h : beval b1 st = false) (h1 : ceval c2 st st') : ceval (.CIf b1 c1 c2) st st'
    | E_WhileEnd (b1 : bexp) (st : state) (c1 : com) (h : beval b1 st = false) : ceval (.CWhile b1 c1) st st
    | E_WhileLoop (st st' st'' : state) (b1 : bexp) (c1 : com) (h1 : beval b1 st = true) (h2 : ceval c1 st st') (h3 : ceval (.CWhile b1 c1) st' st'') : ceval (.CWhile b1 c1) st st''

    -- Notation for command evaluation
    notation:50 c " / " st " ⇒ " st' => ceval c st st'

    It could also translate to Lean and then prove some properties about programs in this language when just given the Rocq statement (without proof):

    -- Example command: adds 2 to variable X
    def plus2 : com := .CAss "X" (.APlus (.AId "X") (.ANum 2))

    -- Theorem: The plus2 command correctly adds 2 to variable X
    -- Intuition: If X has value n in the initial state, then after executing plus2,
    -- X will have value n + 2 in the final state
    -- This specifies the behavior of the plus2 command
    theorem plus2_spec (st : state) (n : Nat) (st' : state) (h1 : st "X" = n) (h2 : plus2 / st ⇒ st') : st' "X" = n + 2 := by
      -- plus2 is defined as .CAss "X" (.APlus (.AId "X") (.ANum 2)); unfold it in h2
      change ceval (.CAss "X" (.APlus (.AId "X") (.ANum 2))) st st' at h2
      cases h2 with
      | E_Ass _ _ n l h =>
        have : aeval (.APlus (.AId "X") (.ANum 2)) st = n := h
        simp only [aeval] at this
        rw [update]
        simp [← this, h1]

    Demand Proof. Try Leanstral Today.

    Leanstral is available today for everyone to use.

    • Zero-Setup in
      Mistral Vibe
      : We’ve integrated Leanstral directly into Mistral Vibe for immediate, zero-setup vibe coding and proving. Use
      /leanstall
      to start.

    • Labs API: Access the model via our free/near-free API endpoint
      labs-leanstral-2603
      . We are keeping this endpoint highly accessible for a limited period to gather realistic feedback and observability data to fuel the next generation of verified code models.

    • Own the Weights: Download the Apache 2.0 licensed model and run it on your own metal.

    Documentation - Sign Up for Mistral Vibe

    Original source Report a problem
  • Mar 15, 2026
    • Date parsed from source:
      Mar 15, 2026
    • First seen by Releasebot:
      Mar 17, 2026
    • Modified by Releasebot:
      Mar 18, 2026
    Mistral logo

    Mistral

    March 15

    Mistral releases Mistral Small 4, a multimodal hybrid model with 256k context, and Leanstral, an open-source code agent for Lean 4.

    MODEL RELEASED

    • We released Mistral Small 4 (mistral-small-2603), a hybrid model unifying instruct, reasoning, and coding in a single multimodal model with a 256k context window.

    • We released Leanstral (labs-leanstral-2603), our first open-source code agent designed for Lean 4 formal proof engineering.

    Original source Report a problem
  • Mar 15, 2026
    • Date parsed from source:
      Mar 15, 2026
    • First seen by Releasebot:
      Mar 17, 2026
    Mistral logo

    Mistral

    March 15

    Mistral releases Leanstral, an open-source code agent for Lean 4 formal proof engineering.

    We released Leanstral (labs-leanstral-2603), our first open-source code agent designed for Lean 4 formal proof engineering.

    MODEL RELEASED

    Original source Report a problem
  • Mar 13, 2026
    • Date parsed from source:
      Mar 13, 2026
    • First seen by Releasebot:
      Mar 13, 2026
    Mistral logo

    Mistral Common by Mistral

    v1.10.0: Tokenizer v15, Reasoning Effort and Python 3.14

    Mistral unveils version 1.10.0 with new capabilities and improvements, including Python 3.14 support, a new speech request, strict function calling, and tokenizer v15. Tests now use mocked HTTP responses, and several new contributors are noted. The full changelog covers v1.9.1 to v1.10.0.

    What's Changed

    • Allow System Prompt with Audio for v13 by @juliendenize in #184
    • test_audio: Replace live network calls in test_from_url with mocked HTTP responses by @framsouza in #188
    • fix: typo in serve command help text by @framsouza in #189
    • Add Python 3.14 support by @juliendenize in #195
    • test: mock remaining network call in test_encode_invalid_audio_url_chunk by @abdelhadi703 in #192
    • [Speech Request] Add speech request by @patrickvonplaten in #196
    • Add strict function calling support by @juliendenize in #197
    • Add v15 by @juliendenize in #199
    • Version 1.10.0 by @juliendenize in #200

    New Contributors

    • @framsouza made their first contribution in #188
    • @abdelhadi703 made their first contribution in #192

    Full Changelog: v1.9.1...v1.10.0

    Original source Report a problem
  • Mar 11, 2026
    • Date parsed from source:
      Mar 11, 2026
    • First seen by Releasebot:
      Mar 18, 2026
    Mistral logo

    Mistral

    March 11

    Mistral releases Moderation 2603 with Custom Guardrails for Agents and Conversations and per-request guardrails on conversations and chat completions.

    MODEL RELEASED

    We released Mistral Moderation 2603 (mistral-moderation-2603).

    We added
    Custom Guardrails
    support for Agents and Conversations.

    API UPDATED

    Guardrails can now be configured directly on an Agent via the
    guardrails
    parameter.

    Guardrails can be passed per-request on
    POST /v1/conversations
    using the
    guardrails
    field.

    Guardrails can be passed per-request on
    POST /v1/chat/completions
    using the
    guardrails
    field.

    Original source Report a problem
  • February 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Feb 25, 2026
    Mistral logo

    Mistral

    Voxtral transcribes at the speed of sound.

    Voxtral launches Transcribe 2 with Mini Transcribe V2 and Realtime, plus an audio playground in Mistral Studio. It offers diarization, multilingual support, sub-200ms low latency, open weights, and edge-ready deployment.

    Highlights

    • Voxtral Mini Transcribe V2: State-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps in 13 languages.
    • Voxtral Realtime: Purpose-built for live transcription with latency configurable down to sub-200ms, enabling voice agents and real-time applications.
    • Best-in-class efficiency: Industry-leading accuracy at a fraction of the cost, with Voxtral Mini Transcribe V2 achieving the lowest word error rate, at the lowest price point.
    • Open weights: Voxtral Realtime ships under Apache 2.0, deployable on edge for privacy-first applications.

    Voxtral Realtime

    Voxtral Realtime is purpose-built for applications where latency matters. Unlike approaches that adapt offline models by processing audio in chunks, Realtime uses a novel streaming architecture that transcribes audio as it arrives. The model delivers transcriptions with delay configurable down to sub-200ms, unlocking a new class of voice-first applications.

    Word error rate (lower is better) across languages in the FLEURS transcription benchmark.

    At a 2.4-second delay, ideal for subtitling, Realtime matches Voxtral Mini Transcribe V2, our latest batch model. At a 480ms delay, it stays within 1-2% word error rate of the batch model, enabling voice agents with near-offline accuracy.

    The model is natively multilingual, achieving strong transcription performance in 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.
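For the edge-deployment claim, a rough weights-only memory estimate helps; the precision choices below are our assumption, and activations and any KV cache are excluded:

```python
def weight_footprint_gib(params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 4B parameters at fp16 (2 bytes/param) and int8 (1 byte/param)
print(round(weight_footprint_gib(4, 2), 1), round(weight_footprint_gib(4, 1), 1))
```

Under these assumptions the weights fit comfortably in the memory of a single consumer GPU or a capable edge device.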

    We’re releasing the model weights under Apache 2.0 on the Hugging Face Hub.

    Voxtral Mini Transcribe V2

    Voxtral Mini Transcribe V2 delivers significant improvements in transcription and diarization quality across languages and domains. At approximately 4% word error rate on FLEURS and $0.003/min, Voxtral offers the best price-performance of any transcription API. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, and processes audio approximately 3x faster than ElevenLabs’ Scribe v2 while matching its quality at one-fifth the cost.

    Model features

    Voxtral Mini Transcribe V2 introduces key capabilities.

    • Speaker diarization.
      Generate transcriptions with speaker labels and precise start/end times. Ideal for meeting transcription, interview analysis, and multi-party call processing. Note: with overlapping speech, the model typically transcribes one speaker.
    • Context biasing.
      Provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary. Particularly useful for proper nouns or industry terminology that standard models often miss. Context biasing is optimized for English; support for other languages is experimental.
    • Word-level timestamps.
      Generate precise start and end timestamps for each word, enabling applications like subtitle generation, audio search, and content alignment.
    • Expanded language support.
      Like Realtime, this model now supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. Non-English performance significantly outpaces competitors.
    • Noise robustness.
      Maintains transcription accuracy in challenging acoustic environments, such as factory floors, busy call centers, and field recordings.
    • Longer audio support.
      Process recordings up to 3 hours in a single request.

    Audio playground

    Test Voxtral Transcribe 2 directly in Mistral Studio. Upload up to 10 audio files, toggle diarization, choose timestamp granularity, and add context bias terms for domain-specific vocabulary. Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each.

    Transforming voice applications

    Voxtral powers voice workflows in diverse applications and industries.

    • Meeting intelligence.
      Transcribe multilingual recordings with speaker diarization that clearly attributes who said what and when. At Voxtral's price point, annotate large volumes of meeting content at industry-leading cost efficiency.
    • Voice agents and virtual assistants.
      Build conversational AI with sub-200ms transcription latency. Connect Voxtral Realtime to your LLM and TTS pipeline for responsive voice interfaces that feel natural.
    • Contact center automation.
      Transcribe calls in real time, enabling AI systems to analyze sentiment, suggest responses, and populate CRM fields while conversations are still happening. Speaker diarization ensures clear attribution between agents and customers.
    • Media and broadcast.
      Generate live multilingual subtitles with minimal latency. Context biasing handles proper nouns and technical terminology that trip up generic transcription services.
    • Compliance and documentation.
      Monitor and transcribe interactions for regulatory compliance, with diarization providing clear speaker attribution and timestamps enabling precise audit trails.

    Both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups.

    Get started

    Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Try it now in the new Mistral Studio audio playground or in Le Chat.

    Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.
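Both rates are per minute of audio, so costs are straightforward to estimate from the quoted prices:

```python
BATCH_RATE = 0.003     # Voxtral Mini Transcribe V2, $/min
REALTIME_RATE = 0.006  # Voxtral Realtime, $/min

def transcription_cost(minutes: float, rate_per_min: float) -> float:
    return minutes * rate_per_min

# A 3-hour recording, the maximum single-request length for the batch model
print(round(transcription_cost(180, BATCH_RATE), 2))
# One hour of live transcription with Realtime
print(round(transcription_cost(60, REALTIME_RATE), 2))
```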

    Explore documentation on Mistral’s audio and transcription capabilities.

    We’re hiring

    If you're excited about building world-class speech AI and putting frontier models into the hands of developers everywhere, we'd love to hear from you. Apply to join our team.

    Original source Report a problem

Related vendors