Mistral Release Notes
82 release notes curated from 27 sources by the Releasebot Team. Last updated: Apr 1, 2026
Mistral Products
- May 4, 2026
- Date parsed from source:May 4, 2026
- First seen by Releasebot:Apr 1, 2026
- Modified by Releasebot:May 5, 2026
v1.11.2
Mistral Common adds tag v1.11.2 for a public PyPI release.
Adds tag v1.11.2 for public PyPI release
Original source - May 2026
- No date parsed from source.
- First seen by Releasebot:May 1, 2026
Mistral Medium 3.5
Mistral releases Medium 3.5 and expands Le Chat with new Work mode and remote Vibe coding agents, bringing cloud-based async coding, multi-step task handling, and stronger agentic workflows to the platform.
Introducing Mistral Medium 3.5, remote coding agents in Vibe, plus new Work mode in Le Chat for complex tasks.
Coding agents have mostly lived on your laptop. Today we're moving them to the cloud, where they run on their own, in parallel, and notify you when they're done. You can start them from the Mistral Vibe CLI or directly in Le Chat, offloading a coding task without leaving the conversation.
Powering this is Mistral Medium 3.5 in public preview, our new default model in Mistral Vibe and Le Chat, built to run for long stretches on coding and productivity work. The new Work mode in Le Chat (Preview) extends this with a powerful agent for complex, multi-step tasks like research, analysis, and cross-tool actions.
Highlights.
- Mistral Medium 3.5, a new flagship model that merges instruction-following, reasoning, and coding into a single 128B dense model. Released as open weights, under a modified MIT license.
- Strong real-world performance at a size that runs self-hosted on as few as four GPUs.
- Mistral Vibe remote agents for async coding: sessions run in the cloud, can be spawned from the CLI or Le Chat, and a local CLI session can be teleported up to the cloud.
- Start Mistral Vibe coding tasks in Le Chat. Sessions run on the same remote runtime and keep going while you step away.
- Work mode in Le Chat runs on a new agent, powered by Mistral Medium 3.5, that works through multi-step tasks, calling tools in parallel until the job is done.
Mistral Medium 3.5.
Mistral Medium 3.5 is our first flagship merged model, available in public preview. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and coding in a single set of weights. It performs strongly in real-world use, with self-hosting possible on as few as four GPUs. Reasoning effort is now configurable per request, so the same model can answer a quick chat reply or work through a complex agentic run. We trained the vision encoder from scratch to handle variable image sizes and aspect ratios.
Mistral Medium 3.5 scores 77.6% on SWE-Bench Verified, ahead of Devstral 2 and models like Qwen3.5 397B A17B. It also has strong agentic capabilities and scores 91.4 on τ³-Telecom.
The model was built for long-horizon tasks, calling multiple tools reliably, and producing structured output that downstream code can consume. It is the model that made async cloud agents in Vibe practical to ship.
Mistral Medium 3.5 becomes the default model in Le Chat. It also replaces Devstral 2 in our coding agent, Vibe CLI.
Vibe remote agents.
From today, coding sessions can work through long tasks while you’re away. Many can run in parallel, and you stop being the bottleneck on every step the agent takes.
You can start the cloud agents from the Mistral Vibe CLI or from Le Chat. While they run, you can inspect what the agent is doing, with file diffs, tool calls, progress states, and questions surfaced as you go. Ongoing local CLI sessions can be teleported up to the cloud when you want to leave them running, with session history, task state, and approvals carrying across.
Vibe sits between the systems engineering teams already use, with humans in the loop wherever they're needed. It plugs into GitHub for code and pull requests, Linear and Jira for issues, Sentry for incidents, and apps like Slack or Teams for reporting.
Each coding session runs in an isolated sandbox, including broad edits and installs. When the work is done, the agent can open a pull request on GitHub and notify you, so you review the result instead of every keystroke that produced it.
It fits the high-volume, well-defined work that takes a developer's time without taking their judgment: module refactors, test generation, dependency upgrades, CI investigations, as well as bug fixes.
We use Workflows orchestrated in Mistral Studio to bring Mistral Vibe into Le Chat. We originally built this for our own in-house coding environment, then for our enterprise customers. Today the capability opens up to everyone, who can now launch coding tasks from the web. And without being tied to a local terminal, a developer can run several in parallel.
You can start coding sessions directly in Le Chat, so a task described in chat runs on the same remote runtime as the CLI and the web, and comes back later as a finished branch or a draft PR.
New Work mode in Le Chat (Preview).
Work mode is a powerful new agentic mode for complex tasks in Le Chat, powered by a new harness and Mistral Medium 3.5. The agent becomes the execution backend for the assistant itself, so Le Chat can read and write, use several tools at once, and work through multi-step projects until it completes what you’ve asked.
Here’s what Work mode enables you to do today.
- Cross-tool workflows: catch up across email, messages, and calendar in a single run; prepare for a meeting with attendee context, latest news, and talking points pulled from your sources.
- Research and synthesis: dive into a topic across the web, internal docs, and connected tools, then produce a structured brief or report you can edit before exporting or sending.
- Triage your inbox and draft replies; create issues in Jira from your team and customer discussions; send a summary to your team on Slack.
Sessions persist longer than a typical chat reply, so an agent can keep going across many turns, through trial-and-error, and through to completion. In Work mode, connectors are on by default rather than chosen manually, which lets the agent reach into documents, mailboxes, calendars, and other systems for the rich context it needs to take correct action.
Every action the agent takes is visible: you see each tool call and the thinking rationale. Le Chat will ask for explicit approval—based on your permissions—before proceeding with sensitive tasks like sending a message, writing a document, or modifying data.
Get started.
Mistral Medium 3.5 is available today in Mistral Vibe and Le Chat, and powers remote coding agents and Work mode in Le Chat on the Pro, Team, and Enterprise plans.
Through API, it’s priced at $1.5 per million input tokens and $7.5 per million output tokens. Open weights are on Hugging Face under a modified MIT license.
It is also available for prototyping, hosted on NVIDIA GPU-accelerated endpoints on build.nvidia.com and as a scalable containerized inference microservice, NVIDIA NIM.
Build the future of agentic systems with us.
We're hiring across research, engineering, and product to push agentic systems further. See our open roles.
Original source All of your release notes in one feed
Join Releasebot and get updates from Mistral and hundreds of other software products.
- April 2026
- No date parsed from source.
- First seen by Releasebot:Apr 29, 2026
Workflows for work that runs the business
Mistral releases Workflows in public preview, bringing durable, observable AI orchestration to Studio and Le Chat. It helps teams run production processes in Python with human-in-the-loop approvals, traceable execution, and enterprise deployment flexibility across cloud, on-prem, or hybrid environments.
Workflows is now in public preview
Today, we're releasing Workflows in public preview. Workflows is the orchestration layer for enterprise AI. It brings the durability, observability, and fault tolerance required to move AI-powered processes from proof of concept to production reliably. Organizations like ASML, ABANCA, CMA-CGM, France Travail, La Banque Postale, Moeve, and many more are already running Workflows to automate critical processes.
Enterprise teams today have access to capable models. What they lack is a way to run them reliably in production. We see this across every industry we work with. The failure modes are consistent: pipelines that run in a notebook but fail silently in production with no trace, long-running processes that can't survive a network timeout, multi-step operations that need human approval mid-execution but have no mechanism to pause and resume, and systems that offer no way to verify they're still doing what they're supposed to after deployment.
Building all of the capabilities to address these challenges is months of complex work for enterprises: the orchestration layer has to be stitched together from scratch, and the components it connects, inference, agents, connectors, observability, each come from different tools with their own interfaces and formats.
Workflows is part of Studio, so the orchestration layer and the components it orchestrates are built to work together. Once a business process is identified, developers write the workflow in Python. Every workflow can then be published to Le Chat so anyone in the organisation can trigger it. Every step is tracked and auditable in Studio. By bringing all of this together, Workflows lets your organisation go from identifying a use case to running it in production in days.
Workflows deployed in the real world
As mentioned, Mistral AI customers are already using Workflows to automate business processes and run them in production. The examples below show how durability, observability, and human-in-the-loop approvals work in practice.
Cargo release automation
Global shipping runs on paperwork. A single cargo release can involve customs declarations, dangerous goods classifications, safety inspections, and regulatory checks across multiple jurisdictions. A missed step can result in cargo delays at port and potential compliance breaches.
The operational requirements for a use case like this are: the system must survive intermittent timeouts, pause mid-execution for human review, and produce a precise account of where and why when something fails.
Using Workflows, a customer is able to automate this end to end. The workflow validates every incoming shipping document against customs rules, checks for anomalies, flags anything that needs human sign-off, waits for approval, then releases the cargo. With Workflows, the human approval step is a single line of code: wait_for_input(). The workflow pauses, waits for as long as it takes with no compute consumption, notifies the reviewer, and resumes exactly where it left off. Studio records the full execution history.
Document compliance checking
KYC reviews are manual, repetitive, and time-consuming. A single customer onboarding can require extracting identity documents, verifying them against sanctions lists and PEP databases, cross-referencing regulatory requirements across jurisdictions, and producing a structured risk assessment with supporting evidence. Done manually, this takes hours of analyst time per case.
The operational requirements here are speed and auditability. A system to automate a process like this should be fast and should document the steps and reasoning behind them for meeting regulatory requirements.
With Workflows, the entire review process only takes minutes and Studio surfaces every step as a structured timeline you can drill into at any level of detail, down to specific traces with native support for OpenTelemetry.
Customer support triage
Support teams deal with volume. Refund requests, technical issues, billing disputes, account escalations. Routing them to the right team quickly and consistently is what determines resolution time.
The operational requirement here is correctability. Automated routing will get things wrong. When it does, the team needs to see why a ticket was routed the way it was, and fix it without retraining the model.
With Workflows, incoming tickets are analysed, categorised by intent and urgency, and routed to the right downstream process automatically. Each routing decision is visible and traceable in Studio. When the categorisation is wrong, the team corrects it at the workflow level.
Why Workflows
- Durable execution. Workflows track state at every step. If a process fails, it resumes where it left off. As a result, developers can focus more on writing business logic instead of recovery logic.
- Observability. Every branch, retry, and state change is recorded in Studio. If a decision needs to be investigated months later, the full timeline is there to show how it was reached.
- Human-in-the-loop. A single line of code pauses a workflow for approval. The reviewer responds from Le Chat, a webhook, or any connected surface, and the workflow picks up where it stopped.
- Native to Studio. Workflows use the same agents and connectors as the rest of Studio. There's no separate integration work to wire them in.
- Enterprise readiness. Workspaces within Studio keep teams and projects separated, and role-based access control (RBAC) makes sure those rules are enforced consistently.
- Built for developers and business teams. Engineers write workflows as code. Business teams run them from Le Chat.
- Deployment flexibility. The control plane runs on Mistral. Workers and data processing run in your environment, right where your critical services are hosted: cloud, on-prem, or hybrid.
Under the hood
Workflows is built on Temporal's durable execution engine, the same infrastructure that powers orchestration at Netflix, Stripe, and Salesforce. We extended it for AI-specific workloads by adding streaming, payload handling, multi-tenancy, and observability that the core engine does not provide out of the box.
The deployment model is split between Mistral and your environment, and separates the control plane from the data plane. Mistral hosts the orchestration infrastructure: Temporal cluster, the Workflows API, and Studio. You deploy workers on your own Kubernetes environment using a separate Helm chart, and they connect back to the central cluster via secure credentials. Your data and business logic stay within your perimeter.
The Mistral SDK handles retry policies, tracing, timeouts, rate limiting, and human-in-the-loop through decorators and single-line configuration, so the only thing you write is the business logic itself.
Get started
The Python SDK is how developers write and run workflows. v3.0 is now publicly available and installable with a single command:
Install Workflows
uv add mistralai-workflowsTry Workflows in Studio From scratch or using our demo templates.
Read the docs
Build your first Workflow in Studio
Talk to our team
Original source - Apr 29, 2026
- Date parsed from source:Apr 29, 2026
- First seen by Releasebot:Apr 29, 2026
v1.11.1: Patch for agentic use
Mistral Common patches user-after-tool handling and relaxes from_openai for smoother framework integrations.
What's Changed
This Patch allows usage of user message after tool message. It also makes from_openai less strict to make mistral-common integrations in other frameworks smoother.
- Fix docs by @juliendenize in #216
- Allow user message after tool by @juliendenize in #218
- Make from_openai methods lenient by silently dropping unsupported fields by @juliendenize in #217
- Version 1.11.1 by @juliendenize in #220
Full Changelog: v1.11.0...v1.11.1
Original source - Apr 29, 2026
- Date parsed from source:Apr 29, 2026
- First seen by Releasebot:Apr 29, 2026
v1.11.0: Mistral Guidance
Mistral Common adds Mistral Guidance for valid reasoning traces and better tool choice in 1.11.0.
What's Changed
Mistral Guidance is out !
Make use of lark grammar to guide your model in generating valid reasoning traces with or without tool calls !
- Improve tool choice by @juliendenize in #204
- Add Mistral guidance by @juliendenize in #202
- Simplify AGENTS.md by @juliendenize in #201
- Add version_num property by @juliendenize in #203
- Update version to 1.11.0 by @juliendenize in #206
Full Changelog: v1.10.0...v1.11.0
Original source - Apr 27, 2026
- Date parsed from source:Apr 27, 2026
- First seen by Releasebot:May 1, 2026
April 27
Mistral releases Mistral Medium 3.5, a frontier multimodal model for agentic and coding use cases with adjustable reasoning.
We released Mistral Medium 3.5 (mistral-medium-3.5), our frontier-class multimodal model optimized for agentic and coding use cases, with adjustable reasoning via the reasoning_effort parameter. Released as open weights under a Modified MIT license.
MODEL RELEASED
Original source - March 2026
- No date parsed from source.
- First seen by Releasebot:Mar 27, 2026
Speaking of Voxtral
Mistral releases Voxtral TTS, its first multilingual text-to-speech model for natural, emotionally expressive voice generation. It supports 9 languages, low-latency streaming, custom voices, and testing in Mistral Studio, with API access now available.
Today we’re releasing Voxtral TTS, our first text-to-speech model with state-of-the-art performance in multilingual voice generation. The model is lightweight at 4B parameters, making Voxtral-powered agents natural, reliable, and cost-effective at scale.
Highlights
- Realistic, emotionally expressive speech in 9 popular languages with support for diverse dialects.
- Very low latency for time-to-first-audio.
- Easily adaptable to new voices.
- Available to test out in Mistral Studio.
- Enterprise-grade text-to-speech, powering critical voice agent workflows.
A natural voice generation hinges on the model’s ability to not only recite but interpret a text accurately. Contextual understanding - like neutral, happy, sarcastic, etc. - determines whether the listener considers the generation accurate or robotic. Our model excels at both contextual understanding and speaker modeling: capturing how a specific person naturally speaks. Our voice adaptation goes beyond traditional read-speech by capturing a speaker’s personality, including their natural pauses, rhythm, intonation, and emotional dexterity. With its compact size, low cost and latency, and easy adaptability, Voxtral TTS gives full control and customization for enterprises looking to own their voice AI stack.
Audio is the new UX. Create new interactions for collaboration and understanding only found in speech. Begin now in AI Studio with our Mistral Voices in American, British, and French dialects.
Listen and decide: can you tell the difference?
Our team speaks dozens of languages in multiple dialects, we understand the importance of cultural nuance and built a model that is a reflection of us. Speech generation builds trust via natural-like rhythm, emotion, and even the use of humor. That’s why with voice emulation, we focused on authenticity and emotional expressiveness.
State-of-the-art performance
Automated metrics such as word-error-rate and audio quality scores for multilingual text-to-speech systems are unable to measure naturalness of speech. What makes speech natural is extremely nuanced and requires a deep understanding of cultural differences and typical speaking patterns. Hence, comparative human evaluations performed by native speakers are crucial.
For voice agents, latency and quality are in constant tension. Human evaluations show that Voxtral TTS achieves superior naturalness compared to ElevenLabs Flash v2.5 while maintaining similar Time-to-First-Audio (TTFA). Voxtral also performs at parity with the quality of ElevenLabs v3, successfully supporting emotion-steering for more lifelike interactions.
We conducted a comparative human evaluation of Voxtral TTS and ElevenLabs v2.5 Flash in a zero-shot custom voice context. Using two recognizable voices in their native dialects for each of the 9 supported languages, 3 annotators performed a side-by-side preference test per pair on naturalness, accent adherence, and acoustic similarity to the original reference. Voxtral TTS widens the quality gap to v2.5 Flash in this zero-shot multilingual custom voice setting, highlighting the instant customizability of Voxtral TTS to any voice.
Spoken natively
Trained on a large speech dataset, Voxtral TTS is built for global application. It supports state-of-the-art performance in 9 languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.
The model was trained to adapt to a custom voice with a reference as little as 3s and capture not just the voice but also nuances like subtle accent, inflections, intonations and even disfluencies similar to those expressed in the reference. We offer some preset voice options in the API but it is simple to extend to your in-house voice library customizing it to the use-case, localize it to the language and accent, keep it neutral or more emotive, casual or formal, more natural and conversational or robotic.
The model also demonstrates zero-shot cross-lingual voice adaptation even though it’s not explicitly trained for it. For example, the model can generate English speech with a French voice prompt and English text. The resulting speech sounds natural while adopting the accent of the provided voice prompt (in this example, the generated speech has a natural French-accented English). This makes the model useful for building cascaded speech-to-speech translation systems.
Built for low-latency streaming
Latency is critical for voice agent applications. Voxtral TTS achieves a model latency of 70ms for a typical input voice sample of 10 seconds and 500 characters, with a real-time factor (RTF) of ≈9.7x. The model natively generates up to two minutes of audio, and our API handles arbitrarily long generations with smart interleaving.
Voxtral TTS architecture
The model is a transformer-based, autoregressive, flow-matching model, built on Ministral 3B. It consists of the following components:
- 3.4B parameters transformer decoder backbone
- 390M flow-matching acoustic transformer
- 300M neural audio codec (symmetric encoder-decoder)
The model takes a voice prompt (5 to 25 seconds) and a text prompt in 9 supported languages. For each audio frame, the transformer backbone predicts a semantic token, then the flow-matching transformer runs 16 function evaluations (NFEs) to produce the acoustic latent.
We developed an in-house codec, which processes audio causally using a semantic VQ (8192 vocabulary) and an acoustic FSQ (36 dim and 21 levels) latent and produces them at 12.5Hz frame rate.
Powering enterprise voice workflows
Voxtral TTS closes the loop on audio intelligence, giving enterprise voice pipelines an output layer that passes the human test. It works alongside Voxtral Transcribe for full speech-to-speech, or integrates into any existing speech-to-text and LLM stack, with cross-lingual support.
Workflows
Customer Support
Voice agents that route and resolve queries across channels with natural, brand-appropriate speech. Place Voxtral TTS into existing contact support call systems for automated spoken responses, with output that integrates into existing workflows.
Test-run the model in Mistral Studio
Experiment with Voxtral TTS directly in the Mistral Studio playground. Select one of the Mistral voices or record your own.
Get started with Voxtral TTS
Voxtral TTS is available now via API at $0.016 per 1k characters.
Try it now in Mistral Studio or in Le Chat.
A model with several reference voices is available as open weights on Hugging Face under CC BY NC 4.0 license.
Explore the model’s documentation.
Sign up for our upcoming webinar to learn more!We’re hiring!
We are building the voice layer for AI, and If this is the kind of problem you want to work on, we'd love to hear from you.
Original source - Mar 22, 2026
- Date parsed from source:Mar 22, 2026
- First seen by Releasebot:Mar 27, 2026
March 22
Mistral releases Voxtral TTS with zero-shot voice cloning, multilingual support, and real-time streaming.
We released Voxtral TTS (voxtral-tts-2603), our state-of-the-art text-to-speech model with zero-shot voice cloning, multilingual support, and real-time streaming.
MODEL RELEASED
Original source - March 2026
- No date parsed from source.
- First seen by Releasebot:Mar 17, 2026
Introducing MistralSmall 4
Mistral releases Small 4, a unified model that combines fast instructing, deep reasoning, and multimodal chat in one versatile engine. With 119B params, 256k context, and configurable reasoning, it delivers open-source availability and strong efficiency gains across inference and throughput.
Introducing Mistral Small 4
Mistral Small 4
A fast instruct model
A powerful reasoning engineer
A multimodal assistant
Today, we are announcing Mistral Small 4. This model is the next major release in the Mistral Small family. Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model. With Small 4, users no longer need to choose between a fast instruct model, a powerful reasoning engine, or a multimodal assistant: one model now delivers all three, with configurable reasoning effort and best-in-class efficiency.
Mistral Small 4 is released under the Apache 2.0 license, continuing our commitment to open, accessible, and customizable AI.
A new standard for multimodal, reasoning-optimized models
Mistral Small 4 is a hybrid model optimized for general chat, coding, agentic tasks, and complex reasoning. Its architecture supports both text and image inputs, making it versatile for a wide range of applications. With Mistral Small 4, we reaffirm our commitment to open-source models and are proud to join the
NVIDIA Nemotron Coalition
as a founding member, advancing collaboration and innovation in AI development.Key architectural details
- Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.
- 119B total parameters, with 6B active parameters per token (8B including embedding and output layers).
- 256k context window, supporting long-form interactions and document analysis.
- Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.
- Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
Performance highlights
- 40% reduction in end-to-end completion time (latency-optimized setup).
- 3x more requests per second (throughput-optimized setup) compared to Mistral Small 3.
Why Mistral Small 4?
Unified capabilitiesMistral Small 4 consolidates the strengths of Magistral (reasoning), Devstral (coding agents), and Mistral Small (instruct) into a single model. Whether you need a chat assistant, a research partner, or a coding agent, Small 4 adapts to your task, no need to switch between specialized models.
Reasoning on demandWith the new
reasoning_effort
parameter, users can dynamically adjust the model’s behavior:- reasoning_effort="none": Fast, lightweight responses for everyday tasks, equivalent to the same chat style of Mistral Small 3.2.
- reasoning_effort="high": Deep, step-by-step reasoning for complex problems, with equivalent verbosity to previous Magistral models.
- Minimum infrastructure: 4x NVIDIA HGX H100, 2x NVIDIA HGX H200, or 1x NVIDIA DGX B200.
- Recommended setup: 4x NVIDIA HGX H100, 4x NVIDIA HGX H200, or 2x NVIDIA DGX B200 for optimal performance.
- Mistral Small 4 is fully open source. Fine-tune it for specialized tasks or deploy it out of the box for general-purpose use. Thanks to collaboration with the community, it’s now available on vLLM, llama.cpp, SGLang, Transformers, and more.
- Delivering advanced open-source AI models requires broad optimization. Through close collaboration with NVIDIA, inference has been optimized for both open source vLLM and SGLang, ensuring efficient, high-throughput serving across deployment scenarios.
Figure: Score vs. Output Length across three benchmarks. Top: accuracy scores (higher is better). Bottom: average output length in thousands of characters (shorter is better).
Mistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B on all three benchmarks, while generating significantly shorter outputs. On AA LCR, Mistral Small 4 scores 0.72 with just 1.6K characters, whereas Qwen models need 3.5-4x more output (5.8-6.1K) for comparable performance. On LiveCodeBench, Mistral Small 4 outperforms GPT-OSS 120B while producing 20% less output. This efficiency gap matters in practice: shorter outputs mean lower latency, reduced inference costs, and a better user experience.
For enterprise buyers:
- Efficiency per token directly impacts cost and scalability. Models that maintain or improve performance as responses grow longer reduce the need for manual intervention, lower operational costs, and ensure consistent quality, even for complex, high-stakes tasks like report generation, customer support, or decision-making workflows. Hybrid reasoning models deliver better value by maximizing accuracy without proportional increases in resource use, making them ideal for large-scale deployments where both performance and cost-efficiency are critical.
For technical teams and data scientists:
- Performance per token is a key metric for model selection and optimization. Models that scale efficiently allow teams to deploy solutions for longer, more nuanced tasks (e.g., detailed analytics, multi-step reasoning) without sacrificing accuracy or inflating computational costs. This means fewer trade-offs between quality and resource allocation, enabling more innovative and reliable AI-driven applications. It also simplifies fine-tuning and integration, as the model’s robustness reduces the need for constant adjustments or fallback systems.
Intended use cases
Mistral Small 4 is designed for:
- Developers: Coding automation, codebase exploration, and code agentic workflows.
- Enterprises: General chat assistants, document understanding, and multimodal analysis.
- Researchers: Math, research, and complex reasoning tasks.
Its open-source license and customizable architecture make it ideal for fine-tuning and specialization.
Availability
Mistral API and
AI StudioHugging Face Repository
Developers can prototype Mistral Small 4 for free on NVIDIA accelerated computing at
build.nvidia.com
, and for production deployment, Mistral Small 4 is available day-0 as an NVIDIA NIM, delivering optimized, containerized inference out of the box. It can also be customized with
NVIDIA NeMo
for domain-specific fine-tuning.Technical documentation for customers is available on our
AI Governance Hub
For enterprise deployments, custom fine-tuning, or on-premises solutions,
contact our team
.The future of AI is open
By unifying instruct, reasoning, and multimodal capabilities, Mistral Small 4 simplifies AI integration and empowers users to tackle a wider range of tasks with a single, adaptable tool, bringing the benefits of open source AI to real-world use cases.
Original source - Mar 16, 2026
- Date parsed from source:Mar 16, 2026
- First seen by Releasebot:Mar 17, 2026
Leanstral: Open-Source foundation for trustworthy vibe-coding
Mistral releases Leanstral, the first open-source Lean 4 code agent designed for formal verification with a 6B parameter core. It ships Apache 2.0 licensed weights, zero-setup in Mistral Vibe, a free Labs API, MCP support, and competitive benchmarks against OSS models.
First open-source code agent for Lean 4.
AI agents have proven to be highly capable tools at code generation. Yet, as we push these models to high-stakes domains, ranging from frontier research mathematics to mission-critical software, we encounter a scaling bottleneck: the human review. The time and specialized expertise required to manually verify become the primary impedance of engineering velocity.
We envision a more helpful generation of coding agents to both carry out their tasks and formally prove their implementations against strict specifications. Instead of debugging machine-generated logic, humans dictate what they want. Today, we are taking the first major step toward that vision.
Introducing Leanstral
We release Leanstral, the first open-source code agent designed for Lean 4. Lean4 is a proof assistant capable of expressing complex mathematical objects such as
perfectoid spaces
and software specifications like
properties of Rust fragments
. Unlike existing proving systems that act as wrappers around large generalist models or focus on single math problems, Leanstral is designed to be highly efficient (with 6B active parameters) and trained for operating in realistic formal repositories.Open and accessible: We release Leanstral weights under an Apache 2.0 license, in an agent mode within Mistral vibe, and through a free API endpoint. We will also release a tech report detailing our training approach, and a new evaluation suite FLTEval, to move evaluations beyond their focus on competition math.
Efficient and mighty: We use a highly sparse architecture for Leanstral, and optimise it for proof engineering tasks. Leveraging parallel inference with Lean as a perfect verifier, Leanstral is both performant and cost-efficient against existing closed-source competitors.
Upgradable via MCP: Leanstral supports arbitrary MCPs through vibe, and was specifically trained to achieve maximal performance with the frequently used lean-lsp-mcp.
Evaluation
To reflect usefulness in realistic proof engineering scenarios, we benchmark Leanstral for completing all formal proofs and correctly defining new mathematical concepts in each PR to the FLT project, instead of isolated mathematical problems. We compare Leanstral against leading coding agents (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5) and open-source models (Qwen3.5 397B-A17B, Kimi-K2.5 1T-A32B, GLM5 744B-A40B).
Leanstral vs. OSS Models
Leanstral-120B-A6B demonstrates a significant efficiency advantage over its much larger open-source peers. While models like GLM5-744B-A40B and Kimi-K2.5-1T-32B struggle to scale, capping their FLTEval scores at approximately 16.6 and 20.1 respectively, Leanstral outperforms them both with just a single pass.
Even Qwen3.5-397B-A17B, the strongest OSS competitor shown, requires 4 passes to reach a score of 25.4. In contrast, Leanstral achieves a superior score of 26.3 with half that investment (pass@2) and continues to scale linearly, reaching 29.3 at the same cost level.
Leanstrall Normalized Model Cost Vs Flt Eval Score
Leanstral vs. Claude Family
Leanstral serves as a high-value alternative to the Claude suite, offering competitive performance at a fraction of the price: Leanstral pass@2 reaches a score of 26.3, beating Sonnet by 2.6 points, while costing only $36 to run, compared to Sonnet’s $549. At pass@16, Leanstral reaches a score of 31.9, comfortably beating Sonnet by 8 points. While Claude Opus 4.6 remains the leader in quality, it carries a staggering cost of $1,650, 92 times higher than running Leanstral.
In our benchmarking, we used Mistral Vibe as the scaffold with no modifications specifically for the evaluation.
Model | Cost ($) | Score
Haiku | 184 | 23.0
Sonnet | 549 | 23.7
Opus | 1,650 | 39.6
Leanstral | 18 | 21.9
Leanstral pass@2 | 36 | 26.3
Leanstral pass@4 | 72 | 29.3
Leanstral pass@8 | 145 | 31.0
Leanstral pass@16 | 290 | 31.9Case studies
Answering stackexchange posts about changes in newest Lean version
When breaking changes hit a new Lean release, migrating code can be a massive headache. We fed Leanstral
a real-world question from the Proof Assistants Stack Exchange
about a script that mysteriously stopped compiling in Lean 4.29.0-rc6 (which we did not train with due to its recency). The culprit was a rewrite (
rw
) tactic that suddenly failed to match patterns involving a simple type alias, initially written as
def T2 := List Bool
.Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.
The fix it proposed was simple: just swap
def
for
abbrev
. Because
abbrev
creates a transparent alias that is immediately definitionally equal to the original type, the
rw
tactic could once again perfectly match the pattern
(L2 n).length
in the proof. Leanstral completes the job and explains the rationale to the user perfectly.Reasoning about programs
We copied over definitions in Rocq from
https://www.cs.princeton.edu/courses/archive/fall10/cos441/sf/Imp.html
and asked Leanstral to convert to Lean. It did so successfully, even implementing custom notation. Example snippet:inductive ceval : com → state → state → Prop where
| E_Skip (st : state) : ceval .CSkip st st
| E_Ass (st : state) (a1 : aexp) (n : Nat) (l : ident) (h : aeval a1 st = n) : ceval (.CAss l a1) st (update st l n)
| E_Seq (c1 c2 : com) (st st' st'' : state) (h1 : ceval c1 st st') (h2 : ceval c2 st' st'') : ceval (.CSeq c1 c2) st st''
| E_IfTrue (st st' : state) (b1 : bexp) (c1 c2 : com) (h : beval b1 st = true) (h1 : ceval c1 st st') : ceval (.CIf b1 c1 c2) st st'
| E_IfFalse (st st' : state) (b1 : bexp) (c1 c2 : com) (h : beval b1 st = false) (h1 : ceval c2 st st') : ceval (.CIf b1 c1 c2) st st'
| E_WhileEnd (b1 : bexp) (st : state) (c1 : com) (h : beval b1 st = false) : ceval (.CWhile b1 c1) st st
| E_WhileLoop (st st' st'' : state) (b1 : bexp) (c1 : com) (h1 : beval b1 st = true) (h2 : ceval c1 st st') (h3 : ceval (.CWhile b1 c1) st' st'') : ceval (.CWhile b1 c1) st st''-- Notation for command evaluation
notation: 50 c " / " st " ⇒ " st' => ceval c st st'It could also translate to Lean and then prove some properties about programs in this language when just given the Rocq statement (without proof):
-- Example command: adds 2 to variable X
def plus2 : com := .CAss "X" (.APlus (.AId "X") (.ANum 2))-- Theorem: The plus2 command correctly adds 2 to variable X
-- Intuition: If X has value n in the initial state, after executing plus2, -- X will have value n +2 in the final state
-- This specifies the behavior of the plus2 command
theorem plus2_spec (st : state) (n : Nat) (st' : state) (h1 : st "X" = n) (h2 : plus2 / st ⇒ st') : st' "X" = n + 2 := by
-- plus2 is defined as .CAss "X" (.APlus (.AId "X") (.ANum 2))
-- Use equation compiler to unfold it
change ceval (.CAss "X" (.APlus (.AId "X") (.ANum 2))) st st' at h
cases h with | E_Ass _ _ n l h => have : aeval (.APlus (.AId "X") (.ANum 2)) st = n := h
simp only [aeval] at this
rw [update]
simp [← this, h1]Demand Proof. Try Leanstral Today.
Leanstral is available today for everyone to use.
Zero-Setup in
Mistral Vibe
: We’ve integrated Leanstral directly into Mistral Vibe for immediate, zero-setup vibe coding and proving. Use
/leanstall
to start.Labs API: Access the model via our free/near-free API endpoint
labs-leanstral-2603
. We are keeping this endpoint highly accessible for a limited period to gather realistic feedback and observability data to fuel the next generation of verified code models.Own the Weights: Download the Apache 2.0 licensed model and run it on your own metal.
Documentation - Sign Up for Mistral Vibe
Original source - Mar 15, 2026
- Date parsed from source:Mar 15, 2026
- First seen by Releasebot:Mar 17, 2026
- Modified by Releasebot:Mar 18, 2026
March 15
Mistral releases Mistral Small 4, a multimodal hybrid model with 256k context, and Leanstral, an open-source code agent for Lean 4.
MODEL RELEASED
We released Mistral Small 4 (mistral-small-2603), a hybrid model unifying instruct, reasoning, and coding in a single multimodal model with a 256k context window.
We released Leanstral (labs-leanstral-2603), our first open-source code agent designed for Lean 4 formal proof engineering.
- Mar 15, 2026
- Date parsed from source:Mar 15, 2026
- First seen by Releasebot:Mar 17, 2026
March 15
Mistral releases Leanstral, an open-source code agent for Lean 4 formal proof engineering.
We released Leanstral (labs-leanstral-2603), our first open-source code agent designed for Lean 4 formal proof engineering.
MODEL RELEASED
Original source - Mar 13, 2026
- Date parsed from source:Mar 13, 2026
- First seen by Releasebot:Mar 13, 2026
v1.10.0: Tokenizer v15, Reasoning Effort and Python 3.14
Mistral unveils Version 1.10.0 with new capabilities and improvements such as Python 3.14 support, speech request addition, and strict function calling, plus v15. Tests now use mocked HTTP responses and several contributors are noted. Full changelog covers v1.9.1 to v1.10.0.
What's Changed
- Allow System Prompt with Audio for v13 by @juliendenize in #184
- test_audio: Replace live network calls in test_from_url with mocked HTTP responses by @framsouza in #188
- fix: typo in serve command help text by @framsouza in #189
- Add Python 3.14 support by @juliendenize in #195
- test: mock remaining network call in test_encode_invalid_audio_url_chunk by @abdelhadi703 in #192
- [Speech Request] Add speech request by @patrickvonplaten in #196
- Add strict function calling support by @juliendenize in #197
- Add v15 by @juliendenize in #199
- Version 1.10.0 by @juliendenize in #200
New Contributors
- @framsouza made their first contribution in #188
- @abdelhadi703 made their first contribution in #192
Full Changelog: v1.9.1...v1.10.0
Original source - Mar 11, 2026
- Date parsed from source:Mar 11, 2026
- First seen by Releasebot:Mar 18, 2026
March 11
Mistral releases Moderation 2603 with Custom Guardrails for Agents and Conversations and per-request guardrails on conversations and chat completions.
MODEL RELEASED
We released Mistral Moderation 2603 (mistral-moderation-2603).
We added
Custom Guardrails
support for Agents and Conversations.API UPDATED
Guardrails can now be configured directly on an Agent via the
guardrails
parameter.Guardrails can be passed per-request on
POST /v1/conversations
using the
guardrails
field.Guardrails can be passed per-request on
Original source
POST /v1/chat/completions
using the
guardrails
field. - February 2026
- No date parsed from source.
- First seen by Releasebot:Feb 25, 2026
Voxtral transcribes at the speed of sound.
Voxtral launches Transcribe 2 with Mini Transcribe V2 and Realtime, plus an audio playground in Mistral Studio. It offers diarization, multilingual support, low latency sub-200ms, open weights, and edge-ready deployment.
Highlights
- Voxtral Mini Transcribe V2: State-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps in 13 languages.
- Voxtral Realtime: Purpose-built for live transcription with latency configurable down to sub-200ms, enabling voice agents and real-time applications.
- Best-in-class efficiency: Industry-leading accuracy at a fraction of the cost, with Voxtral Mini Transcribe V2 achieving the lowest word error rate, at the lowest price point.
- Open weights: Voxtral Realtime ships under Apache 2.0, deployable on edge for privacy-first applications.
Voxtral Realtime
Voxtral Realtime is purpose-built for applications where latency matters. Unlike approaches that adapt offline models by processing audio in chunks, Realtime uses a novel streaming architecture that transcribes audio as it arrives. The model delivers transcriptions with delay configurable down to sub-200ms, unlocking a new class of voice-first applications.
Word error rate (lower is better) across languages in the FLEURS transcription benchmark.
At 2.4 seconds delay, ideal for subtitling, Realtime matches Voxtral Mini Transcribe V2, our latest batch model. At 480ms delay, it stays within 1-2% word error rate, enabling voice agents with near-offline accuracy.
The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.
We’re releasing the model weights under Apache 2.0 on the Hugging Face Hub.
Voxtral Mini Transcribe V2
Voxtral Mini Transcribe V2 delivers significant improvements in transcription and diarization quality across languages and domains. At approximately 4% word error rate on FLEURS and $0.003/min, Voxtral offers the best price-performance of any transcription API. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, and processes audio approximately 3x faster than ElevenLabs’ Scribe v2 while matching on quality at one-fifth the cost.
Model features
Voxtral Mini Transcribe 2 introduces key capabilities.
- Speaker diarization.
Generate transcriptions with speaker labels and precise start/end times. Ideal for meeting transcription, interview analysis, and multi-party call processing. Note: with overlapping speech, the model typically transcribes one speaker. - Context biasing.
Provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary. Particularly useful for proper nouns or industry terminology that standard models often miss. Context biasing is optimized for English; support for other languages is experimental. - Word-level timestamps.
Generate precise start and end timestamps for each word, enabling applications like subtitle generation, audio search, and content alignment. - Expanded language support.
Like Realtime, this model now supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. Non-English performance significantly outpaces competitors. - Noise robustness.
Maintains transcription accuracy in challenging acoustic environments, such as factory floors, busy call centers, and field recordings. - Longer audio support.
Process recordings up to 3 hours in a single request.
Audio playground
Test Voxtral Transcribe 2 directly in Mistral Studio. Upload up to 10 audio files, toggle diarization, choose timestamp granularity, and add context bias terms for domain-specific vocabulary. Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each.
Transforming voice applications
Voxtral powers voice workflows in diverse applications and industries.
- Meeting intelligence.
Transcribe multilingual recordings with speaker diarization that clearly attributes who said what and when. At Voxtral's price point, annotate large volumes of meeting content at industry-leading cost efficiency. - Voice agents and virtual assistants.
Build conversational AI with sub-200ms transcription latency. Connect Voxtral Realtime to your LLM and TTS pipeline for responsive voice interfaces that feel natural. - Contact center automation.
Transcribe calls in real time, enabling AI systems to analyze sentiment, suggest responses, and populate CRM fields while conversations are still happening. Speaker diarization ensures clear attribution between agents and customers. - Media and broadcast.
Generate live multilingual subtitles with minimal latency. Context biasing handles proper nouns and technical terminology that trip up generic transcription services. - Compliance and documentation.
Monitor and transcribe interactions for regulatory compliance, with diarization providing clear speaker attribution and timestamps enabling precise audit trails.
Both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups.
Get started
Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Try it now in the new Mistral Studio audio playground or in Le Chat.
Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.
Explore documentation on Mistral’s audio and transcription capabilities.
We’re hiring
If you're excited about building world-class speech AI and putting frontier models into the hands of developers everywhere, we'd love to hear from you. Apply to join our team.
Original source
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.
Similar to Mistral with recent updates:
- Perplexity release notes24 release notes · Latest May 11, 2026
- Anthropic release notes570 release notes · Latest May 20, 2026
- xAI release notes72 release notes · Latest May 18, 2026
- OpenAI release notes675 release notes · Latest May 19, 2026
- Cursor release notes82 release notes · Latest May 19, 2026
- Google release notes1388 release notes · Latest May 19, 2026