Mistral Inference Release Notes

Last updated: Oct 26, 2025

  • Mar 20, 2025
    • Parsed from source:
      Mar 20, 2025
    • Detected by Releasebot:
      Oct 26, 2025

    Mistral Inference by Mistral

    v1.6.0: Mistral goes Small 3.1 with vision

    What's Changed

    • Missing new line by @theophilegervet in #234
    • Add support to Mistral Small 3.1 by @juliendenize in #239
    • Remove file refs by @juliendenize in #240
    • Release 1.6.0 by @juliendenize in #241

    New Contributors

    • @theophilegervet made their first contribution in #234
    • @juliendenize made their first contribution in #239

    Full Changelog: v1.5.0...v1.6.0

    Original source Report a problem
  • Sep 13, 2024
    • Parsed from source:
      Sep 13, 2024
    • Detected by Releasebot:
      Oct 26, 2025
    • Modified by Releasebot:
      Nov 4, 2025

    Mistral Inference by Mistral

    v1.4.0: Pixtral 👀

    Pixtral-12B-2409 is now available with multimodal prompts, a 128k context window, and drop-in compatibility with Mistral 7B. Trained on multilingual and code data, it ships with the Tekken tokenizer; the full changelog covers v1.3.0 to v1.4.0.

    Pixtral

    Mistral models can now 👀 !

    pip install --upgrade "mistral_inference>=1.4.0"

    Download:

    from huggingface_hub import snapshot_download
    from pathlib import Path
    mistral_models_path = Path.home().joinpath('mistral_models', 'Pixtral')
    mistral_models_path.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id = "mistralai/Pixtral-12B-2409", allow_patterns = ["params.json", "consolidated.safetensors", "tekken.json"], local_dir = mistral_models_path)

    CLI example:

    mistral-chat $HOME/mistral_models/Pixtral --instruct --max_tokens 256 --temperature 0.35

    For example, try something like:
    Text prompt: What can you see on the following picture?
    [You can input zero, one or more images now.]
    Image path or url [Leave empty and press enter to finish image input]: https://picsum.photos/id/237/200/300
    Image path or url [Leave empty and press enter to finish image input]:
    I see a black dog lying on a wooden surface. The dog appears to be looking up, and its eyes are clearly visible.

    Python:

    1. Load the model
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
    model = Transformer.from_folder(mistral_models_path)
    
    2. Run:
    url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
    prompt = "Describe the image."
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = [ImageURLChunk(image_url = url), TextChunk(text = prompt)])])
    encoded = tokenizer.encode_chat_completion(completion_request)
    images = encoded.images
    tokens = encoded.tokens
    out_tokens, _ = generate([tokens], model, images = [images], max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.decode(out_tokens[0])
    print(result)
    
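    The CLI transcript above notes that a prompt can include more than one image, and the same Python API accepts several ImageURLChunks per message. A minimal sketch, reusing the tokenizer and model loaded in step 1 together with the two example images shown earlier (illustrative, not part of the original notes):

    # Hypothetical multi-image prompt, reusing the two example image URLs from these notes.
    urls = [
        "https://picsum.photos/id/237/200/300",
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    ]
    chunks = [ImageURLChunk(image_url = u) for u in urls] + [TextChunk(text = "Compare the two images.")]
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = chunks)])
    encoded = tokenizer.encode_chat_completion(completion_request)
    out_tokens, _ = generate([encoded.tokens], model, images = [encoded.images], max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    print(tokenizer.decode(out_tokens[0]))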


    What's Changed

    • Tekken by @patrickvonplaten in #193

    Full Changelog: v1.3.0...v1.4.0

    Original source Report a problem
  • Jul 18, 2024
    • Parsed from source:
      Jul 18, 2024
    • Detected by Releasebot:
      Oct 28, 2025
    • Modified by Releasebot:
      Oct 30, 2025

    Mistral Inference by Mistral

    v1.3.0 Mistral-Nemo

    The new Mistral-Nemo-Instruct-2407 delivers an instruct-tuned LLM with a 128k context window, multilingual and code training data, and drop-in compatibility with Mistral 7B, released under Apache 2.0. Built by Mistral AI and NVIDIA, it supports tool calls and real-world reasoning at scale.

    Welcome

    Welcome Mistral-Nemo from Mistral 🤝 NVIDIA
    Read more about Mistral-Nemo here.

    Install

    pip install "mistral-inference>=1.3.0"

    Download

    export NEMO_MODEL=$HOME/12B_NEMO_MODEL
    wget https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar
    mkdir -p $NEMO_MODEL
    tar -xf mistral-nemo-instruct-2407.tar -C $NEMO_MODEL

    Chat

    mistral-chat $NEMO_MODEL --instruct --max_tokens 1024

    or directly in Python:

    import os
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    tokenizer = MistralTokenizer.from_model("mistral-nemo")
    model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))
    prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = prompt)])
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 1024, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.decode(out_tokens[0])
    print(result)
    

    Function calling:

    import os
    from mistral_common.protocol.instruct.tool_calls import Function, Tool
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    tokenizer = MistralTokenizer.from_model("mistral-nemo")
    model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))
    completion_request = ChatCompletionRequest(
        tools = [
            Tool(
                function = Function(
                    name = "get_current_weather",
                    description = "Get the current weather",
                    parameters = {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                ),
            ),
        ],
        messages = [UserMessage(content = "What's the weather like today in Paris?")],
    )
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.decode(out_tokens[0])
    print(result)
    
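    The decoded result of a tool-call completion is a plain string. Below is a minimal, hypothetical post-processing sketch; the "[TOOL_CALLS]" marker and the JSON-list serialization are assumptions, not something these notes specify.

    import json

    def parse_tool_calls(text):
        # Assumption: tool calls are serialized as a JSON list following a "[TOOL_CALLS]" marker.
        payload = text.split("[TOOL_CALLS]")[-1].strip()
        try:
            return json.loads(payload)
        except json.JSONDecodeError:
            return []  # the model answered in plain text instead of calling a tool

    for call in parse_tool_calls(result):
        print(call["name"], call.get("arguments"))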

    Summary

    The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.

    For more details about this model please refer to our release blog post.

    Key features

    • Released under the Apache 2 License
    • Pre-trained and instructed versions
    • Trained with a 128k context window
    • Trained on a large proportion of multilingual and code data
    • Drop-in replacement of Mistral 7B

    Model Architecture

    Mistral Nemo is a transformer model, with the following architecture choices:

    • Layers: 40
    • Dim: 5,120
    • Head dim: 128
    • Hidden dim: 14,436
    • Activation Function: SwiGLU
    • Number of heads: 32
    • Number of kv-heads: 8 (GQA)
    • Vocabulary size: 2**17 ~= 128k
    • Rotary embeddings (theta = 1M)
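
    For reference, the architecture hyperparameters above can be written as a params.json-style dictionary. This is purely illustrative; the field names below are assumptions, not the exact schema shipped with the checkpoint.

    mistral_nemo_arch = {
        "n_layers": 40,
        "dim": 5120,
        "head_dim": 128,
        "hidden_dim": 14436,
        "n_heads": 32,
        "n_kv_heads": 8,          # grouped-query attention
        "vocab_size": 2**17,      # ~131k tokens
        "rope_theta": 1_000_000,  # rotary embedding base
    }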

    Metrics

    Main Benchmarks

    Benchmark Score

    • HellaSwag (0-shot) 83.5%
    • Winogrande (0-shot) 76.8%
    • OpenBookQA (0-shot) 60.6%
    • CommonSenseQA (0-shot) 70.4%
    • TruthfulQA (0-shot) 50.3%
    • MMLU (5-shot) 68.0%
    • TriviaQA (5-shot) 73.8%
    • NaturalQuestions (5-shot) 31.2%

    Multilingual Benchmarks (MMLU)

    Language Score

    • French 62.3%
    • German 62.7%
    • Spanish 64.6%
    • Italian 61.3%
    • Portuguese 63.3%
    • Russian 59.2%
    • Chinese 59.0%
    • Japanese 59.0%

    What's Changed

    • Tekken by @patrickvonplaten in #193

    Full Changelog: v1.2.0...v1.3.0

    Original source Report a problem
  • Jul 16, 2024
    • Parsed from source:
      Jul 16, 2024
    • Detected by Releasebot:
      Oct 28, 2025

    Mistral Inference by Mistral

    v1.2.0 Add Mamba

    Mistral unveils new 7B models Codestral-Mamba and Mathstral with guided chat demos and ready-to-use pip installs. The update bundles a full changelog from v1.1.0 to v1.2.0, GPU requirements notes, and multiple README and doc fixes plus first contributions. A clear shipped release for AI tooling fans.

    Welcome

    🐍 Codestral-Mamba and 🔢 Mathstral

    pip install "mistral-inference>=1.2.0"
    

    Codestral-Mamba

    pip install packaging mamba-ssm causal-conv1d transformers
    
      1. Download
      export MAMBA_CODE=$HOME/7B_MAMBA_CODE
      wget https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar
      mkdir -p $MAMBA_CODE
      tar -xf codestral-mamba-7B-v0.1.tar -C $MAMBA_CODE
      
      2. Chat
      mistral-chat $HOME/7B_MAMBA_CODE --instruct --max_tokens 256
      

    Mathstral

      1. Download
      export MATHSTRAL=$HOME/7B_MATH
      wget https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar
      mkdir -p $MATHSTRAL
      tar -xf mathstral-7B-v0.1.tar -C $MATHSTRAL
      
      2. Chat
      mistral-chat $HOME/7B_MATH --instruct --max_tokens 256
      
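    Mathstral is a regular transformer checkpoint, so the Python path used elsewhere in these notes should apply as well. A minimal sketch, assuming the extracted archive contains a tokenizer.model.v3 file (the tokenizer filename is an assumption):

    import os
    from mistral_inference.transformer import Transformer  # on older versions: from mistral_inference.model import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest

    model_dir = os.path.join(os.environ["HOME"], "7B_MATH")
    tokenizer = MistralTokenizer.from_file(f"{model_dir}/tokenizer.model.v3")  # assumed filename inside the extracted tar
    model = Transformer.from_folder(model_dir)

    completion_request = ChatCompletionRequest(messages = [UserMessage(content = "Prove that the square root of 2 is irrational.")])
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))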

    What's Changed

    • add a note about GPU requirement by @sophiamyang in #158
    • Add codestral by @patrickvonplaten in #164
    • Update README.md by @patrickvonplaten in #165
    • fixing type in README.md by @didier-durand in #175
    • Fix: typo in ModelArgs: "infered" to "inferred" by @CharlesCNorton in #174
    • fix: typo in LoRALoaderMixin: correct "multipe" to "multiple" by @CharlesCNorton in #173
    • fix: Correct typo in classifier.ipynb from "alborithm" to "algorithm" by @CharlesCNorton in #167
    • Fix: typo in error message for state_dict validation by @CharlesCNorton in #172
    • fix: Correct misspelling in ModelArgs docstring by @CharlesCNorton in #171
    • Update README.md by @patrickvonplaten in #168
    • fix: typo in HF_TOKEN environment variable check message by @CharlesCNorton in #179
    • Adding Issue/Bug template. by @pandora-s-git in #178
    • typo in ModelArgs class docstring. by @CharlesCNorton in #183
    • Update README.md by @Simontwice in #184
    • Add mamba by @patrickvonplaten in #187

    New Contributors

    • @didier-durand made their first contribution in #175
    • @CharlesCNorton made their first contribution in #174
    • @pandora-s-git made their first contribution in #178
    • @Simontwice made their first contribution in #184

    Full Changelog: v1.1.0...v1.2.0

    Original source Report a problem
  • May 24, 2024
    • Parsed from source:
      May 24, 2024
    • Detected by Releasebot:
      Oct 28, 2025
    • Modified by Releasebot:
      Nov 4, 2025

    Mistral Inference by Mistral

    v1.1.0 Add LoRA

    Mistral inference 1.1.0 adds LoRA model support trained with mistral-finetune and shows how to run a 7B base LoRA end-to-end with a sample script. The release tightens fine‑tuning workflows and documents the full changelog from v1.0.4 to v1.1.0.

    mistral-inference==1.1.0 supports running LoRA models that were trained with: https://github.com/mistralai/mistral-finetune
    Having trained a 7B base LoRA, you can run mistral-inference as follows:

    from mistral_inference.model import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    MODEL_PATH = "path/to/downloaded/7B_base_dir"
    tokenizer = MistralTokenizer.from_file(f"{MODEL_PATH}/tokenizer.model.v3")  # change to extracted tokenizer file
    model = Transformer.from_folder(MODEL_PATH)  # change to extracted model dir
    model.load_lora("/path/to/run_lora_dir/checkpoints/checkpoint_000300/consolidated/lora.safetensors")
    
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = "Explain Machine Learning to me in a nutshell.")])
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 64, temperature = 0.0, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
    print(result)
    

    Full Changelog

    v1.0.4...v1.1.0

    Original source Report a problem
  • May 22, 2024
    • Parsed from source:
      May 22, 2024
    • Detected by Releasebot:
      Nov 4, 2025

    Mistral Inference by Mistral

    v1.0.4 - Mistral-inference

    Mistral-inference is the official inference library for all Mistral models, with clear install and run instructions, a ready-to-use tokenizer and model loader, and a sample tool-calling chat workflow. This post marks the initial release, with end-to-end guidance and a basic weather-tool example.

    Mistral-inference is the official inference library for all Mistral models: 7B, 8x7B, 8x22B.
    Install with:

    pip install mistral-inference
    

    Run with:

    from mistral_inference.model import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    MODEL_PATH = "./path/to/model/folder"  # change to extracted model dir
    tokenizer = MistralTokenizer.from_file("/path/to/tokenizer/file")  # change to extracted tokenizer file
    model = Transformer.from_folder(MODEL_PATH)
    
    from mistral_common.protocol.instruct.tool_calls import Function, Tool
    
    completion_request = ChatCompletionRequest(
        tools = [
            Tool(
                function = Function(
                    name = "get_current_weather",
                    description = "Get the current weather",
                    parameters = {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                ),
            ),
        ],
        messages = [UserMessage(content = "What's the weather like today in Paris?")],
    )
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 64, temperature = 0.0, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
    print(result)
    

    Full Changelog

    initial release

    Original source Report a problem
