Mistral Inference Release Notes

Last updated: Oct 26, 2025

  • Mar 20, 2025
    • Parsed from source:
      Mar 20, 2025
    • Detected by Releasebot:
      Oct 26, 2025

    Mistral Inference by Mistral

    v1.6.0: Mistral goes Small 3.1 with vision

    What's Changed

    • Missing new line by @theophilegervet in #234
    • Add support to Mistral Small 3.1 by @juliendenize in #239
    • Remove file refs by @juliendenize in #240
    • Release 1.6.0 by @juliendenize in #241

    New Contributors

    • @theophilegervet made their first contribution in #234
    • @juliendenize made their first contribution in #239

    Full Changelog: v1.5.0...v1.6.0

    Original source Report a problem
  • Sep 13, 2024
    • Parsed from source:
      Sep 13, 2024
    • Detected by Releasebot:
      Oct 26, 2025
    • Modified by Releasebot:
      Nov 4, 2025

    Mistral Inference by Mistral

    v1.4.0: Pixtral 👀

    Pixtral-12B-2409 is now available with multimodal prompts, a 128k context window, and drop-in compatibility with Mistral 7B. Trained on multilingual and code data, it ships with the Tekken tokenizer; the full changelog covers v1.3.0 to v1.4.0.

    Pixtral

    Mistral models can now 👀 !

    pip install --upgrade "mistral_inference>=1.4.0"

    Download:

    from huggingface_hub import snapshot_download
    from pathlib import Path
    mistral_models_path = Path.home().joinpath('mistral_models', 'Pixtral')
    mistral_models_path.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id = "mistralai/Pixtral-12B-2409", allow_patterns = ["params.json", "consolidated.safetensors", "tekken.json"], local_dir = mistral_models_path)

    CLI example:

    mistral-chat $HOME/mistral_models/Pixtral --instruct --max_tokens 256 --temperature 0.35

    For example, try something like:
    Text prompt: What can you see on the following picture?
    [You can input zero, one or more images now.]
    Image path or url [Leave empty and press enter to finish image input]: https://picsum.photos/id/237/200/300
    Image path or url [Leave empty and press enter to finish image input]:
    I see a black dog lying on a wooden surface. The dog appears to be looking up, and its eyes are clearly visible.

    Python:

    1. Load the model
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
    model = Transformer.from_folder(mistral_models_path)
    
    2. Run:
    url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
    prompt = "Describe the image."
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = [ImageURLChunk(image_url = url), TextChunk(text = prompt)])])
    encoded = tokenizer.encode_chat_completion(completion_request)
    images = encoded.images
    tokens = encoded.tokens
    out_tokens, _ = generate([tokens], model, images = [images], max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.decode(out_tokens[0])
    print(result)
    
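    The CLI transcript above notes that a prompt can include more than one image, and the same Python API accepts several ImageURLChunks per message. A minimal sketch, reusing the tokenizer and model loaded in step 1 together with the two example images shown earlier (illustrative, not part of the original notes):

    # Hypothetical multi-image prompt, reusing the two example image URLs from these notes.
    urls = [
        "https://picsum.photos/id/237/200/300",
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    ]
    chunks = [ImageURLChunk(image_url = u) for u in urls] + [TextChunk(text = "Compare the two images.")]
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = chunks)])
    encoded = tokenizer.encode_chat_completion(completion_request)
    out_tokens, _ = generate([encoded.tokens], model, images = [encoded.images], max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    print(tokenizer.decode(out_tokens[0]))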


    What's Changed

    • Tekken by @patrickvonplaten in #193

    Full Changelog: v1.3.0...v1.4.0

    Original source Report a problem
  • Jul 18, 2024
    • Parsed from source:
      Jul 18, 2024
    • Detected by Releasebot:
      Oct 28, 2025
    • Modified by Releasebot:
      Oct 30, 2025

    Mistral Inference by Mistral

    v1.3.0 Mistral-Nemo

    The new Mistral-Nemo-Instruct-2407 delivers an instruct-tuned LLM with a 128k context window, multilingual and code training data, and drop-in compatibility with Mistral 7B, released under Apache 2.0. Built by Mistral AI and NVIDIA, it supports tool calls and real-world reasoning at scale.

    Welcome

    Welcome Mistral-Nemo from Mistral 🤝 NVIDIA
    Read more about Mistral-Nemo here.

    Install

    pip install "mistral-inference>=1.3.0"

    Download

    export NEMO_MODEL=$HOME/12B_NEMO_MODEL
    wget https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar
    mkdir -p $NEMO_MODEL
    tar -xf mistral-nemo-instruct-2407.tar -C $NEMO_MODEL

    Chat

    mistral-chat $NEMO_MODEL --instruct --max_tokens 1024

    or directly in Python:

    import os
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    tokenizer = MistralTokenizer.from_model("mistral-nemo")
    model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))
    prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = prompt)])
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 1024, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.decode(out_tokens[0])
    print(result)
    

    Function calling:

    import os
    from mistral_common.protocol.instruct.tool_calls import Function, Tool
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    tokenizer = MistralTokenizer.from_model("mistral-nemo")
    model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))
    completion_request = ChatCompletionRequest(
        tools = [
            Tool(
                function = Function(
                    name = "get_current_weather",
                    description = "Get the current weather",
                    parameters = {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                ),
            ),
        ],
        messages = [UserMessage(content = "What's the weather like today in Paris?")],
    )
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.decode(out_tokens[0])
    print(result)
    
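    The decoded result of a tool-call completion is a plain string. Below is a minimal, hypothetical post-processing sketch; the "[TOOL_CALLS]" marker and the JSON-list serialization are assumptions, not something these notes specify.

    import json

    def parse_tool_calls(text):
        # Assumption: tool calls are serialized as a JSON list following a "[TOOL_CALLS]" marker.
        payload = text.split("[TOOL_CALLS]")[-1].strip()
        try:
            return json.loads(payload)
        except json.JSONDecodeError:
            return []  # the model answered in plain text instead of calling a tool

    for call in parse_tool_calls(result):
        print(call["name"], call.get("arguments"))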

    Summary

    The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.

    For more details about this model please refer to our release blog post.

    Key features

    • Released under the Apache 2 License
    • Pre-trained and instructed versions
    • Trained with a 128k context window
    • Trained on a large proportion of multilingual and code data
    • Drop-in replacement of Mistral 7B

    Model Architecture

    Mistral Nemo is a transformer model, with the following architecture choices:

    • Layers: 40
    • Dim: 5,120
    • Head dim: 128
    • Hidden dim: 14,436
    • Activation Function: SwiGLU
    • Number of heads: 32
    • Number of kv-heads: 8 (GQA)
    • Vocabulary size: 2**17 ~= 128k
    • Rotary embeddings (theta = 1M)
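
    For reference, the architecture hyperparameters above can be written as a params.json-style dictionary. This is purely illustrative; the field names below are assumptions, not the exact schema shipped with the checkpoint.

    mistral_nemo_arch = {
        "n_layers": 40,
        "dim": 5120,
        "head_dim": 128,
        "hidden_dim": 14436,
        "n_heads": 32,
        "n_kv_heads": 8,          # grouped-query attention
        "vocab_size": 2**17,      # ~131k tokens
        "rope_theta": 1_000_000,  # rotary embedding base
    }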

    Metrics

    Main Benchmarks

    Benchmark Score

    • HellaSwag (0-shot) 83.5%
    • Winogrande (0-shot) 76.8%
    • OpenBookQA (0-shot) 60.6%
    • CommonSenseQA (0-shot) 70.4%
    • TruthfulQA (0-shot) 50.3%
    • MMLU (5-shot) 68.0%
    • TriviaQA (5-shot) 73.8%
    • NaturalQuestions (5-shot) 31.2%

    Multilingual Benchmarks (MMLU)

    Language Score

    • French 62.3%
    • German 62.7%
    • Spanish 64.6%
    • Italian 61.3%
    • Portuguese 63.3%
    • Russian 59.2%
    • Chinese 59.0%
    • Japanese 59.0%

    What's Changed

    • Tekken by @patrickvonplaten in #193

    Full Changelog: v1.2.0...v1.3.0

    Original source Report a problem
  • Jul 16, 2024
    • Parsed from source:
      Jul 16, 2024
    • Detected by Releasebot:
      Oct 28, 2025

    Mistral Inference by Mistral

    v1.2.0 Add Mamba

    Mistral unveils new 7B models Codestral-Mamba and Mathstral with guided chat demos and ready-to-use pip installs. The update bundles a full changelog from v1.1.0 to v1.2.0, GPU requirements notes, and multiple README and doc fixes plus first contributions. A clear shipped release for AI tooling fans.

    Welcome

    🐍 Codestral-Mamba and 🔢 Mathstral

    pip install "mistral-inference>=1.2.0"
    

    Codestral-Mamba

    pip install packaging mamba-ssm causal-conv1d transformers
    
      1. Download
      export MAMBA_CODE=$HOME/7B_MAMBA_CODE
      wget https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar
      mkdir -p $MAMBA_CODE
      tar -xf codestral-mamba-7B-v0.1.tar -C $MAMBA_CODE
      
      2. Chat
      mistral-chat $HOME/7B_MAMBA_CODE --instruct --max_tokens 256
      

    Mathstral

      1. Download
      export MATHSTRAL=$HOME/7B_MATH
      wget https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar
      mkdir -p $MATHSTRAL
      tar -xf mathstral-7B-v0.1.tar -C $MATHSTRAL
      
      2. Chat
      mistral-chat $HOME/7B_MATH --instruct --max_tokens 256
      
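    Mathstral is a regular transformer checkpoint, so the Python path used elsewhere in these notes should apply as well. A minimal sketch, assuming the extracted archive contains a tokenizer.model.v3 file (the tokenizer filename is an assumption):

    import os
    from mistral_inference.transformer import Transformer  # on older versions: from mistral_inference.model import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest

    model_dir = os.path.join(os.environ["HOME"], "7B_MATH")
    tokenizer = MistralTokenizer.from_file(f"{model_dir}/tokenizer.model.v3")  # assumed filename inside the extracted tar
    model = Transformer.from_folder(model_dir)

    completion_request = ChatCompletionRequest(messages = [UserMessage(content = "Prove that the square root of 2 is irrational.")])
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 256, temperature = 0.35, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))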

    What's Changed

    • add a note about GPU requirement by @sophiamyang in #158
    • Add codestral by @patrickvonplaten in #164
    • Update README.md by @patrickvonplaten in #165
    • fixing type in README.md by @didier-durand in #175
    • Fix: typo in ModelArgs: "infered" to "inferred" by @CharlesCNorton in #174
    • fix: typo in LoRALoaderMixin: correct "multipe" to "multiple" by @CharlesCNorton in #173
    • fix: Correct typo in classifier.ipynb from "alborithm" to "algorithm" by @CharlesCNorton in #167
    • Fix: typo in error message for state_dict validation by @CharlesCNorton in #172
    • fix: Correct misspelling in ModelArgs docstring by @CharlesCNorton in #171
    • Update README.md by @patrickvonplaten in #168
    • fix: typo in HF_TOKEN environment variable check message by @CharlesCNorton in #179
    • Adding Issue/Bug template. by @pandora-s-git in #178
    • typo in ModelArgs class docstring. by @CharlesCNorton in #183
    • Update README.md by @Simontwice in #184
    • Add mamba by @patrickvonplaten in #187

    New Contributors

    • @didier-durand made their first contribution in #175
    • @CharlesCNorton made their first contribution in #174
    • @pandora-s-git made their first contribution in #178
    • @Simontwice made their first contribution in #184

    Full Changelog: v1.1.0...v1.2.0

    Original source Report a problem
  • May 24, 2024
    • Parsed from source:
      May 24, 2024
    • Detected by Releasebot:
      Oct 28, 2025
    • Modified by Releasebot:
      Nov 4, 2025

    Mistral Inference by Mistral

    v1.1.0 Add LoRA

    Mistral inference 1.1.0 adds LoRA model support trained with mistral-finetune and shows how to run a 7B base LoRA end-to-end with a sample script. The release tightens fine‑tuning workflows and documents the full changelog from v1.0.4 to v1.1.0.

    mistral-inference==1.1.0 supports running LoRA models that were trained with: https://github.com/mistralai/mistral-finetune
    Having trained a 7B base LoRA, you can run mistral-inference as follows:

    from mistral_inference.model import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    MODEL_PATH = "path/to/downloaded/7B_base_dir"
    tokenizer = MistralTokenizer.from_file(f"{MODEL_PATH}/tokenizer.model.v3")  # change to extracted tokenizer file
    model = Transformer.from_folder(MODEL_PATH)  # change to extracted model dir
    model.load_lora("/path/to/run_lora_dir/checkpoints/checkpoint_000300/consolidated/lora.safetensors")
    
    completion_request = ChatCompletionRequest(messages = [UserMessage(content = "Explain Machine Learning to me in a nutshell.")])
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 64, temperature = 0.0, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
    print(result)
    

    Full Changelog

    v1.0.4...v1.1.0

    Original source Report a problem
  • May 22, 2024
    • Parsed from source:
      May 22, 2024
    • Detected by Releasebot:
      Nov 4, 2025

    Mistral Inference by Mistral

    v1.0.4 - Mistral-inference

    Mistral-inference is the official inference library for all Mistral models, with clear install and run instructions, a ready-to-use tokenizer and model loader, and a sample tool-calling chat workflow. This post marks the initial release, with end-to-end guidance and a basic weather-tool example.

    Mistral-inference is the official inference library for all Mistral models: 7B, 8x7B, 8x22B.
    Install with:

    pip install mistral-inference
    

    Run with:

    from mistral_inference.model import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    
    MODEL_PATH = "./path/to/model/folder"  # change to extracted model dir
    tokenizer = MistralTokenizer.from_file("/path/to/tokenizer/file")  # change to extracted tokenizer file
    model = Transformer.from_folder(MODEL_PATH)
    
    from mistral_common.protocol.instruct.tool_calls import Function, Tool
    
    completion_request = ChatCompletionRequest(
        tools = [
            Tool(
                function = Function(
                    name = "get_current_weather",
                    description = "Get the current weather",
                    parameters = {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                ),
            ),
        ],
        messages = [UserMessage(content = "What's the weather like today in Paris?")],
    )
    tokens = tokenizer.encode_chat_completion(completion_request).tokens
    out_tokens, _ = generate([tokens], model, max_tokens = 64, temperature = 0.0, eos_id = tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
    print(result)
    

    Full Changelog

    initial release

    Original source Report a problem
