Ollama Release Notes

Name: Ollama
Brand: Ollama

Follow Ollama to add their release notes to your feed!

49 release notes curated from 1 source by the Releasebot Team. Last updated: Jul 10, 2026

Get this feed:

Jul 10, 2026
Date parsed from source:
Jul 10, 2026

First seen by Releasebot:
Jul 10, 2026
Ollama

v0.32.0-rc0

Ollama adds agent UI for cmd.

cmd: agent UI (#17017)
Original source
Jul 8, 2026
Date parsed from source:
Jul 8, 2026

First seen by Releasebot:
Jul 7, 2026

Modified by Releasebot:
Jul 9, 2026
Ollama

v0.31.2

Ollama ships flash attention for older NVIDIA GPUs, iGPU vision offload, and fixes for structured output, model loading, and engines.

What's Changed

Enabled flash attention on older NVIDIA GPUs (compute capability 6.x)

iGPU can now offload vision models with padding to fit available memory

Fixed structured output for thinking models when thinking is disabled

Hardened GGUF model creation

ollama launch for Claude Code now disables telemetry by default

Fixed loading models on paths with non-UTF-8 characters

Updated the MLX and llama.cpp engines

New Contributors

@kevinpark1217 made their first contribution in #16949

Full Changelog: v0.31.1...v0.31.2
Original source
All of your release notes in one feed

Join Releasebot and get updates from Ollama and hundreds of other software products.

Create account
Get updates with:
Jul 7, 2026
Date parsed from source:
Jul 7, 2026

First seen by Releasebot:
Jul 9, 2026
Ollama

v0.31.2-rc2: llm: allow iGPU mmproj offload with fit padding (#16996)

Ollama improves multimodal GPU loading by allowing iGPU projector offload with fit padding, helping CLIP stay off CPU on supported systems while preserving user-set fit targets.

llm: allow iGPU mmproj offload with fit padding

llama.cpp's fit pass sizes text-model placement before the multimodal projector is loaded. Ollama had been avoiding that risk on non-Metal iGPUs by disabling projector offload entirely, which forces CLIP onto CPU on GB10 and Strix Halo even when the projector has ample memory available.

Let integrated GPUs use the same projector-memory check as other GPUs. When projector offload is enabled, add the estimated projector memory plus the existing 1 GiB headroom to Ollama-owned LLAMA_ARG_FIT_TARGET so fit leaves space for the later projector allocation. If Ollama/device setup already supplied a fit target, add the projector pad to it. If the user set LLAMA_ARG_FIT_TARGET explicitly, leave it exactly as provided.

Fixes #16419

review comments
Original source
Jul 6, 2026
Date parsed from source:
Jul 6, 2026

First seen by Releasebot:
Jul 8, 2026
Ollama

v0.31.2-rc1: create: harden GGUF create flows (#17062)

Ollama hardens GGUF create flows and lint handling.

create: harden GGUF create flows

lint
Original source
Jul 6, 2026
Date parsed from source:
Jul 6, 2026

First seen by Releasebot:
Jul 7, 2026
Ollama

v0.31.2-rc0

Ollama updates mlx to de7b4ed9.

mlx: update to de7b4ed9 (#17056)
Original source
Similar to Ollama with recent updates:
Jul 1, 2026
Date parsed from source:
Jul 1, 2026

First seen by Releasebot:
Jul 1, 2026
Ollama

v0.31.1

Ollama improves Gemma 4 speed on Apple Silicon, with nearly 90% faster token generation powered by multi-token prediction and automatic tuning. The update also tightens Gemma 4 model loading and refreshes the MLX and llama.cpp engines for better performance.
Faster Gemma 4 on Apple Silicon

Gemma 4 is now significantly faster in Ollama on Apple Silicon, generating tokens nearly 90% faster on average across a coding-agent benchmark by leveraging multi-token prediction (MTP). Ollama auto-tunes how many tokens to draft as it runs, so the speedup is on by default, requires no configuration, and does not change the model's output.

What's Changed

Tightened Gemma 4 MoE model loading in the MLX engine

Updated the MLX engine to the latest version, including a new small-batch matmul kernel

Updated the underlying llama.cpp engine to build 9840

Improved Gemma 4 multi-token prediction (MTP) performance

Full Changelog: v0.30.12...v0.31.1
Original source
Jun 30, 2026
Date parsed from source:
Jun 30, 2026

First seen by Releasebot:
Jun 30, 2026
Ollama

v0.31.1: mlx: tighten up gemma4 moe loading code (#16964)

Ollama expands tensor name support for quantized and non-quantized models.

This change allows .experts.gate_proj / .up_proj / .down_proj tensor names to each be used for both quantized (i.e. nvfp4 and mxfp8) and non-quantized (bf16) models. Previous to this only non-quantized models used that tensor naming scheme.
Original source
Jun 29, 2026
Date parsed from source:
Jun 29, 2026

First seen by Releasebot:
Jun 30, 2026
Ollama

v0.31.0

Ollama adds a minimum version check for Hermes Desktop.

launch: check for min version for hermes desktop (#16912)
Original source
Jun 29, 2026
Date parsed from source:
Jun 29, 2026

First seen by Releasebot:
Jun 30, 2026
Ollama

v0.30.12

Ollama improves tool call parsing, updates llama.cpp, and bumps the mlx dependency in this release candidate.

What's Changed

tools: ignore braces inside JSON strings when detecting tool call end by @aditya-786 in #16937

mlx: bump dependency by @dhiltgen in #16935

llama.cpp update by @dhiltgen in #16960

New Contributors

@aditya-786 made their first contribution in #16937

Full Changelog: v0.30.11...v0.30.12-rc0
Original source
Jun 29, 2026
Date parsed from source:
Jun 29, 2026

First seen by Releasebot:
Jun 29, 2026
Ollama

v0.30.12-rc0

Ollama ships a llama.cpp update.

llama.cpp update (#16960)
Original source
Jun 26, 2026
Date parsed from source:
Jun 26, 2026

First seen by Releasebot:
Jun 27, 2026
Ollama

v0.30.11-rc1

Ollama adds Ornith 9B renderer and parser support.

parser/renderer: add Ornith 9B renderer/parser support (#16920)
Original source
Jun 25, 2026
Date parsed from source:
Jun 25, 2026

First seen by Releasebot:
Jun 25, 2026
Ollama

v0.30.11

Ollama adds smarter launch and runtime updates, including thinking capability detection, auto-install for Claude Code and opencode, better Codex model drift detection, and fixes across Windows, Vulkan, CUDA, mlx, and generation handling.

What's Changed

launch: add thinking capability detection to opencode by @hoyyeva in #15434
launch: auto-install Claude Code by @hoyyeva in #16802
launch: auto-install opencode when missing by @hoyyeva in #16806
discover: fix inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics by @Sahil170595 in #16669
mlxrunner: unify and tune speculative decoding by @jessegross in #16791
launch/codex: detect model drift when Codex App UI switches by @BruceMacD in #16864
llama: add sm_86 architecture to cuda_v13_windows preset by @anishesg in #16834
llm: size mmproj offload by projector memory by @dhiltgen in #16866
docs: document max think level by @ParthSareen in #16877
llm: preserve generation headroom for shifted prompts by @ParthSareen in #16856
llama: default qwen2.5vl window attention metadata by @dhiltgen in #16868
llm: use host Vulkan loader on Windows by @dhiltgen in #16869
mlx: update and fix CUDA JIT packaging by @dhiltgen in #16871
llm: fix ollama ps double-counting mmap'd weights on partial offload by @discobot in #16709
docs: redesign docs landing and integrations overview by @hoyyeva in #16807
server: align generate with native chat templates by @dhiltgen in #16878
jetson: add CC 87 for CUDA v13 by @dhiltgen in #16628
llama.cpp version update by @dhiltgen in #16548

New Contributors

@Sahil170595 made their first contribution in #16669
@anishesg made their first contribution in #16834
@discobot made their first contribution in #16709

Full Changelog: v0.30.10...v0.30.11-rc0
Original source
Jun 24, 2026
Date parsed from source:
Jun 24, 2026

First seen by Releasebot:
Jun 26, 2026
Ollama

v0.30.11-rc0

Ollama updates llama.cpp version.

llama.cpp version update (#16548)
Original source
Jun 18, 2026
Date parsed from source:
Jun 18, 2026

First seen by Releasebot:
Jun 18, 2026
Ollama

v0.30.10

Ollama adds Apple Silicon support for Command A and North family models with MLX, plus a llama.cpp update and build fixes.

What's Changed

Command A and North family models now run on Apple Silicon with the MLX engine

Updated the underlying llama.cpp engine to build 9672

Fixed build artifacts for MLX

Full Changelog: v0.30.9...v0.30.10
Original source
Jun 17, 2026
Date parsed from source:
Jun 17, 2026

First seen by Releasebot:
Jun 25, 2026
Ollama

v0.30.10-rc1

Ollama pins Darwin release Xcode for CI stability.

ci: pin darwin release xcode (#16788)
Original source