Kimi Release Notes

Last updated: Apr 21, 2026

Get this feed:
  • April 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Apr 21, 2026

    Kimi

    Mooncake

    Kimi open-sources Mooncake’s Transfer Engine and Mooncake Store, plus a technical report and traces, while expanding disaggregated LLM serving across vLLM, SGLang, TensorRT-LLM and more. The release highlights faster KV cache transfer, scalable multimodal inference and stronger ecosystem support.

    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Now both the Transfer Engine and Mooncake Store are open-sourced! This repository also hosts its technical report and the open-sourced traces.

    🔄 Updates

    • Mar 19, 2026: TorchSpec: Speculative Decoding Training at Scale is open-sourced, using Mooncake to decouple inference and training via efficient hidden-states management.
    • Mar 5, 2026: LightX2V now supports disaggregated deployment based on Mooncake, enabling encoder/transformer service decoupling with Mooncake Transfer Engine for high-performance cross-device and cross-machine data transfer.
    • Feb 25, 2026: SGLang merged Encoder Global Cache Manager, introducing a Mooncake-powered global multimodal embedding cache that enables cross-instance sharing of ViT embeddings to avoid redundant GPU computation.
    • Feb 24, 2026: vLLM-Omni introduces disaggregated inference connectors with support for both MooncakeStoreConnector and MooncakeTransferEngineConnector for multi-node omni-modality pipelines.
    • Feb 12, 2026: Mooncake Joins PyTorch Ecosystem. We are thrilled to announce that Mooncake has officially joined the PyTorch Ecosystem!
    • Jan 28, 2026: FlexKV, a distributed KV store and cache system from Tencent and NVIDIA in collaboration with the community, now supports distributed KVCache reuse with the Mooncake Transfer Engine.
    • Dec 27, 2025: Collaboration with ROLL! Check out the paper here.
    • Dec 23, 2025: SGLang introduces Encode-Prefill-Decode (EPD) Disaggregation with Mooncake as a transfer backend. This integration allows decoupling compute-intensive multimodal encoders (e.g., Vision Transformers) from language model nodes, utilizing Mooncake's RDMA engine for zero-copy transfer of large multimodal embeddings.
    • Dec 19, 2025: Mooncake Transfer Engine has been integrated into TensorRT LLM for KVCache transfer in PD-disaggregated inference.
    • Dec 19, 2025: Mooncake Transfer Engine has been directly integrated into vLLM v1 as a KV Connector in PD-disaggregated setups.
    • Nov 07, 2025: RBG + SGLang HiCache + Mooncake provides a role-based, out-of-the-box solution for cloud-native deployment that is elastic, scalable, and high-performance.
    • Sept 18, 2025: Mooncake Store empowers vLLM Ascend by serving as the distributed KV cache pool backend.
    • Sept 10, 2025: SGLang officially supports Mooncake Store as a hierarchical KV caching storage backend. The integration extends RadixAttention with multi-tier KV cache storage across device, host, and remote storage layers.
    • Sept 10, 2025: The official & high-performance version of Mooncake P2P Store is open-sourced as checkpoint-engine. It has been successfully applied in K1.5 and K2 production training, updating the Kimi-K2 model (1T parameters) across thousands of GPUs in ~20s.
    • Aug 23, 2025: xLLM high-performance inference engine builds hybrid KV cache management based on Mooncake, supporting global KV cache management with intelligent offloading and prefetching.
    • Aug 18, 2025: vLLM-Ascend integrates Mooncake Transfer Engine for KV cache register and disaggregate prefill, enabling efficient distributed inference on Ascend NPUs.
    • Jul 20, 2025: Mooncake powers the deployment of Kimi K2 on 128 H200 GPUs with PD disaggregation and large-scale expert parallelism, achieving 224k tokens/sec prefill throughput and 288k tokens/sec decode throughput.
    • Jun 20, 2025: Mooncake becomes a PD disaggregation backend for LMDeploy.
    • May 9, 2025: NIXL officially supports Mooncake Transfer Engine as a backend plugin.
    • May 8, 2025: Mooncake x LMCache unite to pioneer KVCache-centric LLM serving system.
    • May 5, 2025: With support from the Mooncake team, SGLang released guidance for deploying DeepSeek with PD Disaggregation on 96 H100 GPUs.
    • Apr 22, 2025: LMCache officially supports Mooncake Store as a remote connector.
    • Apr 10, 2025: SGLang officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.
    • Mar 7, 2025: We open-sourced the Mooncake Store, a distributed KVCache based on Transfer Engine. vLLM's xPyD disaggregated prefilling & decoding based on Mooncake Store will be released soon.
    • Feb 25, 2025: Mooncake receives the Best Paper Award at FAST 2025!
    • Feb 21, 2025: The updated traces used in our FAST'25 paper have been released.
    • Dec 16, 2024: vLLM officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.
    • Nov 28, 2024: We open-sourced the Transfer Engine, the central component of Mooncake. We also provide two demonstrations of Transfer Engine: a P2P Store and vLLM integration.
    • July 9, 2024: We open-sourced the trace as a JSONL file.
    • June 27, 2024: We present a series of Chinese blogs with more discussions on zhihu 1, 2, 3, 4, 5, 6, 7.
    • June 26, 2024: Initial technical report release.

    🎉 Overview

    Mooncake features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated KVCache pool.

    The core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs). Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges in highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake’s innovative architecture enables Kimi to handle 75% more requests.

    🧩 Components

    Mooncake Core Component: Transfer Engine (TE)

    The core of Mooncake is the Transfer Engine (TE), which provides a unified interface for batched data transfer across various storage devices and network links. Supporting multiple protocols including TCP, RDMA, CXL/shared-memory, and NVMe over Fabrics (NVMe-oF), TE is designed to enable fast and reliable data transfer for AI workloads. Compared to Gloo (used by distributed PyTorch) and traditional TCP, TE achieves significantly lower I/O latency, making it a superior solution for efficient data transmission.

    P2P Store and Mooncake Store

    Both P2P Store and Mooncake Store are built on the Transfer Engine and provide key/value caching for different scenarios. P2P Store focuses on sharing temporary objects (e.g., checkpoint files) across nodes in a cluster, preventing bandwidth saturation on a single machine. Mooncake Store, on the other hand, supports distributed pooled KVCache, specifically designed for xPyD disaggregation to enhance resource utilization and system performance.

    Mooncake Integration with Leading LLM Inference Systems

    Mooncake has been seamlessly integrated with several popular large language model (LLM) inference systems. Through collaboration with the vLLM and SGLang teams, Mooncake now officially supports prefill-decode disaggregation. By leveraging the high-efficiency communication capabilities of RDMA devices, Mooncake significantly improves inference efficiency in prefill-decode disaggregation scenarios, providing robust technical support for large-scale distributed inference tasks. In addition, Mooncake has been successfully integrated with SGLang's Hierarchical KV Caching, vLLM's prefill serving, and LMCache, augmenting KV cache management capabilities across large-scale inference scenarios.

    Elastic Expert Parallelism Support

    Mooncake adds elasticity and fault tolerance support for MoE model inference, enabling inference systems to remain responsive and recoverable in the event of GPU failures or changes in resource configuration. This functionality includes automatic faulty rank detection and can work with the EPLB module to dynamically route tokens to healthy ranks during inference.

    Tensor-Centric Ecosystem

    Mooncake establishes a full-stack, Tensor-oriented AI infrastructure where Tensors serve as the fundamental data carrier. The ecosystem spans from the Transfer Engine, which accelerates Tensor data movement across heterogeneous storage (DRAM/VRAM/NVMe), to the P2P Store and Mooncake Store for distributed management of Tensor objects (e.g., Checkpoints and KVCache), up to the Mooncake Backend enabling Tensor-based elastic distributed computing. This architecture is designed to maximize Tensor processing efficiency for large-scale model inference and training.

    🔥 Show Cases

    Use Transfer Engine Standalone (Guide)

    Transfer Engine is a high-performance data transfer framework that provides a unified interface for transferring data between DRAM, VRAM, and NVMe while hiding hardware-specific technical details. Transfer Engine supports multiple communication protocols including TCP, RDMA (InfiniBand/RoCEv2/eRDMA/NVIDIA GPUDirect), NVMe over Fabrics (NVMe-oF), NVLink, HIP, CXL, and Ascend. When built with the corresponding runtime, Transfer Engine can also detect and route accelerator memory on CUDA, MUSA, HIP, and Cambricon MLU devices. For a complete list of supported protocols and a configuration guide, see the Supported Protocols Documentation.

    Highlights

    • Efficient use of multiple RDMA NIC devices. Transfer Engine supports the use of multiple RDMA NIC devices to achieve the aggregation of transfer bandwidth.
    • Topology aware path selection. Transfer Engine can select optimal devices based on the location (NUMA affinity, etc.) of both source and destination.
    • More robust against temporary network errors. If a transmission fails, Transfer Engine automatically retries over alternative paths for data delivery.

    Performance

    With 40 GB of data (equivalent to the size of the KVCache generated by 128k tokens in the LLaMA3-70B model), Mooncake Transfer Engine delivers up to 87 GB/s and 190 GB/s of bandwidth in 4×200 Gbps and 8×400 Gbps RoCE networks respectively, which are about 2.4x and 4.6x faster than the TCP protocol.
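    As a back-of-envelope sanity check on these figures (our own arithmetic, not part of the benchmark), the measured bandwidths can be compared against the theoretical aggregate of the links and converted into a transfer time for the 40 GB payload:

```python
# Back-of-envelope check of the quoted Transfer Engine figures.
# The 87 GB/s and 190 GB/s numbers are the measured results from the
# release notes; everything else here is simple arithmetic.

PAYLOAD_GB = 40  # ~KVCache of 128k tokens for LLaMA3-70B, per the notes

def aggregate_gbytes_per_sec(n_nics: int, gbps_per_nic: int) -> float:
    """Theoretical aggregate bandwidth in gigabytes/s (8 bits per byte)."""
    return n_nics * gbps_per_nic / 8

for nics, gbps, measured in [(4, 200, 87), (8, 400, 190)]:
    peak = aggregate_gbytes_per_sec(nics, gbps)
    print(f"{nics}x{gbps} Gbps: peak {peak:.0f} GB/s, measured {measured} GB/s "
          f"({measured / peak:.0%} of peak), "
          f"{PAYLOAD_GB / measured:.2f} s per {PAYLOAD_GB} GB transfer")
```

    On the 4×200 Gbps fabric the measured 87 GB/s runs close to the 100 GB/s line-rate ceiling, and in both configurations the 40 GB payload moves in well under half a second.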

    P2P Store (Guide)

    P2P Store is built on the Transfer Engine and supports sharing temporary objects between peer nodes in a cluster. P2P Store is ideal for scenarios like checkpoint transfer, where data needs to be rapidly and efficiently shared across a cluster. P2P Store has been used in the checkpoint transfer service of Moonshot AI.

    Highlights

    • Decentralized architecture. P2P Store leverages a pure client-side architecture with global metadata managed by the etcd service.
    • Efficient data distribution. Designed to enhance the efficiency of large-scale data distribution, P2P Store avoids bandwidth saturation issues by allowing replicated nodes to share data directly. This reduces the CPU/RDMA NIC pressures of data providers (e.g., trainers).
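    The bandwidth-saturation point can be made concrete with a toy distribution model (our sketch, not P2P Store code): a single provider must send a checkpoint to each of N replicas in turn, while peer-to-peer re-sharing lets every node that already holds a copy serve others, doubling the number of copies each round:

```python
def rounds_single_source(n_replicas: int) -> int:
    """A lone provider pushes the object to each replica sequentially."""
    return n_replicas

def rounds_p2p(n_replicas: int) -> int:
    """Every holder re-shares the object, so copies double each round."""
    rounds, holders = 0, 1  # start with only the original provider
    while holders < n_replicas + 1:
        holders *= 2
        rounds += 1
    return rounds

for n in (7, 63, 1023):
    print(f"{n:4d} replicas: {rounds_single_source(n):4d} sequential sends "
          f"vs {rounds_p2p(n)} P2P rounds")
```

    The provider's NIC is busy for only the first round instead of all N sends, which is the pressure reduction on data providers that the bullet above describes.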

    Mooncake Store (Guide)

    Mooncake Store is a distributed KVCache storage engine built on the Transfer Engine and specialized for LLM inference. It is the central component of the KVCache-centric disaggregated architecture. Its goal is to store reusable KV caches across various locations in an inference cluster. Mooncake Store is supported in SGLang's Hierarchical KV Caching and vLLM's prefill serving, and is now integrated with LMCache to provide enhanced KVCache management capabilities.

    Highlights

    • Multi-replica support: Mooncake Store supports storing multiple data replicas for the same object, effectively alleviating hotspots in access pressure.
    • High bandwidth utilization: Mooncake Store supports striping and parallel I/O transfer of large objects, fully utilizing multi-NIC aggregated bandwidth for high-speed data reads and writes.

    SGLang Integration (Guide)

    SGLang officially supports Mooncake Store as a HiCache storage backend. This integration enables scalable KV cache retention and high-performance access for large-scale LLM serving scenarios.

    Highlights

    • Hierarchical KV Caching: Mooncake Store serves as an external storage backend in SGLang's HiCache system, extending RadixAttention with multi-level KV cache storage across device, host, and remote storage layers.
    • Flexible Cache Management: Supports multiple cache policies including write-through, write-through-selective, and write-back modes, with intelligent prefetching strategies for optimal performance.
    • Comprehensive Optimizations: Features advanced data plane optimizations including page-first memory layout for improved I/O efficiency, zero-copy mechanisms for reduced memory overhead, GPU-assisted I/O kernels delivering fast CPU-GPU transfers, and layer-wise overlapping for concurrent KV cache loading while computation executes.
    • Elastic Expert Parallel: Mooncake's collective communication backend and expert parallel kernels are integrated into SGLang to enable fault-tolerant expert parallel inference.
    • Significant Performance Gains: The multi-turn benchmark demonstrates substantial performance improvements over the non-HiCache setting.
    • Community Feedback: Effective KV caching significantly reduces TTFT by eliminating redundant and costly re-computation. Integrating SGLang HiCache with the Mooncake service enables scalable KV cache retention and high-performance access. In our evaluation, we tested the DeepSeek-R1-671B model under PD-disaggregated deployment using in-house online requests sampled from a general QA scenario. On average, cache hits achieved an 84% reduction in TTFT compared to full re-computation. – Ant Group
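    The write-policy bullet above can be illustrated with a minimal two-tier cache sketch (hypothetical code of ours, not the SGLang or Mooncake implementation): write-through persists every put to the remote tier immediately, while write-back defers persistence until the entry is evicted from the fast tier:

```python
from collections import OrderedDict

class TwoTierCache:
    """Toy device-tier cache in front of a 'remote' store (a plain dict).

    policy='write_through' pushes every put to remote immediately;
    policy='write_back' only pushes entries when they are evicted.
    """
    def __init__(self, capacity: int, policy: str = "write_through"):
        self.capacity, self.policy = capacity, policy
        self.local: OrderedDict[str, bytes] = OrderedDict()
        self.remote: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self.local[key] = value
        self.local.move_to_end(key)  # mark as most recently used
        if self.policy == "write_through":
            self.remote[key] = value
        while len(self.local) > self.capacity:
            old_key, old_val = self.local.popitem(last=False)  # evict LRU
            if self.policy == "write_back":
                self.remote[old_key] = old_val  # flush on eviction only

cache = TwoTierCache(capacity=2, policy="write_back")
cache.put("a", b"1"); cache.put("b", b"2")
print(sorted(cache.remote))  # nothing persisted yet under write-back
cache.put("c", b"3")         # evicts "a", flushing it to remote
print(sorted(cache.remote))
```

    A write-through-selective policy would sit between the two extremes, persisting only entries that match some heuristic.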

    vLLM Integration (Guide v0.2)

    To optimize LLM inference, the vLLM community is working on supporting disaggregated prefilling, which separates the prefill phase from the decode phase into different processes. vLLM uses NCCL and Gloo as the transport layer by default, but these currently cannot efficiently decouple the two phases across different machines.

    We have implemented a vLLM integration that uses Transfer Engine as the network layer instead of NCCL and Gloo to support inter-node KVCache transfer. Transfer Engine provides simpler interfaces and makes more efficient use of RDMA devices.

    We will soon release the new vLLM integration based on Mooncake Store, which supports xPyD prefill/decode disaggregation.

    Performance

    By supporting Topology Aware Path Selection and multi-card bandwidth aggregation, the mean TTFT of vLLM with Transfer Engine is up to 25% lower than with traditional TCP-based transports. In the future, we will further improve TTFT through GPUDirect RDMA and zero-copy.

    More advanced features are coming soon, so stay tuned!

    🚀 Quick Start

    Before using Mooncake

    Mooncake is designed and optimized for high-speed RDMA networks. Though Mooncake supports TCP-only data transfer, we strongly recommend evaluating the functionality and performance of Mooncake with RDMA network support.

    The following need to be installed before running any component of Mooncake:

    • RDMA Driver & SDK, such as Mellanox OFED.
    • Python 3.10; a virtual environment is recommended.
    • CUDA 12.1 and above, including NVIDIA GPUDirect Storage Support, if the package is built with -DUSE_CUDA (disabled by default). You may install them from here.
    • Cambricon Neuware, if the package is built with -DUSE_MLU. By default Mooncake looks for Neuware under NEUWARE_HOME or /usr/local/neuware.

    Use Python package

    The simplest way to use Mooncake Transfer Engine is using pip:

    For CUDA-enabled systems:

    CUDA < 13.0

    pip install mooncake-transfer-engine
    

    CUDA >= 13.0

    pip install mooncake-transfer-engine-cuda13
    

    For non-CUDA systems:

    pip install mooncake-transfer-engine-non-cuda
    

    Important

    • The CUDA version (mooncake-transfer-engine) includes Mooncake-EP and GPU topology detection, requiring CUDA 12.1+.
    • The non-CUDA version (mooncake-transfer-engine-non-cuda) is for environments without CUDA dependencies.
    • MLU support is currently available through source builds with -DUSE_MLU=ON; there is no dedicated prebuilt MLU wheel yet.
    • If users encounter problems such as missing lib*.so, they should uninstall the package they installed and build the binaries manually.

    Use Docker image

    Mooncake supports Docker-based deployment; see the Build Guide for details.

    To produce an image that compiles Mooncake from source, builds the wheel via scripts/build_wheel.sh, and installs that wheel inside the container, build with docker/mooncake.Dockerfile:

    docker build -f docker/mooncake.Dockerfile \
      --build-arg PYTHON_VERSION=3.10 \
      --build-arg EP_TORCH_VERSIONS="2.9.1" \
      -t mooncake:from-source .
    

    The resulting image already has a virtual environment at /opt/venv with the freshly built wheel installed. Launch it with GPU/RDMA access as needed, for example:

    docker run --gpus all --network host -it mooncake:from-source /bin/bash
    

    Note

    Make sure you build the image from the repository root so that Git metadata and submodules are available inside the build context.

    Build and use binaries

    The following are additional dependencies for building Mooncake:

    • Build essentials, including gcc, g++ (9.4+) and cmake (3.16+).
    • Go 1.20+, if you want to build with -DWITH_P2P_STORE, -DUSE_ETCD (enabled by default to use etcd as metadata servers), or -DSTORE_USE_ETCD (use etcd for the failover of the store master).
    • CUDA 12.1 and above, including NVIDIA GPUDirect Storage Support, if the package is built with -DUSE_CUDA. This is NOT included in the dependencies.sh script. You may install them from here.
    • Cambricon Neuware, if you want to build with -DUSE_MLU. This is NOT included in the dependencies.sh script. Mooncake resolves it from NEUWARE_HOME or /usr/local/neuware by default, and also supports overriding MLU_INCLUDE_DIR / MLU_LIB_DIR during CMake configure.
    • [Optional] Rust Toolchain, if you want to build with -DWITH_RUST_EXAMPLE. This is NOT included in the dependencies.sh script.
    • [Optional] hiredis, if you want to build with -DUSE_REDIS to use Redis instead of etcd as metadata servers.
    • [Optional] curl, if you want to build with -DUSE_HTTP to use HTTP instead of etcd as metadata servers.

    The build and installation steps are as follows:

    1. Retrieve source code from GitHub repo

      git clone https://github.com/kvcache-ai/Mooncake.git
      cd Mooncake
      
    2. Install dependencies

      bash dependencies.sh
      
    3. Compile Mooncake and examples

      mkdir build
      cd build
      cmake ..
      make -j
      sudo make install
      # optional, make it ready to be used by vLLM/SGLang
      

    For Cambricon MLU builds, configure CMake with -DUSE_MLU=ON. For example:

    mkdir build
    cd build
    cmake .. -DUSE_MLU=ON -DNEUWARE_ROOT=/usr/local/neuware
    make -j
    

    🛣️ Upcoming Milestones

    • First release of Mooncake and integration with the latest vLLM (Completed)
    • Share KV caches across multiple serving engines (Incomplete)
    • User and developer documentation (Incomplete)

    📦 Open Source Trace

    The trace dataset includes the timing of request arrivals, the number of input tokens, the number of output tokens, and remapped block hashes. Privacy mechanisms are applied to remove user-related information while preserving utility for simulated evaluation. A more detailed description of the trace (e.g., its up-to-50% cache hit ratio) can be found in Section 4 of the technical report.
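    A few lines of Python illustrate how such a JSONL trace can be replayed to measure block reuse. The field names used here (timestamp, input_length, output_length, hash_ids) are assumptions for illustration; consult the released trace for the exact schema:

```python
import json
from io import StringIO

# Illustrative trace lines; field names are assumptions, not the official schema.
sample = StringIO("""\
{"timestamp": 0, "input_length": 6288, "output_length": 52, "hash_ids": [1, 2, 3]}
{"timestamp": 30, "input_length": 4000, "output_length": 10, "hash_ids": [1, 2, 7]}
""")

seen_blocks: set[int] = set()
hits = total = 0
for line in sample:
    req = json.loads(line)
    for block in req["hash_ids"]:
        hits += block in seen_blocks  # a repeated block hash means cache reuse
        total += 1
        seen_blocks.add(block)

print(f"block-level cache hit ratio: {hits / total:.0%}")
```

    Replaying all requests this way gives an upper bound on KVCache reuse, since a real system also has finite capacity and eviction.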

    📑 Citation

    Please kindly cite our paper if you find the paper or the traces useful:

    • Ruoyu Qin et al., "Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving," ACM Trans. Storage, 2025.
    • Ruoyu Qin et al., "Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot," 23rd USENIX Conference on File and Storage Technologies (FAST '25), 2025.
    • Ruoyu Qin et al., "Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving," arXiv preprint, 2024.

    About

    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

  • April 2026

    Kimi k1.5: Scaling Reinforcement Learning with LLMs

    Kimi introduces k1.5, an o1-level multimodal model with strong short-CoT and long-CoT performance across math, coding, and vision benchmarks, plus longer context scaling and improved reinforcement learning for more capable planning, reflection, and correction.

    🚀 Introducing Kimi k1.5 --- an o1-level multi-modal model

    • SOTA short-CoT performance, outperforming GPT-4o and Claude 3.5 Sonnet on 📐AIME, 📐MATH-500, 💻 LiveCodeBench by a large margin (up to +550%)
    • Long-CoT performance matches o1 across multiple modalities (👀MathVista, 📐AIME, 💻Codeforces, etc)

    Key Ingredients of Kimi k1.5

    There are a few key ingredients in the design and training of k1.5.

    • Long context scaling. We scale the context window of RL to 128k and observe continued performance improvement with increased context length. A key idea behind our approach is to use partial rollouts to improve training efficiency, i.e., sampling new trajectories by reusing large chunks of previous trajectories and thus avoiding the cost of regenerating new trajectories from scratch. Our observations identify context length as a key dimension of the continued scaling of RL with LLMs.
    • Improved policy optimization. We derive a formulation of RL with long-CoT and employ a variant of online mirror descent for robust policy optimization. This algorithm is further improved by our effective sampling strategy, length penalty, and optimization of the data recipe.
    • Simple framework. Long context scaling, combined with the improved policy optimization methods, establishes a simple RL framework for learning with LLMs. Since we are able to scale the context length, the learned CoTs exhibit the properties of planning, reflection, and correction. An increased context length has the effect of increasing the number of search steps. As a result, we show that strong performance can be achieved without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models.
    • Multimodality. Our model is jointly trained on text and vision data, giving it the capability to reason jointly over the two modalities.
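    The partial-rollout idea from the first ingredient can be sketched as a toy simulation (ours, not the actual training code): a new trajectory reuses a prefix of a previous one, so only the suffix must be sampled from the model:

```python
import random

random.seed(0)

def generate_tokens(n: int) -> list[int]:
    """Stand-in for expensive model sampling."""
    return [random.randrange(32000) for _ in range(n)]

def partial_rollout(prev: list[int], keep_frac: float, total_len: int):
    """Reuse a prefix of the previous trajectory; regenerate only the rest."""
    kept = prev[: int(len(prev) * keep_frac)]
    fresh = generate_tokens(total_len - len(kept))
    return kept + fresh, len(fresh)

previous = generate_tokens(128_000)  # one long-CoT trajectory
rollout, regenerated = partial_rollout(previous, keep_frac=0.75,
                                       total_len=128_000)
print(f"regenerated {regenerated} of {len(rollout)} tokens "
      f"({regenerated / len(rollout):.0%} of full-rollout cost)")
```

    Reusing 75% of a 128k-token trajectory cuts per-iteration sampling cost to roughly a quarter, illustrating how reuse reduces the cost of long-context RL training.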

    Citation

    @article{team2025kimi,
      title={Kimi k1.5: Scaling reinforcement learning with llms},
      author={Team, Kimi and Du, Angang and Gao, Bofei and Xing, Bowei and Jiang, Changjiu and Chen, Cheng and Li, Cheng and Xiao, Chenjun and Du, Chenzhuang and Liao, Chonghua and others},
      journal={arXiv preprint arXiv:2501.12599},
      year={2025}
    }
    
  • April 2026

    MoBA: Mixture of Block Attention for Long-Context LLMs

    Kimi introduces MoBA, a block sparse attention system for long-context requests that helps the model focus on the most relevant blocks, transition between full and sparse attention, and improve efficiency for large language model processing.

    🚀 Introducing MoBA --- Mixture of Block Attention

    • Trainable Block Sparse Attention: The full context is divided into blocks, where each query token learns to attend to the most relevant KV blocks, enabling efficient processing of long sequences.
    • Parameter-less Gating Mechanism: A novel parameter-less top-k gating mechanism is introduced to select the most relevant blocks for each query token, ensuring that the model focuses only on the most informative blocks.
    • Seamlessly Transition between Full and Sparse Attention: MoBA is designed to be a flexible substitute for full attention, allowing seamless transitions between full and sparse attention modes.

    Note: MoBA requires continued training of existing models to achieve its acceleration benefits. It is not a drop-in sparse attention solution that can be applied directly to pretrained models without additional training.
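    A pure-Python toy of the gating step (our simplified sketch, not the released MoBA kernels) shows the core idea: score each KV block by the dot product between the query and the block's mean key, then let the query attend only to the top-k blocks:

```python
import random

random.seed(0)
seq_len, head_dim, block, topk = 32, 8, 8, 2  # tiny toy sizes

q = [random.gauss(0, 1) for _ in range(head_dim)]
keys = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Parameter-less gating: score each KV block by q . mean(K_block),
# then let the query attend only to the top-k blocks
# (a simplified sketch of MoBA's router, not the released kernels).
n_blocks = seq_len // block
block_means = [
    [sum(keys[b * block + i][d] for i in range(block)) / block
     for d in range(head_dim)]
    for b in range(n_blocks)
]
scores = [dot(q, m) for m in block_means]
selected = sorted(range(n_blocks), key=lambda b: scores[b])[-topk:]

print("query attends to blocks:", sorted(selected))
```

    A full implementation would additionally enforce causality (a query can only select past blocks, including its own) and run softmax attention within the selected blocks; the released moba_naive backend is a good place to see those details.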

    Abstract

    Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or radically modify the attention mechanism into linear approximations, whose performance in complex reasoning tasks remains inadequately explored.

    In this work, we propose a solution that adheres to the “less structure” principle, allowing the model to autonomously determine where to attend, rather than introducing predefined biases. We introduce Mixture of Block Attention (MoBA), an innovative approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. This novel architecture demonstrates superior performance on long-context tasks while offering a key advantage: the ability to seamlessly transition between full and sparse attention, enhancing efficiency without the risk of compromising performance. MoBA has already been deployed to support Kimi’s long-context requests and demonstrates significant advancements in efficient attention computation for LLMs.

    Our code is available at MoonshotAI/MoBA.

    Evaluation with 1M context length

    Environment Setup

    Note that the current kernel implementations rely on flash-attn==2.6.3 and torch>=2.1.0.

    conda create -n moba python=3.10
    conda activate moba
    pip install .
    

    Quick Start

    We provide a transformers-friendly implementation of MoBA.
    Feel free to choose an attention backend via --attn, picking between moba and moba_naive.

    python3 examples/llama.py --model meta-llama/Llama-3.1-8B --attn moba
    

    Implementation Details

    • moba_naive: A naive implementation based on attention masks. It's designed to help understand how MoBA selects corresponding chunks. You may save and visualize the attention masks to see the block selection process.
    • moba_efficient: Our production-ready implementation optimized for performance. It achieves up to 40x speedup compared to moba_naive (tested with 32K sequence length, 1 attention head, MoBA Block 2048 and MoBA Topk 3). We recommend using this version for practical applications.

    Unit Tests

    pytest tests/test_moba_attn.py
    

    References

    • Llama Implementation: huggingface/transformers
    • Flash Attention: Dao-AILab/flash-attention

    Citation

    If you find MoBA useful or want to use it in your projects, please kindly cite our paper:

    @article{lu2025mobamixtureblockattention,
      author = {Enzhe Lu and Zhejun Jiang and Jingyuan Liu and Yulun Du and Tao Jiang and Chao Hong and Shaowei Liu and Weiran He and Enming Yuan and Yuzhi Wang and Zhiqi Huang and Huan Yuan and Suting Xu and Xinran Xu and Guokun Lai and Yanru Chen and Huabin Zheng and Junjie Yan and Jianlin Su and Yuxin Wu and Yutao Zhang and Zhilin Yang and Xinyu Zhou and Mingxing Zhang and Jiezhong Qiu},
      title = {MoBA: Mixture of Block Attention for Long-Context LLMs},
      journal={arXiv preprint arXiv:2502.13189},
      year={2025}
    }
    
  • April 2026

    Moonlight

    Kimi introduces Moonlight, a 3B/16B MoE model trained with Muon that aims for better performance with far fewer training FLOPs, and opens its distributed Muon implementation plus pretrained and instruction-tuned checkpoints for research.

    Recently, the Muon optimizer, based on matrix orthogonalization, has demonstrated strong results in training small-scale language models, but its scalability to larger models had not been proven. We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale. These techniques allow Muon to work out-of-the-box on large-scale training without the need for hyper-parameter tuning. Scaling law experiments indicate that Muon achieves ∼2× computational efficiency compared to AdamW with compute-optimal training.

    Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto frontier, achieving better performance with far fewer training FLOPs compared to prior models.

    We open-source our distributed Muon implementation that is memory optimal and communication efficient. We also release the pretrained, instruction-tuned, and intermediate checkpoints to support future research.

    Our code is available at MoonshotAI/Moonlight.

    Key Ingredients

    Our work builds upon Muon while systematically identifying and resolving its limitations in large-scale training scenarios. Our technical contributions include:

    • Analysis for Effective Scaling of Muon: Through extensive analysis, we identify that weight decay plays a crucial role in Muon's scalability. In addition, we propose keeping a consistent update root mean square (RMS) across different matrix and non-matrix parameters through parameter-wise update scale adjustments. Such adjustments significantly enhance training stability.

    • Efficient Distributed Implementation: We develop a distributed version of Muon with ZeRO-1 style optimization, achieving optimal memory efficiency and reduced communication overhead while preserving the mathematical properties of the algorithm.

    • Scaling Law Validation: We performed scaling law experiments comparing Muon with strong AdamW baselines and showed the superior performance of Muon (see Figure 1). Based on the scaling law results, Muon achieves performance comparable to AdamW-trained counterparts while requiring only approximately 52% of the training FLOPs.

    Scaling up with Muon. (a) Scaling law experiments comparing Muon and Adam. Muon is 2 times more sample efficient than Adam. (b) The MMLU performance of our Moonlight model optimized with Muon and other comparable models. Moonlight advances the Pareto frontier of performance vs training FLOPs.
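    The matrix orthogonalization at the core of Muon can be sketched with the cubic Newton-Schulz iteration (a simplification of ours; released Muon implementations use a tuned higher-order polynomial for speed). For a matrix with singular values in (0, √3), the iteration X ← 1.5X − 0.5XXᵀX drives X toward its nearest orthogonal factor, which is then applied as the weight update:

```python
# Minimal sketch (ours, not Moonshot's implementation) of the matrix
# orthogonalization at the heart of Muon: the cubic Newton-Schulz iteration
# X <- 1.5 X - 0.5 X X^T X pushes every singular value of X toward 1.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def newton_schulz(x, steps=20):
    for _ in range(steps):
        x3 = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * x3[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x

# A non-orthogonal 2x2 "momentum" matrix with singular values < 1.
g = [[0.6, 0.2], [0.1, 0.4]]
o = newton_schulz(g)
ot_o = matmul(transpose(o), o)  # should approach the identity matrix
print([[round(v, 3) for v in row] for row in ot_o])
```

    In practice the momentum matrix is first rescaled so its singular values fall in the convergent range, and the per-parameter update-scale adjustment described above is applied on top of the orthogonalized update.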

    Performance

    We named our lightweight model trained with Muon "Moonlight". We compared Moonlight with SOTA public models at a similar scale:

    • LLAMA3-3B is a 3B-parameter dense model trained with 9T tokens
    • Qwen2.5-3B is a 3B-parameter dense model trained with 18T tokens
    • Deepseek-v2-Lite is a 2.4B/16B-parameter MoE model trained with 5.7T tokens

    Moonlight has the same architecture as DeepSeek-V3, which is supported by many popular inference engines, such as vLLM and SGLang. As a result, our model can be easily deployed using these tools.

    We show how to run inference with our model using the Hugging Face transformers library. We recommend python=3.10, torch>=2.1.0, and transformers=4.48.2 as the development environment.

    For our pretrained model (Moonlight):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "moonshotai/Moonlight-16B-A3B"
    # trust_remote_code is required because the model ships custom modeling code
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    prompt = "1+1=2, 1+2="
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=100)
    response = tokenizer.batch_decode(generated_ids)[0]
    print(response)
    

    For our instruct model (Moonlight-Instruct):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_path = "moonshotai/Moonlight-16B-A3B-Instruct"
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    messages = [
      {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."},
      {"role": "user", "content": "Is 123 a prime?"}
    ]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    generated_ids = model.generate(inputs=input_ids, max_new_tokens=500)
    response = tokenizer.batch_decode(generated_ids)[0]
    print(response)
    

    Training

    # train qwen-like dense model with muon
    python3 examples/toy_train.py --model qwen --optimizer muon --dataset openwebtext-100k --hidden_size 896 --lr 1e-3
    
    # train qwen-like dense model with adamw
    python3 examples/toy_train.py --model qwen --optimizer adamw --dataset openwebtext-100k --hidden_size 896 --lr 1e-3
    

    Intermediate Checkpoints

    To support ongoing research efforts, we will soon release our intermediate checkpoints. Coming soon...

    Citation

    If you find Moonlight useful or want to use it in your projects, please kindly cite our paper:

    @misc{liu2025muonscalablellmtraining,
          title={Muon is Scalable for LLM Training}, 
          author={Jingyuan Liu and Jianlin Su and Xingcheng Yao and Zhejun Jiang and Guokun Lai and Yulun Du and Yidao Qin and Weixin Xu and Enzhe Lu and Junjie Yan and Yanru Chen and Huabin Zheng and Yibo Liu and Shaowei Liu and Bohong Yin and Weiran He and Han Zhu and Yuzhi Wang and Jianzhou Wang and Mengnan Dong and Zheng Zhang and Yongsheng Kang and Hao Zhang and Xinran Xu and Yutao Zhang and Yuxin Wu and Xinyu Zhou and Zhilin Yang},
          year={2025},
          eprint={2502.16982},
          archivePrefix={arXiv},
          primaryClass={cs.LG},
          url={https://arxiv.org/abs/2502.16982}, 
    }
    
    Original source
  • April 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Introducing Kimi-Dev: A Strong and Open-source Coding LLM for Issue Resolution

    Kimi releases Kimi-Dev-72B, a new open-source coding LLM for software engineering tasks that sets a new open-source state of the art on SWE-bench Verified. It is available for download and deployment on Hugging Face and GitHub.

    We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.

    • Kimi-Dev-72B achieves 60.4% performance on SWE-bench Verified. It surpasses the runner-up, setting a new state-of-the-art result among open-source models.
    • Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.
    • Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.

    Performance of Open-source Models on SWE-bench Verified.

    ⚙️ Installation

    # clone repo
    git clone https://github.com/MoonshotAI/Kimi-Dev.git
    # create env
    conda create -n kimidev python=3.12
    # local install
    pip install -e .
    

    🛠️ How to use

    Prepare repo structure [From Agentless]

    For each issue in the benchmark (both SWE-bench Lite and SWE-bench Verified) we need to check out the repository and process its files, so you can save time by downloading the preprocessed data here: swebench_repo_structure.zip. After downloading, unzip it and export its location as follows:

    export PROJECT_FILE_LOC={folder which you saved}
    

    Deploy vLLM Model

    Installation

    # Install vLLM with CUDA 12.8.
    # If you are using pip.
    pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
    # If you are using uv.
    uv pip install vllm --torch-backend=auto
    

    Serving

    vllm serve Kimi-Dev-72B --served-model-name kimi-dev --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.95 --max-seq-len-to-capture 131072 --tensor-parallel-size 8
    

    Rollout

    Kimi-Dev adopts a simplified two-stage framework for handling code repair and test writing tasks:

    1. File Localization: Intelligently identify key files that need modification based on problem descriptions and repository structure
    2. Code Editing: Perform precise code modifications on the located files, including bug fixes or unit test insertions

    Compared to multi-step localization methods, we perform localization at the file level and then pass the complete file to the repair step for more detailed reasoning.
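
    A minimal sketch of the two-stage flow, in pseudocode. The llm.complete and repo interfaces are hypothetical, for illustration only:

    ```
    # Stage 1: file localization from issue text + repository structure
    files = llm.complete(f"Issue:\n{issue}\nRepo structure:\n{repo.tree()}\n"
                         "List the files that must change, one per line.")

    # Stage 2: whole-file editing on each located file
    for path in files.splitlines():
        patched = llm.complete(f"Issue:\n{issue}\nFile {path}:\n{repo.read(path)}\n"
                               "Rewrite the full file to resolve the issue.")
        repo.write(path, patched)
    ```

    Passing the complete file (rather than isolated snippets) to stage 2 is what enables the more detailed repair-time reasoning described above.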

    Run rollout script:

    conda activate kimidev
    # Bugfixer
    python kimidev/examples/rollout_messages_bugfixer.py --model_name {vllm_serve_model}
    # Testwriter
    python kimidev/examples/rollout_messages_testwriter.py --model_name {vllm_serve_model}
    

    👀 Example Results

    We provide some example result files as well as the files required for test-time scaling here.

    You can also download these files from Google Drive.

    💪 Contributing

    Welcome to submit Pull Requests or create Issues to help improve the project.

    😺 Contact

    If you have any questions, please feel free to submit a GitHub issue or contact [email protected].

    📝 Citation

    If you find our code and models useful, please kindly cite:

    @misc{yang2025kimidevagentlesstrainingskill,
          title={Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents}, 
          author={Zonghan Yang and Shengjie Wang and Kelin Fu and Wenyang He and Weimin Xiong and Yibo Liu and Yibo Miao and Bofei Gao and Yejie Wang and Yingwei Ma and Yanhao Li and Yue Liu and Zhenxing Hu and Kaitai Zhang and Shuyi Wang and Huarong Chen and Flood Sung and Yang Liu and Yang Gao and Zhilin Yang and Tianyu Liu},
          year={2025},
          eprint={2509.23045},
          archivePrefix={arXiv},
          primaryClass={cs.AI},
          url={https://arxiv.org/abs/2509.23045}, 
    }
    
    Original source
  • April 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimi K2: Open Agentic Intelligence

    Kimi launches Kimi K2, an open-source Mixture-of-Experts model built for agentic tasks, coding, math, and knowledge work. It adds Kimi-K2-Base and Kimi-K2-Instruct, and lets web, mobile, and API users try the new model for free.

    Kimi K2

    Kimi K2 is our latest Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters. It achieves state-of-the-art performance in frontier knowledge, math, and coding among non-thinking models. But it goes further — meticulously optimized for agentic tasks, Kimi K2 does not just answer; it acts.

    And now, it is within your reach. Today, we are open-sourcing:

    • Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
    • Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.

    With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can't wait to see what you build.

    Agentic and Competitive Coding

    • SWE-bench Verified
    • SWE-bench Multilingual
    • LiveCodeBench v6
    • OJBench
    • Tool Use
    • Tau2-bench weighted average*
    • AceBench(en)
    • Math & STEM
    • AIME 2025
    • GPQA-Diamond
    • All models evaluated above are non-thinking models.
    • For Tau2-Bench, average is weighted by tasks.
    • For SWE-bench Multilingual, we evaluated only Claude 4 Sonnet because the cost of Claude 4 Opus was prohibitive.

    Use cases

    Give Kimi K2 your tools and describe your task.

    It automatically understands how to use the tools and gets the job done. You don't have to write any complex workflow for agentic applications.

    Salary Data Analysis

    Example 1

    Example 2

    Example 3

    Summary of Findings

    Using 2020–2025 salary data, I tested the effect of remote-work ratio on salary and whether this effect differs significantly across experience levels (EN / MI / SE / EX). The analysis reveals a significant interaction effect between remote-work ratio and experience level on salary.

    Key Statistical Evidence

    • Interaction Effect: The impact of remote work on salary is not uniform across experience levels. Specifically:
      • Entry-Level Employees: Remote workers earn 8.0% more on average than on-site workers (p ≈ 0.20, not significant).
      • Mid-Level Employees: Remote workers earn 8.5% less on average than on-site workers (p = 0.003, highly significant).
      • Senior-Level Employees: Remote workers earn 3.5% less on average than on-site workers (p ≈ 0.05, marginally significant).
      • Expert-Level Employees: Remote workers earn 4.6% more on average than on-site workers (p ≈ 0.61, not significant).

    Visual Analysis

    The interaction plot and bar charts illustrate that the salary premium or penalty associated with remote work varies by experience level. Notably, mid-level employees show a significant salary disadvantage when working remotely, while entry-level and expert-level employees tend to have a salary advantage (though not statistically significant in this dataset).

    Conclusion

    The analysis confirms a significant interaction effect between remote-work ratio and experience level on salary. The effect of remote work on salary differs across experience levels, with mid-level employees experiencing a notable salary penalty when working remotely, while entry-level and expert-level employees may benefit from remote work arrangements.

    Imagine using Kimi K2 to explore remote-work salaries with the Salary Data Analysis example, where 16 IPython calls generate statistics, visualizations, and an interactive webpage of insights. Dive into the Stanford NLP Genealogy, where Kimi K2 generates an interactive site through 5 web searches, 4 page browsings, 3 clicks, 5 scrolls, 6 edits, and 2 deployments. Or plan your dream Coldplay Tour 2025 trip in London: Kimi K2 crafts the plan through 17 seamless tool calls spanning search, calendar, Gmail, flights, Airbnb, and restaurant bookings.

    Bring Kimi K2 to your command line. It edits files. It runs commands.

    Kimi K2 understands your environment, decides what actions to take, and executes them seamlessly.

    Benchmarking Kimi K2

    Evaluation Results

    Kimi-K2-Instruct

    Kimi-K2-Base

    The table below details the performance of Kimi-K2-Instruct, showing that it matches—or outperforms—the latest open-source and proprietary models across a diverse set of tasks. The model shines on knowledge-intensive and reasoning benchmarks, delivering outstanding results in natural-language understanding, mathematics and sciences, code generation, and agentic tool uses.

    Open Agentic Intelligence

    Pre-training is the crucial foundation for Agentic Intelligence, establishing the priors that make reinforcement learning (RL) exploration tractable, efficient, and generalizable. However, as Ilya Sutskever has observed, human data is a finite "fossil fuel" whose growth lags far behind the pace of compute. This makes token efficiency during pre-training a critical new coefficient in the AI scaling laws.

    Post-training is pivotal in the "Era of Experience" (David Silver, Richard Sutton, 2025). In this era, LLMs increasingly learn from their own self-generated interactions, receiving rewards that free them from the limits of human data and enable them to surpass human capabilities.

    Kimi K2 is forged from these very insights.

    MuonClip Optimizer

    Loosely speaking, given an approximately fixed pretraining dataset and model configuration, a more token-efficient optimizer generates more intelligence. Our previous work Moonlight demonstrated that the Muon optimizer substantially outperforms the widely used AdamW optimizer for LLM training.

    Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3. Based on scaling-law analysis, we reduce the number of heads for long-context efficiency, and increase MoE sparsity for greater token efficiency. While scaling up, we encountered a persistent challenge: training instability caused by exploding attention logits, an issue that occurs more frequently with Muon but less with AdamW in our experiments. Existing solutions such as logit soft-capping and query-key normalization were found inadequate.

    To address this, we introduce the MuonClip optimizer that improves Muon with our proposed qk-clip technique. Specifically, qk-clip stabilizes training by directly rescaling the weight matrices of the query and key projections after Muon updates, thus controlling the scale of attention logits at the source.
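
    The qk-clip idea can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the threshold parameter, and applying a single global scale (rather than operating per attention head inside the optimizer) are simplifications for the sketch.

    ```python
    import numpy as np

    def qk_clip(w_q: np.ndarray, w_k: np.ndarray,
                max_logit_seen: float, tau: float = 100.0):
        """If the largest pre-softmax attention logit observed this step
        exceeds a threshold tau, shrink the query and key projection
        weights by sqrt(tau / max_logit_seen) each. Since a q.k logit is
        bilinear in w_q and w_k, this caps future logits at roughly tau,
        controlling the explosion at its source."""
        if max_logit_seen <= tau:
            return w_q, w_k
        scale = np.sqrt(tau / max_logit_seen)
        return w_q * scale, w_k * scale
    ```

    Rescaling the weights themselves (rather than soft-capping the logits downstream) is what distinguishes this approach from logit soft-capping or query-key normalization.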

    Our experiments show that MuonClip effectively prevents logit explosions while maintaining downstream task performance. In practice, Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spike, demonstrating MuonClip as a robust solution for stable, large-scale LLM training.

    Agentic Capabilities

    The enhanced agentic capabilities of Kimi K2 originate from two important aspects — large-scale agentic data synthesis and general reinforcement learning.

    Large-Scale Agentic Data Synthesis for Tool Use Learning:

    To teach the model sophisticated tool-use capabilities, we developed a comprehensive pipeline inspired by ACEBench that simulates real-world tool-using scenarios at scale. Our approach systematically evolves hundreds of domains containing thousands of tools—including both real MCP (Model Context Protocol) tools and synthetic ones—then generates hundreds of agents with diverse tool sets.

    All tasks are rubric-based, enabling consistent evaluation. Agents interact with simulated environments and user agents, creating realistic multi-turn tool-use scenarios. An LLM judge evaluates simulation results against task rubrics, filtering for high-quality training data. This scalable pipeline generates diverse, high-quality data, paving the way for large-scale rejection sampling and reinforcement learning.

    General Reinforcement Learning:

    The key challenge is to apply RL to tasks with both verifiable and non-verifiable rewards; typical examples of verifiable tasks are math and competition coding, while writing a research report is usually viewed as non-verifiable. Going beyond verifiable rewards, our general RL system uses a self-judging mechanism where the model acts as its own critic, providing scalable, rubric-based feedback for non-verifiable tasks.

    Meanwhile, on-policy rollouts with verifiable rewards are used to continuously update the critic so that the critic keeps improving its evaluation accuracy on the latest policy. This can be viewed as a way of using verifiable rewards to improve the estimation of non-verifiable rewards.
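
    The interplay between verifiable rewards and the self-judging critic can be summarized in pseudocode (the policy, critic, and task interfaces are illustrative, not an actual API):

    ```
    for batch in training_stream:
        # Verifiable tasks: ground-truth checkers give exact rewards,
        # and the same on-policy rollouts refresh the critic.
        for task in batch.verifiable:
            rollout = policy.generate(task.prompt)
            reward  = task.verify(rollout)        # e.g. tests, exact match
            policy.update(rollout, reward)
            critic.fit(rollout, reward)           # keep critic on-policy

        # Non-verifiable tasks: the freshly updated critic scores
        # rollouts against task rubrics (model-as-judge).
        for task in batch.non_verifiable:
            rollout = policy.generate(task.prompt)
            reward  = critic.score(rollout, task.rubric)
            policy.update(rollout, reward)
    ```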

    Getting started with Kimi K2

    Try Kimi K2 on kimi.com

    Starting today, Kimi users on web and mobile can select and use the new Kimi K2 model for free. At this moment, our MCP features for web and app are still in development. We hope to begin rolling them out in the coming weeks. In the meantime, you’re welcome to try our Researcher for an early look at its agentic capabilities. Please note that vision features are not supported for Kimi K2 yet.

    Use Kimi K2 with API

    The Kimi Platform offers an OpenAI/Anthropic compatible interface, allowing for easy adaptation of your existing applications to Kimi K2. We encourage developers to explore our tool calling API for building agent applications. For detailed information, visit platform.moonshot.ai.

    Serve Kimi K2 on your own

    We recommend running Kimi K2 on one of the following inference engines: vLLM, SGLang, KTransformers, or TensorRT-LLM. For detailed deployment instructions, please see our GitHub repository.

    What's next

    While Kimi K2 serves as a strong foundation for open agentic intelligence, a general agent uses more advanced capabilities such as thinking and visual understanding. We plan to add these to Kimi K2 in the future.

    Limitations

    In our internal tests, we've identified some limitations in the current Kimi K2 models. On hard reasoning tasks or with unclear tool definitions, the model may generate excessive tokens, sometimes leading to truncated outputs or incomplete tool calls. Additionally, performance may decline on certain tasks when tool use is enabled. When building complete software projects, one-shot prompting degrades performance compared to using K2 within an agentic framework. We are working to address these issues in future releases and look forward to more feedback.

    Original source
  • April 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Rebuilding the "Chain of Trust": Kimi Vendor Verifier

    Kimi releases the Kimi Vendor Verifier alongside Kimi K2.6, open-sourcing a project that helps users verify open-source model inference implementations and benchmark accuracy. It adds a public validation workflow, vendor coverage, and continuous benchmarking support.

    Alongside the release of the Kimi K2.6 model, we are open-sourcing the Kimi Vendor Verifier (KVV) project, designed to help users of open-source models verify the accuracy of their inference implementations.

    Not as an afterthought, but because we learned the hard way that open-sourcing a model is only half the battle. The other half is ensuring it runs correctly everywhere else.

    Official Evaluation Results

    Think
    Non-Think

    Benchmark         Metric   Temperature   TopP   MaxTokens   Kimi API
    OCRBench          acc      1.0           0.95   16384       91.0
    AIME2025          avg@32   1.0           0.95   98304       98.4
    MMMU Pro Vision   acc      1.0           0.95   65536       78.8

    You can click here to access the Kimi API K2VV evaluation results for calculating the F1 score.

    Why We Built KVV

    From Isolated Incidents to Systemic Issues

    Since the release of K2 Thinking, we have received frequent feedback from the community regarding anomalies in benchmark scores. Our investigation confirmed that a significant portion of these cases stemmed from misuse of decoding parameters. To mitigate this immediately, we built our first line of defense at the API level: enforcing Temperature=1.0 and TopP=0.95 in Thinking mode, with mandatory validation that thinking content is correctly passed back.

    However, more subtle anomalies soon raised alarms. In a specific evaluation on LiveBenchmark, we observed a stark contrast between third-party APIs and the official API. After extensive testing across infrastructure providers, we found this discrepancy is widespread.

    This exposed a deeper problem in the open-source model ecosystem: The more open the weights are, and the more diverse the deployment channels become, the less controllable the quality becomes.

    If users cannot distinguish between "model capability defects" and "engineering implementation deviations," trust in the open-source ecosystem will inevitably collapse.

    Our Solution

    Six Critical Benchmarks (selected to expose specific infra failures):

    1. Pre-Verification: Validates that API parameter constraints (temperature, top_p, etc.) are correctly enforced. All tests must pass before proceeding to benchmark evaluation.
    2. OCRBench: A five-minute smoke test for multimodal pipelines.
    3. MMMU Pro: Verifies vision input preprocessing by testing diverse visual inputs.
    4. AIME2025: Long-output stress test. Catches KV cache bugs and quantization degradation that short benchmarks hide.
    5. K2VV ToolCall: Measures trigger consistency (F1) and JSON Schema accuracy. Tool errors compound in agents; we catch them early.
    6. SWE-bench: Full agentic coding test. (Not open-sourced due to its sandbox dependency.)
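
    The trigger-consistency F1 in item 5 can be computed as follows. This is a minimal sketch; `toolcall_trigger_f1` is an illustrative name, not the official K2VV scorer.

    ```python
    def toolcall_trigger_f1(expected, predicted):
        """F1 over whether a tool call fires on each prompt.
        expected/predicted: equal-length lists of booleans
        (True = a tool call was / should have been emitted)."""
        tp = sum(e and p for e, p in zip(expected, predicted))
        fp = sum(p and not e for e, p in zip(expected, predicted))
        fn = sum(e and not p for e, p in zip(expected, predicted))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        return 2 * precision * recall / denom if denom else 0.0
    ```

    F1 (rather than plain accuracy) penalizes both spurious tool calls and missed ones, which is why small trigger inconsistencies compound quickly in multi-step agent loops.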

    Upstream Fix: We work with the vLLM/SGLang/KTransformers communities to fix root causes, not just detect symptoms.

    Pre-Release Validation: Rather than waiting for post-deployment complaints, we provide early access to test models. This lets infrastructure providers validate their stacks before users encounter issues.

    Continuous Benchmarking: We will maintain a public leaderboard of vendor results. This transparency encourages vendors to prioritize accuracy.

    Testing Cost Estimation

    We completed full evaluation-workflow validation on two NVIDIA H20 8-GPU servers, with sequential execution taking approximately 15 hours. To improve evaluation efficiency, the scripts have been optimized for long-running inference scenarios, including streaming inference, automatic retry, and checkpoint-resumption mechanisms.

    An Open Invitation

    Weights are open. The knowledge to run them correctly must be too.

    We are expanding vendor coverage and seeking lighter agentic tests.

    Contact Us: [email protected]

    Original source
  • April 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Introducing WorldVQA

    Kimi releases WorldVQA, a new benchmark for testing factual visual world knowledge in multimodal LLMs. It includes 3,500 verified image-question pairs, clear head and tail knowledge splits, and open-sourced evaluation tools to help improve accuracy, calibration, and honesty in multimodal AI.

    A benchmark for evaluating atomic visual world knowledge in Multimodal LLMs.

    Authors Kimi Team

    Overview

    We are releasing WorldVQA, a new benchmark designed to measure the factual correctness of Multimodal Large Language Models (MLLMs). While recent models have demonstrated impressive capabilities in visual reasoning and description, measuring their reliability regarding visual world knowledge remains a challenge.

    WorldVQA focuses on a critical question: Does the model actually recognize the specific entity it sees, or is it merely hallucinating based on visual patterns?

    Our results show that WorldVQA creates a significant challenge for frontier models. Even state-of-the-art models struggle to achieve high accuracy on long-tail visual knowledge, often falling below 50% accuracy. This benchmark aims to drive progress toward more factually reliable and knowledgeable multimodal AI.

    The Dataset

    The dataset consists of 3,500 high-quality image-question pairs. The distribution aims to test a model's encyclopedic breadth across the world. The dataset distinguishes itself through three core design principles:

    • Factuality & Unambiguity: Every question has a single, verifiable ground-truth answer. We exclude subjective questions or ambiguous visual scenarios.
    • Rich Taxonomy: The dataset spans 9 categories to ensure broad coverage of world knowledge.
    • Head vs. Tail Distribution: We explicitly separate data into Head (common knowledge) and Tail (rare/long-tail knowledge). This allows us to measure how model performance degrades as knowledge becomes more obscure.

    Note on Quality: To ensure the benchmark is a reliable gold standard, all images and question-answer pairs underwent rigorous multi-stage human verification to filter out noise and ambiguity.

    Using WorldVQA to compare models

    Overall Model Accuracy

    The benchmark results show that state-of-the-art models achieve accuracy often below 50%, highlighting the challenge posed by WorldVQA.

    Measuring Calibration: Confidence vs. Accuracy

    In our experiments comparing model confidence with actual accuracy, we utilized two key metrics to measure the alignment between a model's subjective belief and its objective performance:

    • ECE (Expected Calibration Error): Measures the average gap between the model's subjective confidence and its objective accuracy. The ideal value is 0.
    • Slope (Weighted Average Slope): Measures the correlation and sensitivity between the model's accuracy and its own confidence. The ideal value is 1.0.
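
    ECE can be computed with a standard equal-width binning scheme. This is a generic sketch of the metric's definition, not necessarily the exact binning used in our evaluation:

    ```python
    def expected_calibration_error(confidences, correct, n_bins=10):
        """Expected Calibration Error: bin predictions by stated confidence
        (in [0, 1]), then average |mean confidence - accuracy| over bins,
        weighted by bin size. A perfectly calibrated model scores 0."""
        n = len(confidences)
        bins = [[] for _ in range(n_bins)]
        for conf, ok in zip(confidences, correct):
            idx = min(int(conf * n_bins), n_bins - 1)
            bins[idx].append((conf, ok))
        ece = 0.0
        for b in bins:
            if not b:
                continue
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / n) * abs(mean_conf - accuracy)
        return ece
    ```

    For example, a model that states 95% confidence on every question but answers only 90% correctly incurs an ECE of 0.05.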

    Calibration and Confidence Distribution Analysis. Left: Reliability diagrams plotting Actual Accuracy against Stated Confidence. To ensure statistical significance, only bins containing more than 20 samples are visualized. The size of each data point is proportional to the number of samples in that bin. The black dashed diagonal (y=x) represents perfect calibration, while colored dashed lines indicate the weighted average slope for each model. Right: The distribution of stated confidence scores across the full dataset (without sample thresholding). The plots reveal a severe overconfidence trend, with most models concentrating their predictions in the 90-100% confidence range.

    Our experiments reveal that all evaluated models are currently far from the ideal state, exhibiting a universal tendency toward overconfidence.

    While Kimi-K2.5 achieves best performance on both metrics—recording an ECE of 37.9% and a Slope of 0.550—there remains a significant gap to bridge in the pursuit of "honesty" and "alignment." Enhancing the self-awareness boundaries of multimodal models represents a critical direction for future exploration.

    Conclusion

    WorldVQA is a simple but challenging benchmark for evaluating the atomic visual knowledge of frontier models. Improving performance on WorldVQA is a necessary step for the next generation of AI agents. We are open-sourcing the WorldVQA dataset and evaluation scripts to help the community address the visual knowledge gap.

    Original source
  • April 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimi K2.6: Advancing Open-Source Coding

    Kimi releases Kimi K2.6, an open-source model now available on Kimi.com, the Kimi App, API, and Kimi Code, with stronger coding, long-horizon execution, agent swarm coordination, and proactive agent workflows for more capable front-end, full-stack, and autonomous tasks.

    Long-Horizon Coding

    We are open sourcing our latest model, Kimi K2.6, featuring state-of-the-art coding, long-horizon execution, and agent swarm capabilities. Kimi K2.6 is now available via Kimi.com, the Kimi App, the API, and Kimi Code.

    Kimi K2.6 shows strong improvements in long-horizon coding tasks, with reliable generalization across programming languages (e.g., Rust, Go, and Python) and tasks (e.g., front-end, devops, and performance optimization). On Kimi Code Bench, our internal coding benchmark covering diverse complicated end-to-end tasks, Kimi K2.6 demonstrates significant improvements over Kimi K2.5.

    Kimi K2.6 demonstrates strong long-horizon coding in complex engineering tasks:

    Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig —a highly niche programming language—it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.

    Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code. Acting as an expert systems architect, Kimi K2.6 analyzed CPU and allocation flame graphs to pinpoint hidden bottlenecks and boldly reconfigured the core thread topology (from 4ME+2RE to 2ME+1RE). Despite the engine already operating near its performance limits, Kimi K2.6 extracted a 185% median-throughput leap (from 0.43 to 1.24 MT/s) and a 133% performance throughput gain (soaring from 1.23 to 2.86 MT/s).

    In beta tests, K2.6 performs well on long-horizon coding tasks in enterprise evaluations (randomly ordered):

    • Ollama: "Kimi K2.6 raises the bar for open-source models. It excels in coding and especially for agentic tools like OpenClaw and Hermes. In early testing, it sustains long multi-step sessions with impressive stability. It will work all of Ollama's integrations out of the box, and we're excited to see what developers build with it." — Michael Chiang, Co-founder
    • Kilo.ai: "K2.6 offers SOTA-level performance at a fraction of the cost. It's tremendously good at long-context tasks across the codebase, as well as the day-to-day work needed to support an always-on agent like KiloClaw." — Scott Breitenother, Cofounder and CEO
    • Augment Code: "What impressed us most about K2.6 is its surgical precision in large codebases. When an initial path is blocked, it is strong at pivoting intelligently: following existing architectural patterns, finding hidden related changes, and keeping fixes scoped to the real problem. That kind of focused adaptability helps Augment Code reduce wasted cycles and deliver faster, more cost-effective agentic coding for enterprise-scale engineering work." — Igor Ostrovsky, Co-Founder and CTO
    • Fireworks.ai: "We are thrilled to see another leap in open source models with Kimi K2.6 release, which marks a significant advancement for high-stakes, agentic workflows. The most impactful improvements lie in its long-horizon reliability and instruction following. K2.6 excels at maintaining architectural integrity over extended coding sessions, making it a stable foundation for autonomous agent pipelines, like all the 'claws'. It demonstrates a measurable leap over K2.5 in long-context tasks, achieving state-of-the-art performance in complex reasoning." — Yun Jin, Head of AI Infrastructure
    • OpenCode.ai: "Within OpenCode, Kimi K2.6 proves to be exceptionally reliable. Its approach to task decomposition and tool calling is both steady and consistent. With a sharper grasp of task requirements and more streamlined multi-step operations, it effectively minimizes repetitive overhead, resulting in a smoother, more trustworthy end-to-end experience." — Frank Wang, Founder
    • Qoder.com: "Kimi K2.6 delivered a strong performance in Qoder's internal evaluations, showing significant progress over K2.5. Specifically, there has been a notable increase in the frequency of tool calling and model invocations, reflecting a substantial boost in the model's proactivity and intelligence during task execution. This heightened initiative in tool calling enables the model to more actively grasp developer intent and automatically complete context, thereby minimizing user interruptions and wait times." — Chen Xin, Senior Technical Expert
    • Vercel.com: "K2.6 shows major gains over K2.5 on the capabilities our developers care about most: we're seeing more than 50% improvement on our Next.js benchmark, putting it among the top-performing models on the platform. Combined with its cost-performance ratio, it's a compelling option for agentic coding and front-end generation through AI Gateway. We're excited to offer it to our developer community." — Jerilyn Zheng, PM for Vercel AI
    • Factory.ai: "K2.6 is a clear improvement on K2.5 on both our benchmarks (+15%) and in side-by-side comparisons. It seems to have better instruction following, more thorough exploration and reasoning, and less likely to make coding errors or use hacks." — Leo Tchourakov, Member of Technical Staff
    • Baseten.co: "Kimi K2.6's evolution is impressive. It excels on coding tasks at a level comparable to leading closed source models, and offers strong tool calling quality due to its deep understanding of third party frameworks. Kimi K2.6's excellent reliability makes it a great choice for complex and long-horizon engineering tasks." — Bola Malek, Head of Labs
    • Anything.com: "In a no-code environment, AI has to handle every edge case. There's no developer to step in when something doesn't work as expected. K2.6 is noticeably more effective than K2.5 at navigating nuanced API behaviors and recovering when things break, and it runs longer-horizon tasks before hitting a wall. We've seen a real improvement in getting users from idea to deployment compared to K2.5." — Ahmad Jiha, Founding AI Engineer
    • Hermes Agent: "Got an early look at K2.6 and ran it through Hermes Agent. Tool calling and agentic loops feel noticeably tighter, coding is a clear step up, and the creative range surprised us. We're super excited about running a hackathon with Kimi on creativity. Kimi team continues to beat expectations!" — Thomas Eastman, Hermes Agent
    • CodeBuddy.ai: "Kimi K2.6 demonstrates significant improvements over K2.5 in internal evaluations conducted by CodeBuddy: code generation accuracy increased by 12%, long-context stability improved by 18%, and tool invocation success rate reached 96.60%. Its stronger reasoning capabilities and more consistent output quality provide robust support for ensuring a reliable user experience in CodeBuddy WorkBuddy." — CodeBuddy WorkBuddy Eval Team
    • Blackbox.ai: "Kimi K2.6 sets a new level for open-sourced models, especially in long-horizon, agent-style coding workflows. It handles complex, multi-step tasks with stronger instruction following and consistently high code quality. We've seen it sustain extended coding sessions with remarkable stability, far beyond typical models. It also surfaces deep, non-obvious bugs that would normally take significant developer time to uncover. Overall, K2.6 sets a new bar for reliable coding." — Robert Rizk, Cofounder and CEO

    Coding-Driven Design

    Building on its strong coding capabilities, Kimi K2.6 can turn simple prompts into complete front-end interfaces, generating structured layouts with deliberate design choices such as aesthetic hero sections, as well as interactive elements and rich animations, including scroll-triggered effects. With strong proficiency in leveraging image and video generation tools, Kimi K2.6 can generate visually coherent assets, contributing to higher-quality, more salient hero sections.

    Moreover, Kimi K2.6 expands beyond static frontend development to simple full-stack workflows—spanning authentication, user interaction, and database operations for lightweight use cases like transaction logging or session management.

    We established an internal Kimi Design Bench, organized into four categories: Visual Input Tasks, Landing Page Construction, Full-Stack Application Development, and General Creative Programming. In comparison with Google AI Studio, Kimi K2.6 shows promising results and performs well across these categories.

    Agent Swarms, Elevated

    Scaling out, not just up. An Agent Swarm dynamically decomposes tasks into heterogeneous subtasks executed concurrently by self-created domain-specialized agents.

    Building on the K2.5 Agent Swarm research preview, Kimi K2.6 Agent Swarm demonstrates a qualitative leap in the agent swarm experience. It seamlessly coordinates heterogeneous agents to combine complementary skills: broad search layered with deep research, large-scale document analysis fused with long-form writing, and multi-format content generation executed in parallel. This compositional intelligence enables the swarm to deliver end-to-end outputs—spanning documents, websites, slides, and spreadsheets—within a single autonomous run.

    The architecture scales horizontally to 300 sub-agents executing across 4,000 coordinated steps simultaneously, a substantial expansion from K2.5's 100 sub-agents and 1,500 steps. This massive parallelization fundamentally reduces end-to-end latency while significantly enhancing output quality and expanding the operational boundaries of agent swarms.

    It can also turn high-quality files such as PDFs, spreadsheets, slides, and Word documents into Skills. Kimi K2.6 captures and maintains each document's structural and stylistic DNA, enabling you to reproduce the same quality and format in future tasks.

    Proactive Agents

    K2.6 demonstrates strong performance in autonomous, proactive agents such as OpenClaw and Hermes, which operate across multiple applications with continuous, 24/7 execution.

    Unlike simple chat-based interactions, these workflows require AI to proactively manage schedules, execute code, and orchestrate cross-platform operations as a persistent background agent.

    Our RL infra team used a K2.6-backed agent that operated autonomously for 5 days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution.

    Kimi K2.6 delivers measurable improvements in real-world reliability: more precise API interpretation, more stable long-running performance, and enhanced safety awareness during extended research tasks.

    Performance gains are quantified by our internal Claw Bench, an evaluation suite spanning five domains: Coding Tasks, IM Ecosystem Integration, Information Research & Analysis, Scheduled Task Management, and Memory Utilization. Across all metrics, Kimi K2.6 significantly outperforms Kimi K2.5 in task completion rates and tool invocation accuracy—particularly in workflows requiring sustained autonomous operation without human oversight.

    Bring Your Own Agents

    Building on its robust orchestration capabilities, Kimi K2.6 extends your proactive agents to Claw Groups as a research preview—a new instantiation of the Agent Swarm architecture.

    Claw Groups embrace an open, heterogeneous ecosystem: Multiple agents and humans operate as true collaborators. Users can onboard agents from any device, running any model, each carrying their own specialized toolkits, skills and persistent memory contexts. Whether deployed on local laptops, mobile devices, or cloud instances, these diverse agents integrate seamlessly into a shared operational space.

    At the center of this swarm, Kimi K2.6 serves as an adaptive coordinator. It dynamically matches tasks to agents based on their specific skill profiles and available tools, optimizing for capability fit. When an agent encounters failure or stalls, the coordinator detects the interruption, automatically reassigns the task or regenerates subtasks, and actively manages the full lifecycle of deliverables—from initiation through validation to completion.

    We also want to thank the K2.6-powered agents in Claw Groups—we've been dogfooding Claw Groups with our own agent marketing team, refining human–agent workflows in practice. Using Claw Groups, we run end-to-end content production and launch campaigns, with specialized agents like Demo Makers, Benchmark Makers, Social Media Agents, and Video Makers working together. K2.6 coordinates the process, enabling agents to share intermediate results and turn ideas into consistent, fully packaged deliverables.

    We are moving beyond simply asking AI a question or assigning it a task, entering a phase where humans and AI collaborate as genuine partners—combining strengths to solve problems collectively. Claw Groups marks our latest effort toward a future where the boundaries between "my agent," "your agent," and "our team" dissolve into a collaborative system.

    Benchmark Table and Footnotes are included in the full release but omitted here for brevity.

    Original source
  • Apr 21, 2026
    • Date parsed from source:
      Apr 21, 2026
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Introducing Kimi K2 Thinking

    Kimi introduces K2 Thinking, its open-source thinking model, with state-of-the-art reasoning, agentic search, coding, and writing gains. It can handle 200 to 300 sequential tool calls and is now live in chat on kimi.com, with API access available.

    Evaluations

    Kimi K2 Thinking sets new records across benchmarks that assess reasoning, coding, and agent capabilities. K2 Thinking achieves 44.9% on HLE with tools, 60.2% on BrowseComp, and 71.3% on SWE-Bench Verified, demonstrating strong generalization as a state-of-the-art thinking agent model.

    Agentic Reasoning

    K2 Thinking demonstrates outstanding reasoning and problem-solving abilities. On Humanity’s Last Exam (HLE)—a rigorously crafted, closed‑ended benchmark—spanning thousands of expert‑level questions across more than 100 subjects, K2 Thinking achieved a state-of-the-art score of 44.9%, with search, python, and web-browsing tools, establishing new records in multi‑domain expert‑level reasoning performance.

    By reasoning while actively using a diverse set of tools, K2 Thinking is capable of planning, reasoning, executing, and adapting across hundreds of steps to tackle some of the most challenging academic and analytical problems. In one instance, it successfully solved a PhD-level mathematics problem through 23 interleaved reasoning and tool calls, exemplifying its capacity for deep, structured reasoning and long-horizon problem solving.

    Agentic Coding

    K2 Thinking exhibits substantial gains in coding and software development tasks. It achieves scores of 61.1% on SWE-Multilingual, 71.3% on SWE-Bench Verified, and 47.1% on Terminal-Bench, showcasing strong generalization across programming languages and agent scaffolds. The model delivers notable improvements on HTML, React, and component-intensive front-end tasks—translating ideas into fully functional, responsive products. In agentic coding settings, it reasons while invoking tools, integrating fluidly into software agents to execute complex, multi-step development workflows with precision and adaptability.

    Agentic Search and Browsing

    K2 Thinking demonstrates strong performance in agentic search and browsing scenarios. On BrowseComp—a challenging benchmark designed to evaluate models' ability to continuously browse, search, and reason over hard-to-find real-world web information—K2 Thinking achieved a score of 60.2%, significantly outperforming the human baseline of 29.2%. This result highlights K2 Thinking's superior capability for goal-directed, web-based reasoning and its robustness in dynamic, information-rich environments.

    K2 Thinking can execute 200–300 sequential tool calls, driven by long-horizon planning and adaptive reasoning. It performs dynamic cycles of think → search → browser use → think → code, continually generating and refining hypotheses, verifying evidence, reasoning, and constructing coherent answers. This interleaved reasoning allows it to decompose ambiguous, open-ended problems into clear, actionable subtasks.
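
The interleaved think → search → browse → code cycle described above can be sketched as a simple controller loop. The function, tool names, and stopping rule below are illustrative, not Kimi's actual implementation:

```python
def agentic_loop(task, model, tools, max_calls=300):
    """Interleaved reason/act loop: the model inspects the full history,
    optionally invokes a tool (search, browser, code, ...), observes the
    result, and repeats until it emits a final answer or exhausts the
    sequential tool-call budget (illustrative cap matching the text)."""
    history = [("task", task)]
    for _ in range(max_calls):
        thought, action, arg = model(history)   # reason over everything so far
        history.append(("thought", thought))
        if action == "answer":                  # model decides it is done
            return arg
        observation = tools[action](arg)        # e.g. search / browse / run code
        history.append((action, observation))
    return "budget exhausted"
```

In practice the model generates and refines hypotheses across these cycles, with the history carrying all intermediate evidence forward.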

    General Capabilities

    Creative Writing: K2 Thinking delivers improvements in completeness and richness. It shows stronger command of style and instruction, handling diverse tones and formats with natural fluency. Its writing becomes more vivid and imaginative—poetic imagery carries deeper associations, while stories and scripts feel more human, emotional, and purposeful. The ideas it expresses often reach greater thematic depth and resonance.

    Practical Writing: K2 Thinking demonstrates marked gains in reasoning depth, perspective breadth, and instruction adherence. It follows prompts with higher precision, addressing each requirement clearly and systematically—often expanding on every mentioned point to ensure thorough coverage. In academic, research, and long-form analytical writing, it excels at producing rigorous, logically coherent, and substantively rich content, making it particularly effective in scholarly and professional contexts.

    Personal & Emotional: When addressing personal or emotional questions, K2 Thinking responds with more empathy and balance. Its reflections are thoughtful and specific, offering nuanced perspectives and actionable next steps. It helps users navigate complex decisions with clarity and care—grounded, practical, and genuinely human in tone.

    Original source
  • Apr 21, 2026
    • Date parsed from source:
      Apr 21, 2026
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimi K2.5: Visual Agentic Intelligence

    Kimi introduces K2.5, its most powerful open-source model yet, with native multimodal coding and vision, agent swarm automation, and stronger office productivity. It is available on Kimi.com, the Kimi App, the API, and Kimi Code, with Agent Swarm in beta on Kimi.com.

    Today, we are introducing Kimi K2.5, the most powerful open-source model to date.

    Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.

    For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.

    Kimi K2.5 is available via Kimi.com, the Kimi App, the API, and Kimi Code. Kimi.com and the Kimi App now support four modes: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta). Agent Swarm is currently in beta on Kimi.com, with free credits available for high-tier paid users.

    Across three agentic benchmarks—HLE, BrowseComp, and SWE-Verified—Kimi K2.5 delivers strong performance at a fraction of the cost.

    1. Coding with Vision

    Kimi K2.5 is the strongest open-source model to date for coding, with particularly strong capabilities in front-end development.

    K2.5 can turn simple conversations into complete front-end interfaces, implementing interactive layouts and rich animations such as scroll-triggered effects. Beyond text prompts, K2.5 excels at coding with vision. By reasoning over images and video, K2.5 improves image/video-to-code generation and visual debugging, lowering the barrier for users to express intent visually.

    This capability stems from massive-scale vision-text joint pre-training. At scale, the trade-off between vision and text capabilities disappears — they improve in unison.

    K2.5 excels in real-world software engineering tasks. We evaluate it using Kimi Code Bench, our internal coding benchmark covering diverse end-to-end tasks — from building to debugging, refactoring, testing, and scripting — across multiple programming languages. On this benchmark, K2.5 shows consistent and meaningful improvements over K2 across task types.

    To try out K2.5's agentic coding capabilities, K2.5 Agent offers a set of preconfigured tools for immediate, hands-on experiences. For software engineering use cases, we recommend pairing Kimi K2.5 with our new coding product, Kimi Code.

    Kimi Code works in your terminal and integrates with various IDEs, including VS Code, Cursor, and Zed. Kimi Code is open source and supports images and videos as inputs. It also automatically discovers and migrates your existing skills and MCPs into the Kimi Code working environment.

    Here's an example using Kimi Code to translate the aesthetic of Matisse's La Danse into the Kimi App. This demo highlights a breakthrough in autonomous visual debugging: using visual inputs and documentation lookup, K2.5 visually inspects its own output and iterates on it autonomously, producing an art-inspired webpage end to end.

    2. Agent Swarm

    Scaling Out, Not Just Up.

    We release K2.5 Agent Swarm as a research preview, marking a shift from single-agent scaling to self-directed, coordinated swarm-like execution.

    Trained with Parallel-Agent Reinforcement Learning (PARL), K2.5 learns to self-direct an agent swarm of up to 100 sub-agents, executing parallel workflows across up to 1,500 coordinated steps, without predefined roles or hand-crafted workflows.

    PARL uses a trainable orchestrator agent to decompose tasks into parallelizable subtasks, each executed by dynamically instantiated, frozen subagents. Running these subtasks concurrently significantly reduces end-to-end latency compared to sequential agent execution.

    Training a reliable parallel orchestrator is challenging due to delayed, sparse, and non-stationary feedback from independently running subagents. A common failure mode is serial collapse, where the orchestrator defaults to single-agent execution despite having parallel capacity. To address this, PARL employs staged reward shaping that encourages parallelism early in training and gradually shifts focus toward task success.

    We define the reward as a weighted sum of parallel instantiation reward, sub-agent finish rate, and task-level outcome performance reward. The performance reward evaluates the overall success and quality of the solution for a given task. The parallel instantiation reward mitigates serial collapse by incentivizing subagent instantiation, encouraging exploration of concurrent scheduling spaces. The finish reward focuses on successful completion of subtasks to prevent spurious parallelism, guiding the policy toward valid and effective decompositions. Hyperparameters are annealed to zero over training.
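
As a rough sketch, the staged reward shaping described above could look like the following. The weights, the linear annealing schedule, and the saturating parallel bonus are all assumptions; the actual values are not published:

```python
def anneal(w0: float, step: int, total_steps: int) -> float:
    """Linearly anneal a shaping weight to zero over training (assumed schedule)."""
    return w0 * max(0.0, 1.0 - step / total_steps)

def parl_reward(n_spawned: int, n_finished: int, outcome: float,
                step: int, total_steps: int,
                w_parallel: float = 0.2, w_finish: float = 0.3) -> float:
    """Weighted sum of parallel-instantiation, finish-rate, and outcome rewards.

    - parallel term: rewards spawning sub-agents (mitigates serial collapse)
    - finish term: rewards subtasks that actually complete (prevents spurious parallelism)
    - outcome term: task-level success/quality, always active
    Shaping weights anneal to zero, so late training optimizes outcome only.
    """
    r_parallel = min(n_spawned, 10) / 10.0  # saturating bonus (assumed cap)
    r_finish = n_finished / n_spawned if n_spawned else 0.0
    wp = anneal(w_parallel, step, total_steps)
    wf = anneal(w_finish, step, total_steps)
    return wp * r_parallel + wf * r_finish + outcome
```

Early in training the shaping terms dominate exploration toward concurrent scheduling; once annealed away, only task success drives the policy.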

    To further force parallel strategies to emerge, we introduce a computational bottleneck that makes sequential execution impractical. Instead of counting total steps, we evaluate performance using Critical Steps, a latency-oriented metric inspired by the critical path in parallel computation. This metric captures orchestration overhead and the slowest subagent at each stage. Under this metric, spawning more subtasks only helps if it shortens the critical path.
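
A minimal sketch of such a Critical Steps metric, assuming a stage-structured trace where each stage's cost is the orchestrator's own steps plus the slowest concurrently running sub-agent (the exact internal accounting is not published):

```python
def critical_steps(stages: list) -> int:
    """Latency-oriented step count: per stage, orchestrator overhead plus the
    slowest concurrent sub-agent, summed over stages. Each stage is a dict
    {"orchestrator": int, "subagents": [int, ...]}. Spawning more subtasks
    only helps if it shortens this critical path."""
    total = 0
    for stage in stages:
        slowest = max(stage["subagents"], default=0)
        total += stage["orchestrator"] + slowest
    return total

# Three 10-step subtasks run sequentially vs. in one parallel stage
# (2 orchestrator steps per stage, illustrative numbers):
sequential = [{"orchestrator": 2, "subagents": [10]}] * 3   # 3 * (2 + 10) = 36
parallel = [{"orchestrator": 2, "subagents": [10, 10, 10]}]  # 2 + 10 = 12
```

Under this metric the parallel plan costs 12 critical steps against 36 for the sequential one, even though both perform the same 30 steps of subtask work.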

    An agent swarm has an orchestrator that dynamically creates specialized subagents (e.g., AI Researcher, Physics Researcher, Fact Checker) and decomposes complex tasks into parallelizable subtasks for efficient distributed execution.

    In our parallel-agent reinforcement learning environment, the reward increases smoothly as training progresses. At the same time, the level of parallelism during training also gradually increases.

    K2.5 Agent Swarm improves performance on complex tasks through parallel, specialized execution. In our internal evaluations, it leads to an 80% reduction in end-to-end runtime while enabling more complex, long-horizon workloads.

    Agent Swarm reduces the minimum critical steps required to achieve target performance by 3×–4.5× compared to single-agent execution in wide-search scenarios, with savings scaling as targets rise—translating to up to a 4.5× wall-clock time reduction via parallelization.

    Here are representative trajectories demonstrating K2.5 Agent Swarm in action: Parallel at Scale - 100 Sub-agents Hunting for Creators. The task is to identify the top three YouTube creators across 100 niche domains. K2.5 Agent Swarm first researches and defines each domain, then autonomously creates 100 sub-agents to conduct parallel searches. Each sub-agent identifies leading creators within its assigned niche, and the results—300 YouTuber profiles—are aggregated into a structured spreadsheet.

    3. Office Productivity

    Kimi K2.5 brings agentic intelligence into real-world knowledge work.

    K2.5 Agent can handle high-density, large-scale office work end to end. It reasons over large, high-density inputs, coordinates multi-step tool use, and delivers expert-level outputs: documents, spreadsheets, PDFs, and slide decks—directly through conversation.

    With a focus on real-world professional tasks, we design two internal expert productivity benchmarks. The AI Office Benchmark evaluates end-to-end Office output quality, while the General Agent Benchmark measures multi-step, production-grade workflows against human expert performance. Across both benchmarks, K2.5 shows 59.3% and 24.3% improvements over K2 Thinking, reflecting stronger end-to-end performance on real-world tasks.

    K2.5 Agent supports advanced tasks such as adding annotations in Word, constructing financial models with Pivot Tables, and writing LaTeX equations in PDFs, while scaling to long-form outputs like 10,000-word papers or 100-page documents.

    Tasks that once took hours or days now complete in minutes.

    4. Conclusion

    Grounded in advances in coding with vision, agent swarms, and office productivity, Kimi K2.5 represents a meaningful step toward AGI for the open-source community, demonstrating strong capability on real-world tasks under real-world constraints. Looking ahead, we will push further into the frontier of agentic intelligence, redefining the boundaries of AI in knowledge work.

    Benchmark table and detailed footnotes provide extensive evaluation results comparing Kimi K2.5 with other leading models across reasoning, vision, coding, and productivity benchmarks.

    To reproduce official Kimi-K2.5 benchmark results, we recommend using the official API. For third-party providers, refer to Kimi Vendor Verifier (KVV) to choose high-accuracy services.

    Original source
  • Jan 27, 2026
    • Date parsed from source:
      Jan 27, 2026
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimi-K2-Instruct-0905

    Kimi releases K2-Instruct-0905, its latest MoE model with stronger agentic coding, better frontend coding, and a longer 256K context window. It also adds API access, tool calling support, and deployment guidance for popular inference engines.

    1. Model Introduction

    Kimi K2-Instruct-0905 is the latest, most capable version of Kimi K2. It is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters.

    Key Features

    • Enhanced agentic coding intelligence: Kimi K2-Instruct-0905 demonstrates significant improvements in performance on public benchmarks and real-world coding agent tasks.
    • Improved frontend coding experience: Kimi K2-Instruct-0905 offers advancements in both the aesthetics and practicality of frontend programming.
    • Extended context length: Kimi K2-Instruct-0905’s context window has been increased from 128k to 256k tokens, providing better support for long-horizon tasks.

    2. Model Summary

    Architecture: Mixture-of-Experts (MoE)
    Total Parameters: 1T
    Activated Parameters: 32B
    Number of Layers (Dense layer included): 61
    Number of Dense Layers: 1
    Attention Hidden Dimension: 7168
    MoE Hidden Dimension (per Expert): 2048
    Number of Attention Heads: 64
    Number of Experts: 384
    Selected Experts per Token: 8
    Number of Shared Experts: 1
    Vocabulary Size: 160K
    Context Length: 256K
    Attention Mechanism: MLA
    Activation Function: SwiGLU
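
The routing implied by these numbers can be illustrated with a toy top-k gating function. Real MoE routers use learned logits with softmax gating and load-balancing losses, which this sketch omits:

```python
import random

def topk_route(scores, k=8):
    """Pick the k highest-scoring experts for a token (toy top-k gating)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

NUM_EXPERTS, TOPK, SHARED = 384, 8, 1  # values from the summary above

scores = [random.random() for _ in range(NUM_EXPERTS)]  # stand-in for router logits
routed = topk_route(scores, TOPK)
# Each token passes through TOPK routed experts plus the SHARED always-on
# expert, i.e. 9 of 385 expert FFNs - one reason activated parameters (32B)
# are a small fraction of total parameters (1T).
```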

    3. Evaluation Results

    All K2-Instruct-0905 numbers are reported as mean ± std over five independent, full-test-set runs. Before each run we prune the repository so that every Git object unreachable from the target commit disappears; this guarantees the agent sees only the code that would legitimately be available at that point in history.

    Except for Terminal-Bench (Terminus-2), every result was produced with our in-house evaluation harness. The harness is derived from SWE-agent, but we clamp the context windows of the Bash and Edit tools and rewrite the system prompt to match the task semantics. All baseline figures denoted with an asterisk (*) are excerpted directly from their official report or public leaderboard; the remaining metrics were evaluated by us under conditions identical to those used for K2-Instruct-0905.

    For SWE-Dev we go one step further: we overwrite the original repository files and delete any test file that exercises the functions the agent is expected to generate, eliminating any indirect hints about the desired implementation.

    4. Deployment

    You can access Kimi K2's API at https://platform.moonshot.ai; we provide OpenAI- and Anthropic-compatible APIs.

    The Anthropic-compatible API maps temperature by real_temperature = request_temperature * 0.6 for better compatibility with existing applications.
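
In code, this mapping is a simple scaling; the inverse helper below, for hitting a desired effective temperature, is a hypothetical convenience, not part of the API:

```python
TEMPERATURE_SCALE = 0.6  # Anthropic-compatible endpoint scaling, per the docs

def effective_temperature(request_temperature: float) -> float:
    """Temperature the model actually samples with."""
    return request_temperature * TEMPERATURE_SCALE

def request_temperature_for(target: float) -> float:
    """Request-side value needed to achieve a desired effective temperature
    (hypothetical helper for illustration)."""
    return target / TEMPERATURE_SCALE
```

For example, a request sent with temperature 1.0 is sampled at an effective 0.6.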

    Our model checkpoints are stored in the block-fp8 format; you can find them on Hugging Face.

    We currently recommend running Kimi-K2 on the following inference engines:

    • vLLM
    • SGLang
    • KTransformers
    • TensorRT-LLM

    Deployment examples for vLLM and SGLang can be found in the Model Deployment Guide.

    5. Model Usage

    Chat Completion

    Once the local inference service is up, you can interact with it through the chat endpoint:

    (The page includes example code for chat interaction and recommended temperature setting: temperature = 0.6)
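
Since the endpoint is OpenAI-compatible, a request body for the chat endpoint might be assembled as follows. The model id and system prompt are placeholders; temperature = 0.6 is the recommended setting noted above:

```python
def chat_payload(prompt: str, temperature: float = 0.6) -> dict:
    """Build an OpenAI-compatible chat completion request body.

    The model id and system prompt are illustrative placeholders;
    temperature = 0.6 is the documented recommendation.
    """
    return {
        "model": "kimi-k2-instruct-0905",  # placeholder model id
        "messages": [
            {"role": "system", "content": "You are Kimi, an AI assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }

# POST this body as JSON to <your-server>/v1/chat/completions once the
# local inference service is up.
```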

    Tool Calling

    Kimi-K2-Instruct-0905 has strong tool-calling capabilities. To enable them, pass the list of available tools in each request; the model will then autonomously decide when and how to invoke them.

    The page provides an example demonstrating calling a weather tool end-to-end, including tool implementation, schema definition, and usage.

    The tool_call_with_client function implements the pipeline from user query to tool execution. This pipeline requires the inference engine to support Kimi-K2’s native tool-parsing logic. For more information, see the Tool Calling Guide.
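
A simplified sketch of that pipeline's tool-execution step, using the weather tool from the docs' example. The schema layout follows the OpenAI function-calling convention, and the helper names here are hypothetical (this is not the docs' tool_call_with_client):

```python
import json

def get_weather(city: str) -> dict:
    """Toy implementation of the example's weather tool."""
    return {"city": city, "condition": "sunny"}  # stubbed result

# OpenAI-style schema advertising the tool; passed with every request.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(tool_call: dict) -> str:
    """Run the tool the model requested and serialize the result, which is
    sent back to the model as a 'tool' role message in the next request."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(fn(**args))
```

In the full loop, a model response may contain several such tool calls; each result is appended to the conversation until the model answers without requesting further tools.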

    6. License

    Both the code repository and the model weights are released under the Modified MIT License.

    7. Third Party Notices

    See THIRD PARTY NOTICES.

    8. Contact Us

    If you have any questions, please reach out at [email protected].

    Original source
  • Jul 10, 2025
    • Date parsed from source:
      Jul 10, 2025
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimina-Prover-72B

    Kimi releases Kimina-Prover-72B, a new formal reasoning model for Lean 4 that hits a new miniF2F benchmark high, improves sample efficiency, and ships open source distilled models, a rectified test set, and the Kimina Lean Server.

    🚀 UPDATE - Jul 10, 2025

    We are excited to announce the official release of Kimina-Prover-72B! For detailed information about this release, please check out our blog post.

    📈 Introducing Kimina-Prover Preview, the first large formal reasoning model that can reason in a human-like way and prove mathematical theorems rigorously in the Lean 4 language.

    • SotA performance: It achieves an 80%+ pass rate on the miniF2F benchmark for the first time among all published results. It outperforms all prior works, such as BFS-Prover (72.9%, previous SotA), Hunyuan-Prover, DeepSeek-Prover, and Leanabelle-Prover, by a large margin.
    • High Sample Efficiency: Kimina-Prover Preview delivers strong results even with very small sample budgets, e.g., 68.85% at pass@32 and 65.16% at pass@8.
    • Open Source: We release two distilled versions of our RL model and one autoformalization model on Hugging Face. We also release a rectified version of miniF2F-test as our model helps to identify at least 5 problems in the miniF2F-test dataset that were wrongly formalized. All proofs found by Kimina-Prover Preview in miniF2F-test are also released in this repo (zipped to avoid contamination). Lastly, we release Kimina Lean Server, the workhorse Lean server used during the entire training process of Kimina-Prover.

    Key Ingredients of Kimina-Prover Preview

    Some key ingredients about the design and training of Kimina-Prover Preview are listed as follows.

    • Whole-proof Generation Enhanced by RL: All proofs are generated without any prover feedback during training and testing. Consistent with the results of Kimi k1.5, we show that strong performance can be achieved without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models.
    • Model Size Scaling: Our experiments demonstrate performance scaling with model size, a trend previously unobserved for neural theorem provers. Specifically, we use a 72B model and obtain much stronger performance than smaller models.
    • Long Context Scaling: We adopt a context window of up to 32K tokens for RL training and inference, the longest context used in the neural theorem proving community.
    • Distinct Reasoning Style: We carefully design a reasoning style that we call Formal Reasoning Pattern that bridges the gap between formal verification and informal mathematical intuition.
    Original source
  • Jun 21, 2025
    • Date parsed from source:
      Jun 21, 2025
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimi-VL-A3B-Thinking-2506

    Kimi releases Kimi-VL, an efficient open-source vision-language model with strong multimodal reasoning, long-context understanding, and agent capabilities. It also adds Kimi-VL-Thinking and a 2506 variant that improves reasoning, video understanding, high-resolution perception, and token efficiency.

    We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities—all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

    Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-turn agent interaction tasks (e.g., OSWorld), achieving state-of-the-art results comparable to flagship models. Furthermore, it exhibits remarkable capabilities across diverse challenging vision-language tasks, including college-level image and video comprehension, optical character recognition (OCR), mathematical reasoning, and multi-image understanding.

    In comparative evaluations, it effectively competes with cutting-edge efficient VLMs such as GPT-4o-mini, Qwen2.5-VL-7B, and Gemma-3-12B-IT, while surpassing GPT-4o in several specialized domains.

    Kimi-VL also advances the Pareto frontier of multimodal models in processing long contexts and perceiving clearly: equipped with a 128K extended context window, Kimi-VL can process long and diverse inputs, achieving impressive scores of 64.5 on LongVideoBench and 35.1 on MMLongBench-Doc. Its native-resolution vision encoder, MoonViT, further allows it to see and understand ultra-high-resolution visual inputs, achieving 83.2 on InfoVQA and 34.5 on ScreenSpot-Pro, while maintaining lower computational cost on common visual inputs and general tasks.

    Building on this foundation, we introduce an advanced long-thinking variant: Kimi-VL-Thinking. Developed through long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL), this model exhibits strong long-horizon reasoning capabilities. It achieves scores of 61.7 on MMMU, 36.8 on MathVision, and 71.3 on MathVista while maintaining the compact 2.8B activated LLM parameter footprint, setting a new standard for efficient yet capable multimodal thinking models.

    Besides original model variants, we also provide a new Kimi-VL-A3B-Thinking-2506 variant with several new or improved abilities:

    • It Thinks Smarter while Consuming Fewer Tokens: The 2506 version reaches better accuracy on multimodal reasoning benchmarks: 56.9 on MathVision (+20.1), 80.1 on MathVista (+8.4), 46.3 on MMMU-Pro (+3.2), and 64.0 on MMMU (+2.1), while reducing average thinking length by 20%.
    • It Sees Clearer with Thinking: Unlike the previous version, which specialized in thinking tasks, the 2506 version matches or exceeds the original non-thinking version (Kimi-VL-A3B-Instruct) on general visual perception and understanding, e.g., MMBench-EN-v1.1 (84.4), MMStar (70.4), RealWorldQA (70.0), and MMVet (78.4).
    • It Extends to Video Scenarios: The new 2506 version also improves on video reasoning and understanding benchmarks. It sets new state-of-the-art for open-source models on VideoMMMU (65.2), while also retaining good ability on general video understanding (71.9 on Video-MME).
    • It Extends to Higher Resolution: The new 2506 version supports 3.2 million total pixels in a single image (1792×1792), 4× the original release. This leads to non-trivial improvements on high-resolution perception and OS-agent grounding benchmarks: 83.2 on V* Benchmark (without extra tools), 52.8 on ScreenSpot-Pro, and 52.5 on OSWorld-G (full set with refusal).

    2025.06.21: Release of Kimi-VL-A3B-Thinking-2506: Tech Blog & Cookbook, 🤗 Hugging Face
    2025.04.15: vLLM has supported Kimi-VL deployment. See #16387 for details.
    2025.04.14: LLaMA-Factory has supported Kimi-VL finetuning. See #7719 for details.

    For general multimodal perception and understanding, OCR, long-video and long-document comprehension, video perception, and OS-agent use cases, we recommend Kimi-VL-A3B-Instruct for efficient inference. Meanwhile, our new thinking version, Kimi-VL-A3B-Thinking-2506, also offers excellent multimodal perception, long-video, long-document, and OS-agent grounding abilities while achieving better multimodal reasoning. See this blog for more information.

    Note: Recommended parameter settings:

    • For Thinking models, it is recommended to use Temperature = 0.8.
    • For Instruct models, it is recommended to use Temperature = 0.2.
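
    The two recommendations above can be wired into generation settings mechanically. A minimal sketch, assuming a variant can be identified by the substring "thinking" in its name (the model identifiers and the kwarg dict are illustrative, not an official API):

```python
# Minimal sketch: pick the recommended sampling temperature by model variant.
# Model identifiers below are illustrative; check the release for exact names.

RECOMMENDED_TEMPERATURE = {
    "thinking": 0.8,   # e.g. Kimi-VL-A3B-Thinking-2506
    "instruct": 0.2,   # e.g. Kimi-VL-A3B-Instruct
}

def sampling_params(model_name: str) -> dict:
    """Return generation kwargs with the recommended temperature."""
    variant = "thinking" if "thinking" in model_name.lower() else "instruct"
    return {"do_sample": True, "temperature": RECOMMENDED_TEMPERATURE[variant]}

print(sampling_params("moonshotai/Kimi-VL-A3B-Thinking-2506"))
# {'do_sample': True, 'temperature': 0.8}
```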

    As an efficient model, Kimi-VL can robustly handle diverse tasks (fine-grained perception, math, college-level problems, OCR, agent tasks, etc.) across a broad spectrum of input forms (single-image, multi-image, video, long-document, etc.).

    With effective long-thinking abilities, Kimi-VL-A3B-Thinking (2504 version) can match the performance of 30B/70B frontier open-source VLMs on the MathVision benchmark.

    Original source
  • Jun 20, 2025
    • Date parsed from source:
      Jun 20, 2025
    • First seen by Releasebot:
      Apr 21, 2026
    Kimi logo

    Kimi

    Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities

    Kimi introduces Kimi-Researcher, an autonomous search and reasoning agent now rolling out to users. It deepens research with multi-turn web exploration, strong benchmark results, and end-to-end reinforcement learning, bringing a more capable research experience inside Kimi.

    Meet Kimi-Researcher, an autonomous agent that excels at multi-turn search and reasoning. It performs an average of 23 reasoning steps and explores over 200 URLs per task. Built on an internal version of the Kimi k-series model and trained entirely through end-to-end agentic reinforcement learning (RL), it achieved a Pass@1 score of 26.9%—a state-of-the-art result—on Humanity's Last Exam, and Pass@4 accuracy of 40.17%. Starting from an initial HLE score of 8.6%, Kimi-Researcher reached 26.9% almost entirely through end-to-end RL training, providing compelling evidence that end-to-end agentic RL can significantly advance agent intelligence.

    Kimi-Researcher has also achieved strong performance across several complex and challenging real-world benchmarks. On xbench, a new, dynamic, professionally-aligned suite designed to bridge AI capabilities with real-world productivity, Kimi-Researcher achieved 69% pass@1 (averaged on 4 runs) on xbench-DeepSearch, outperforming models such as o3 with search tools. On benchmark tests for multi-turn search reasoning (FRAMES, Seal-0) and factual information (SimpleQA), Kimi-Researcher also achieved strong performance.

    Figure 1

    1. Potential fluctuations in tools, such as search engines, may affect performance. The results are tested on: HLE on June 17, 2025; and xbench-DeepSearch, Seal-0, Frames, and SimpleQA on June 18, 2025.
    2. All Kimi-Researcher results were evaluated using o3-mini. Scores of other models are referenced from the relevant papers or leaderboards.[1][2][3][4][5]
    3. For benchmarks with fewer than 200 test samples (xbench, Seal-0), we performed four runs and reported the average result (avg@4).
    4. We do not compare multi-agent workflows based on multiple frontier models here, as our focus is on evaluating model capabilities.

    End-to-end agentic RL is promising but challenging

    Kimi-Researcher is an autonomous agentic and thinking model designed to solve complex problems through multi-step planning, reasoning, and tool use. It leverages three main tools: a parallel, real-time internal search tool; a text-based browser tool for interactive web tasks; and a coding tool for automated code execution.

    Traditional agent development has key limitations:

    1. Workflow-Based Systems: Multi-agent workflows assign roles to specialized agents and coordinate the agents using prompt-based workflows. While effective, they are tied to specific LLM versions and need frequent manual updates as models or environments change, reducing scalability and flexibility.
    2. Imitation Learning with Supervised Finetuning (SFT): Imitation learning aligns models well with human demonstrations but struggles with data labeling—especially for long-horizon, agentic tasks in dynamic environments. Furthermore, SFT datasets are tightly coupled with specific tool versions, resulting in poor generalization as tools evolve.

    End-to-end agentic reinforcement learning trains a single model to solve problems holistically: given a query, the agent explores a large number of possible strategies, receives rewards for correct solutions, and learns from the full trajectory. Unlike SFT, it naturally handles long, on-policy reasoning and adapts to changing tools and environments; unlike modular approaches, all skills—planning, perception, and tool use—are learned together without hand-crafted rules or workflow templates. Previous work like OpenAI's Deep Research also highlights the strong performance of this approach, but it introduces new challenges:

    • Dynamic Environments: Agents must adapt to constantly changing conditions, as even identical queries can yield different results over time. The goal is robust generalization despite distribution shifts.
    • Long-Horizon Tasks: Kimi-Researcher can run 70+ search queries* per trajectory, with context windows reaching hundreds of thousands of tokens. This demands advanced memory management and long-context models.
    • Data Scarcity: High-quality RL datasets for agentic QA are rare. We address this by automatically synthesizing training data, allowing large-scale learning without manual labeling.
    • Rollout Efficiency: Multi-turn reasoning and heavy tool use can slow training and cause GPU under-utilization. Optimizing rollout efficiency is crucial for scalable, practical agent RL training.
    * Calculated based on a small set of queries.
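
    Stripped of these complications, the end-to-end loop itself is simple: a query goes in, the agent alternates tool calls and observations, and one outcome reward is assigned to the whole trajectory. A toy sketch of that loop follows; ToyEnv and ToyAgent are illustrative stand-ins, not Kimi internals:

```python
# Hypothetical sketch of one end-to-end agentic RL rollout: the model alternates
# tool calls and reasoning, and the whole trajectory receives a single outcome
# reward. ToyEnv/ToyAgent below are toy stand-ins for the real tools and policy.

class ToyEnv:
    def __init__(self, answer="42"):
        self.answer, self.turn = answer, 0

    def reset(self, query):
        self.turn = 0
        return f"query: {query}"

    def step(self, action):
        self.turn += 1
        if action == "answer":
            return self.answer, True          # agent commits to a final answer
        return f"search result #{self.turn}", False

class ToyAgent:
    def __init__(self, search_turns=3):
        self.search_turns, self.seen = search_turns, 0

    def act(self, obs):
        self.seen += 1
        return "search" if self.seen <= self.search_turns else "answer"

def rollout(agent, env, query, max_turns=50):
    """One trajectory: alternate actions and observations, reward at the end."""
    trajectory, obs = [], env.reset(query)
    for _ in range(max_turns):
        action = agent.act(obs)
        obs, done = env.step(action)
        trajectory.append((action, obs))
        if done:
            break
    reward = 1.0 if obs == env.answer else 0.0   # single outcome reward
    return trajectory, reward

traj, r = rollout(ToyAgent(), ToyEnv(), "hard question")
print(len(traj), r)  # 4 1.0
```

    The RL signal only ever sees the final reward, which is exactly why the training-stability and efficiency issues listed above become the hard part.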

    Approach

    Kimi-Researcher is trained via end-to-end reinforcement learning. We observe a consistent improvement in agent performance across different domains. Figure 2-a illustrates the overall training accuracy of Kimi-Researcher throughout the reinforcement learning process. Figure 2-b presents model performance on several internal datasets.

    Training data

    To address the scarcity of high-quality agentic datasets, we engineered our training corpus with two complementary objectives.
    First, we developed a suite of challenging, tool-centric tasks designed to promote robust tool-use learning. These prompts are deliberately constructed such that solving the task requires invoking specific tools—making naive approaches either infeasible or substantially less efficient. By embedding tool dependencies into task design, the agent learns not only when to invoke a tool, but also how to orchestrate tool use effectively in complex, real-world settings. (See Figure 3 for tool invocation rates using these training data.)

    Second, we curated and synthesized reasoning-intensive tasks to reinforce the agent's core cognitive abilities and its capacity to integrate reasoning with tool usage. This component is further subdivided into:

    • Math and Code Reasoning: Tasks that target logical inference, algorithmic problem-solving, and sequential computation. Kimi-Researcher learns to solve such problems with our toolset rather than relying purely on chain-of-thought.
    • Hard Search: Scenarios where the agent must iteratively search, synthesize, and reason within context constraints to derive valid answers. Case studies illustrate how these hard search tasks drive the emergence of deeper planning and robust, tool-augmented reasoning strategies.

    To build this diverse prompt set at scale, we developed a fully automated pipeline capable of generating and validating many question-answer pairs with minimal manual intervention, ensuring both diversity and correctness at unprecedented scale. Ensuring accurate ground truth (GT) is critical for synthetic tasks, so we introduced a robust GT extraction method to guarantee that each question is paired with a reliable answer whenever possible. Additionally, a rigorous filtering funnel removes ambiguous, trivial, or incorrect pairs—with Pass@N checks ensuring only non-trivial questions are retained. Figure 4 shows the effectiveness of our synthetic tasks based on two experimental results.
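
    The Pass@N check in that filtering funnel can be sketched as follows: sample N attempts per candidate question and retain only questions the model solves sometimes but not always, discarding both trivial items and items whose ground truth is likely broken. The threshold values here are illustrative assumptions, not the pipeline's actual settings:

```python
# Hedged sketch of a Pass@N filtering step: keep a synthetic question only if
# its empirical solve rate over n attempts is positive but below a ceiling,
# i.e. answerable yet non-trivial. Thresholds are illustrative assumptions.

def pass_at_n_filter(questions, attempt_fn, n=8, max_solve_rate=0.75):
    """questions: dicts with an 'answer' key; attempt_fn(q) returns one model answer."""
    kept = []
    for q in questions:
        solves = sum(attempt_fn(q) == q["answer"] for _ in range(n))
        rate = solves / n
        if 0.0 < rate <= max_solve_rate:   # non-trivial but answerable
            kept.append(q)
    return kept
```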

    RL training

    The model is primarily trained using the REINFORCE algorithm. We have observed that the following factors contribute to more stable training:

    • On-policy Training: It is critical to generate strictly on-policy data. During training, we disable LLM engine mechanisms such as tool-call format enforcers to ensure each trajectory is generated entirely from the model's own probability distribution.
    • Negative Sample Control: Negative samples lead to a decrease in token probabilities, which increases the risk of entropy collapse during RL training. To address this, we discard some negative samples strategically, allowing the model to continue improving over a longer training period.
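
    As a concrete illustration of the negative-sample control above: before the REINFORCE update, a fraction of non-positive-reward trajectories is dropped. The `keep_neg_ratio` knob and the uniform random drop rule below are assumptions for illustration, not Kimi's exact strategy:

```python
import random

# Hedged sketch: strategically discard some negative samples before the
# REINFORCE update to limit the entropy-collapsing pressure of negative
# gradients. The uniform drop rule here is an illustrative assumption.

def filter_batch(trajectories, keep_neg_ratio=0.5, rng=random):
    """Keep all positive-reward trajectories; keep only a fraction of the rest."""
    positives = [t for t in trajectories if t["reward"] > 0]
    negatives = [t for t in trajectories if t["reward"] <= 0]
    kept_neg = [t for t in negatives if rng.random() < keep_neg_ratio]
    return positives + kept_neg

def reinforce_loss(batch):
    """Plain REINFORCE surrogate: minimize -sum(reward * trajectory log-prob)."""
    return -sum(t["reward"] * t["logprob"] for t in batch)
```

    Setting `keep_neg_ratio` to 0 recovers positive-only training, and 1 recovers vanilla REINFORCE over the full batch.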

    Kimi-Researcher uses outcome rewards for training, aiming to provide a constant preference in a dynamic training environment.

    • Format Reward: The model is penalized for trajectories that include invalid tool calls or that exceed the maximum context length or iteration limit.
    • Correctness Reward: For trajectories without format errors, rewards are based on the comparison between the model's answer and the ground truth.

    To promote efficiency, a gamma-decay factor is applied to correct trajectories. This encourages the model to discover shorter, more efficient exploration. For example, while two correct trajectories may receive equal final rewards, the shorter one earns a higher reward for its initial actions.
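
    One way to read this gamma-decay shaping is per step: the final reward is propagated backward with decay gamma, so the first action of a short correct trajectory is credited more than the first action of a long one. A minimal sketch under that reading (the exact decay rule used in training may differ):

```python
# Hedged sketch of gamma-decay reward shaping: propagate the final outcome
# reward backward with decay, so shorter correct trajectories assign more
# credit to their early actions. gamma=0.95 is an illustrative value.

def step_rewards(final_reward, num_steps, gamma=0.95):
    """Reward for step t is gamma^(num_steps - 1 - t) * final_reward."""
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]

short = step_rewards(1.0, num_steps=3)   # first element is 0.95**2, i.e. ~0.90
long = step_rewards(1.0, num_steps=6)    # first element is 0.95**5, i.e. ~0.77
assert short[0] > long[0]  # shorter trajectory: more credit for its first action
```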

    Context management

    A long-horizon research trajectory may involve massive observation contexts, and a naive agent without memory management can easily exceed the context limit within 10 iterations. To address this, we design a context-management mechanism that allows the model to retain important information while discarding unnecessary documents, thereby extending a single rollout trajectory to over 50 iterations. An early ablation study shows that a model trained with context management uses 30% more iterations, which enables it to acquire more information and achieve higher performance.
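
    A minimal sketch of the retain-and-discard idea: keep a compact note for each turn, drop the bulky raw observations of older turns once a token budget is exceeded, and always preserve the most recent turns. Whitespace token counting, the turn dict shape, and the keep-last-k rule are simplifying assumptions, not the actual mechanism:

```python
# Hypothetical sketch of context management: discard old raw documents while
# retaining their compact notes, so a rollout can run far past the naive
# context limit. All names and rules here are illustrative assumptions.

def prune_context(turns, budget_tokens=1000, keep_last=3):
    """turns: list of {'note': str, 'raw': str}; drop oldest raw docs first."""
    def tokens(text):
        return len(text.split())

    pruned = [dict(t) for t in turns]          # shallow copies; input untouched
    total = sum(tokens(t["note"]) + tokens(t["raw"]) for t in pruned)
    for i, t in enumerate(pruned):
        if total <= budget_tokens or i >= len(pruned) - keep_last:
            break
        total -= tokens(t["raw"])
        t["raw"] = ""                          # discard the document, keep the note
    return pruned
```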

    Large-scale agent RL infra

    To address the efficiency and stability challenges of large-scale Agent RL, we have developed a suite of infrastructure with the following key features:

    • Fully asynchronous rollout: We implement a fully asynchronous rollout system with extensible Gym-like interfaces. The server-based architecture efficiently orchestrates actor rollouts, environmental interactions, and reward calculations in parallel. This design significantly outperforms its synchronous counterpart by eliminating resource idle time.
    • Turn-level partial rollout: During agent RL, the majority of tasks complete early, but a small fraction require many more turns. To solve this long-tail problem, we designed a Turn-level Partial Rollout mechanism: tasks that exceed a time budget are saved to a replay buffer, and in subsequent iterations their remaining turns are executed with updated model weights. Combined with adapted algorithms, this mechanism delivers substantial rollout acceleration (at least 1.5x).
    • Robust sandbox environment: Our unified sandbox architecture eliminates inter-container overhead while maintaining isolation. Zero-downtime scheduling with Kubernetes-based hybrid cloud architecture enables dynamic resource allocation. Agent-tool communication via Model Context Protocol (MCP) maintains stateful sessions with reconnection capabilities. Our implementation supports multi-replica deployment, ensuring fault-tolerant operation and high availability in production environments.
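
    The replay-buffer mechanics of turn-level partial rollout can be sketched as follows; the task dict shape, the turn budget, and the function names are illustrative, not the actual implementation:

```python
from collections import deque

# Hypothetical sketch of turn-level partial rollout: a task that exhausts its
# per-iteration turn budget is parked in a replay buffer with its partial
# trajectory and resumed later under updated model weights.

replay_buffer = deque()

def run_with_budget(task, policy, turn_budget=8):
    """Advance a task at most `turn_budget` turns this iteration; park if unfinished."""
    for _ in range(turn_budget):
        if task["turns_done"] >= task["turns_total"]:
            return "finished"
        task["trajectory"].append(policy(task))
        task["turns_done"] += 1
    if task["turns_done"] >= task["turns_total"]:
        return "finished"
    replay_buffer.append(task)               # finish the remaining turns later
    return "parked"
```

    Under this scheme a 12-turn task with an 8-turn budget is parked after one iteration and finished in the next, so long-tail tasks no longer stall the whole batch.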

    Emerging agentic capacities

    During end-to-end reinforcement learning, we observed several notable emergent abilities in Kimi-Researcher. Here are two highlights:

    • When presented with conflicting information from multiple sources, Kimi-Researcher resolves inconsistencies through iterative hypothesis refinement and self-correction.
    • Kimi-Researcher demonstrates caution and rigor: even for seemingly straightforward questions, it deliberately performs additional searches and cross-validates information before answering.

    Use cases

    Kimi-Researcher supports diverse applications including academic research, legal and regulatory insights, obscure information retrieval, clinical evidence review, and corporate financial analysis.

    What's next

    Kimi-Researcher is beginning its gradual rollout to users today. It empowers you to conduct deep, comprehensive research on any topic directly within Kimi. Join the waitlist here.

    It represents the early stage of our broader vision: evolving from a focused search and reasoning agent into a general-purpose agent capable of solving a wide range of complex tasks with an ever-expanding toolkit. To realize this vision, we are expanding the agent's capabilities across both tools and task domains, while also advancing the underlying reinforcement learning infrastructure and algorithms to ensure greater training stability and efficiency.

    To facilitate more research efforts in the field, we are planning on open-sourcing the base pretrained model as well as the reinforcement-learned model underlying Kimi-Researcher in the following months.

    Original source

Related vendors