Cloudflare AI Release Notes

Last updated: Apr 5, 2026

  • Apr 4, 2026
    • Date parsed from source:
      Apr 4, 2026
    • First seen by Releasebot:
      Apr 5, 2026
    Cloudflare logo

    Cloudflare AI by Cloudflare

    Google Gemma 4 26B A4B now available on Workers AI

    Cloudflare AI brings Google’s Gemma 4 26B A4B to Workers AI, adding a fast Mixture-of-Experts model with a 256,000 token context window, built-in thinking mode, vision understanding, function calling, multilingual support, and coding capabilities.

    We are partnering with Google to bring @cf/google/gemma-4-26b-a4b-it to Workers AI. Gemma 4 26B A4B is a Mixture-of-Experts (MoE) model built from Gemini 3 research, with 26B total parameters and only 4B active per forward pass. By activating a small subset of parameters during inference, the model runs almost as fast as a 4B-parameter model while delivering the quality of a much larger one.

    Gemma 4 is Google's most capable family of open models, designed to maximize intelligence-per-parameter.

    Key capabilities

    • Mixture-of-Experts architecture with 8 active experts out of 128 total (plus 1 shared expert), delivering frontier-level performance at a fraction of the compute cost of dense models
    • 256,000 token context window for retaining full conversation history, tool definitions, and long documents across extended sessions
    • Built-in thinking mode that lets the model reason step-by-step before answering, improving accuracy on complex tasks
    • Vision understanding for object detection, document and PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), and handwriting recognition, with support for variable aspect ratios and resolutions
    • Function calling with native support for structured tool use, enabling agentic workflows and multi-step planning
    • Multilingual with out-of-the-box support for 35+ languages, pre-trained on 140+ languages
    • Coding for code generation, completion, and correction

    Use Gemma 4 26B A4B through the Workers AI binding (env.AI.run()), the REST API at /run or /v1/chat/completions, or the OpenAI-compatible endpoint.
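
    As a sketch of the binding path, a Worker might call the model as below. The model ID comes from this announcement; the chat-style messages input shape is the usual Workers AI text-generation convention, and the handler and helper names are illustrative assumptions, not taken from the model page:

    ```typescript
    // Illustrative Worker calling Gemma 4 26B A4B through the AI binding.
    // The `messages` input shape is an assumption based on the common
    // Workers AI text-generation convention.
    interface ChatMessage {
      role: "system" | "user" | "assistant";
      content: string;
    }

    // Pure helper so the same payload shape can be reused against the
    // REST endpoints at /run or /v1/chat/completions.
    function buildGemmaRequest(messages: ChatMessage[], maxTokens = 256) {
      return { messages, max_tokens: maxTokens };
    }

    export default {
      async fetch(
        _request: Request,
        env: { AI: { run(model: string, input: unknown): Promise<unknown> } },
      ): Promise<Response> {
        const payload = buildGemmaRequest([
          { role: "user", content: "Explain Mixture-of-Experts in two sentences." },
        ]);
        const result = await env.AI.run("@cf/google/gemma-4-26b-a4b-it", payload);
        return Response.json(result);
      },
    };
    ```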

    For more information, refer to the Gemma 4 26B A4B model page.

    Original source Report a problem
  • Apr 2, 2026
    • Date parsed from source:
      Apr 2, 2026
    • First seen by Releasebot:
      Apr 3, 2026

    Automatically retry on upstream provider failures on AI Gateway

    Cloudflare AI adds automatic retries in AI Gateway, letting requests retry at the gateway level when upstream providers return errors. The feature supports configurable retry counts, delays and backoff strategies, with per-request overrides and no client-side changes needed.

    AI Gateway now supports automatic retries at the gateway level. When an upstream provider returns an error, your gateway retries the request based on the retry policy you configure, without requiring any client-side changes.

    You can configure the retry count (up to 5 attempts), the delay between retries (from 100ms to 5 seconds), and the backoff strategy (Constant, Linear, or Exponential). These defaults apply to all requests through the gateway, and per-request headers can override them.
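
    The three strategies can be sketched as a pure delay calculation. This mirrors the semantics described above (constant, linear, and exponential growth per attempt); clamping the result to the documented 100 ms–5 s window is our assumption about out-of-range values, not confirmed gateway behavior:

    ```typescript
    // Sketch of the per-attempt retry delay for each backoff strategy.
    // Not AI Gateway's implementation; the clamp to the configurable
    // 100ms-5s range is an assumption.
    type Backoff = "constant" | "linear" | "exponential";

    function retryDelayMs(strategy: Backoff, baseDelayMs: number, attempt: number): number {
      // attempt is 1-based: the first retry is attempt 1.
      const factor =
        strategy === "constant" ? 1
        : strategy === "linear" ? attempt
        : 2 ** (attempt - 1); // exponential doubling
      return Math.min(Math.max(baseDelayMs * factor, 100), 5000);
    }
    ```

    With a 200 ms base delay, exponential retries would wait 200, 400, then 800 ms, while constant retries wait 200 ms each time.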

    This is particularly useful when you do not control the client making the request and cannot implement retry logic on the caller side. For more complex failover scenarios, such as failing over across different providers, use Dynamic Routing.

    For more information, refer to Manage gateways.


  • Apr 1, 2026
    • Date parsed from source:
      Apr 1, 2026
    • First seen by Releasebot:
      Apr 1, 2026
    • Modified by Releasebot:
      Apr 3, 2026

    Create, manage, and search AI Search instances with the Wrangler CLI

    Cloudflare AI adds a wrangler ai-search command namespace for managing AI Search instances from the CLI, with create, list, get, update, delete, search, and stats commands plus JSON output for scripting and agents.

    AI Search now supports a wrangler ai-search command namespace. Use it to manage instances from the command line.

    The following commands are available:

    Command | Description

    • wrangler ai-search create | Create a new instance with an interactive wizard
    • wrangler ai-search list | List all instances in your account
    • wrangler ai-search get | Get details of a specific instance
    • wrangler ai-search update | Update the configuration of an instance
    • wrangler ai-search delete | Delete an instance
    • wrangler ai-search search | Run a search query against an instance
    • wrangler ai-search stats | Get usage statistics for an instance

    The create command guides you through setup: choosing a name, a source type (r2 or web), and a data source. You can also pass all options as flags for non-interactive use.

    Use wrangler ai-search search to query an instance directly from the CLI.

    All commands support --json for structured output that scripts and AI agents can parse directly.

    For full usage details, refer to the Wrangler commands documentation.

  • Mar 24, 2026
    • Date parsed from source:
      Mar 24, 2026
    • First seen by Releasebot:
      Mar 25, 2026

    Advanced WAF customization for AI Crawl Control blocks

    Cloudflare AI adds custom WAF rule modifications to AI Crawl Control while preserving manual edits and surfacing parse warnings.

    AI Crawl Control now supports extending the underlying WAF rule with custom modifications. Any changes you make directly in the WAF custom rules editor, such as adding path-based exceptions, extra user agents, or additional expression clauses, are preserved when you update crawler actions in AI Crawl Control.

    If the WAF rule expression has been modified in a way AI Crawl Control cannot parse, a warning banner appears on the Crawlers page with a link to view the rule directly in WAF.

    For more information, refer to WAF rule management.

  • Mar 23, 2026
    • Date parsed from source:
      Mar 23, 2026
    • First seen by Releasebot:
      Mar 23, 2026
    • Modified by Releasebot:
      Mar 24, 2026

    Agents SDK v0.8.0: readable state, idempotent schedules, typed AgentClient, and Zod 4

    Cloudflare AI adds readable agent state, idempotent scheduling, and full TypeScript inference for AgentClient, while moving to Zod 4. It also fixes chat streaming and duplicate-message issues, removes experimental tags from keepAlive tools, and adds TanStack AI support in codemode.

    The latest release of the Agents SDK exposes agent state as a readable property, prevents duplicate schedule rows across Durable Object restarts, brings full TypeScript inference to AgentClient, and migrates to Zod 4.

    Readable state on useAgent and AgentClient

    Both useAgent (React) and AgentClient (vanilla JS) now expose a state property that reflects the current agent state. Previously, reading state required manually tracking it through the onStateUpdate callback.

    Idempotent schedule()

    schedule() now supports an idempotent option that deduplicates by (type, callback, payload), preventing duplicate rows from accumulating when called in places that run on every Durable Object restart such as onStart().

    Cron schedules are idempotent by default. Calling schedule("0 * * * *", "tick") multiple times with the same callback, expression, and payload returns the existing schedule row instead of creating a new one. Pass { idempotent: false } to override.

    Delayed and date-scheduled types support opt-in idempotency.
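
    The deduplication rule can be modeled with a small sketch: rows keyed by (type, callback, payload), with repeat calls returning the existing row. This illustrates the semantics only; it is not the SDK's storage layer (which, for cron schedules, also keys on the expression), and all names below are illustrative:

    ```typescript
    // Toy model of idempotent scheduling: a repeat call with the same
    // (type, callback, payload) returns the existing row instead of
    // inserting a duplicate.
    interface ScheduleRow {
      id: number;
      type: "cron" | "delayed" | "scheduled";
      callback: string;
      payload: unknown;
    }

    class ScheduleTable {
      private rows = new Map<string, ScheduleRow>();
      private nextId = 1;

      schedule(type: ScheduleRow["type"], callback: string, payload: unknown): ScheduleRow {
        // The dedup key covers all three fields, so the same callback
        // with a different payload still creates a distinct row.
        const key = JSON.stringify([type, callback, payload]);
        const existing = this.rows.get(key);
        if (existing) return existing; // idempotent: reuse the row
        const row: ScheduleRow = { id: this.nextId++, type, callback, payload };
        this.rows.set(key, row);
        return row;
      }

      get size(): number {
        return this.rows.size;
      }
    }
    ```

    Under these semantics, a schedule call that runs in onStart() on every Durable Object restart leaves one row behind instead of one per restart.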

    Typed AgentClient with call inference and stub proxy

    AgentClient now accepts an optional agent type parameter for full type inference on RPC calls, matching the typed experience already available with useAgent.

    @cloudflare/ai-chat fixes

    • Turn serialization — onChatMessage() and _reply() work is now queued so user requests, tool continuations, and saveMessages() never stream concurrently.
    • Duplicate messages on stop — Clicking stop during an active stream no longer splits the assistant message into two entries.
    • Duplicate messages after tool calls — Orphaned client IDs no longer leak into persistent storage.

    keepAlive() and keepAliveWhile() are no longer experimental

    keepAlive() now uses a lightweight in-memory ref count instead of schedule rows. Multiple concurrent callers share a single alarm cycle. The @experimental tag has been removed from both keepAlive() and keepAliveWhile().

    @cloudflare/codemode: TanStack AI integration

    A new entry point @cloudflare/codemode/tanstack-ai adds support for TanStack AI's chat() as an alternative to the Vercel AI SDK's streamText().

    Upgrade

    To update to the latest version:

    npm i agents@latest @cloudflare/ai-chat@latest
    
  • Mar 23, 2026
    • Date parsed from source:
      Mar 23, 2026
    • First seen by Releasebot:
      Mar 23, 2026
    • Modified by Releasebot:
      Mar 24, 2026

    New AI Search REST API endpoints for /search and /chat/completions

    Cloudflare AI adds new AI Search REST API endpoints for search and chat in an OpenAI-compatible format, making it easier to use existing OpenAI SDKs and maintain multi-turn context. It also recommends migrating from AutoRAG endpoints, which remain fully supported.

    AI Search now offers new REST API endpoints for search and chat that use an OpenAI-compatible format. This means you can use the familiar messages array structure that works with existing OpenAI SDKs and tools. The messages array also lets you pass previous messages within a session, so the model can maintain context across multiple turns.

    Endpoints

    • Chat Completions: POST /accounts/{account_id}/ai-search/instances/{name}/chat/completions
    • Search: POST /accounts/{account_id}/ai-search/instances/{name}/search

    The Chat Completions endpoint accepts requests in the standard OpenAI messages array format.
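
    For illustration, a request might be shaped as follows. The endpoint path comes from this announcement; the api.cloudflare.com/client/v4 host is the standard Cloudflare API base, and the account ID, instance name, token, and helper names are placeholders or assumptions:

    ```typescript
    // Hypothetical call to the AI Search Chat Completions endpoint.
    type Msg = { role: "system" | "user" | "assistant"; content: string };

    // OpenAI-compatible body: pass prior turns back in `messages` so the
    // model keeps context across the session.
    function buildChatBody(history: Msg[]) {
      return { messages: history };
    }

    async function chat(accountId: string, instance: string, token: string, history: Msg[]): Promise<unknown> {
      const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai-search/instances/${instance}/chat/completions`;
      const res = await fetch(url, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${token}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(buildChatBody(history)),
      });
      return res.json();
    }
    ```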

    For more details, refer to the AI Search REST API guide.

    Migration from existing AutoRAG API (recommended)

    If you are using the previous AutoRAG API endpoints (/autorag/rags/), we recommend migrating to the new endpoints. The previous AutoRAG API endpoints will continue to be fully supported.

    Refer to the migration guide for step-by-step instructions.

  • Mar 23, 2026
    • Date parsed from source:
      Mar 23, 2026
    • First seen by Releasebot:
      Mar 23, 2026
    • Modified by Releasebot:
      Mar 24, 2026

    AI Search UI snippets and MCP support

    Cloudflare AI adds public endpoints, UI snippets, and MCP to AI Search, making it easier to embed search on websites and let AI agents query content through the Model Context Protocol.

    AI Search now supports public endpoints, UI snippets, and MCP, making it easy to add search to your website or connect AI agents.

    Public endpoints

    Public endpoints allow you to expose AI Search capabilities without requiring API authentication. To enable public endpoints:

    1. Go to AI Search in the Cloudflare dashboard.
    2. Select your instance, and turn on Public Endpoint in Settings. For more details, refer to Public endpoint configuration.

    UI snippets

    UI snippets are pre-built search and chat components you can embed in your website. Visit search.ai.cloudflare.com to configure and preview components for your AI Search instance.

    MCP

    The MCP endpoint allows AI agents to search your content via the Model Context Protocol. Connect your MCP client to your instance's endpoint: https://<your-instance>.search.ai.cloudflare.com/mcp (replace <your-instance> with your instance's subdomain).

    For more details, refer to the MCP documentation.

  • Mar 23, 2026
    • Date parsed from source:
      Mar 23, 2026
    • First seen by Releasebot:
      Mar 23, 2026

    Custom metadata filtering for AI Search

    Cloudflare AI adds custom metadata filtering to AI Search, letting users define up to five custom fields and filter results by attributes like category, version, or other metadata in search queries.

    AI Search now supports custom metadata filtering, allowing you to define your own metadata fields and filter search results based on attributes like category, version, or any custom field you define.

    Define a custom metadata schema

    You can define up to 5 custom metadata fields per AI Search instance. Each field has a name and data type (text, number, or boolean).

    Add metadata to your documents

    How you attach metadata depends on your data source:

    • R2 bucket: Set metadata using S3-compatible custom headers (x-amz-meta-*) when uploading objects. Refer to R2 custom metadata for examples.
    • Website: Add tags to your HTML pages. Refer to Website custom metadata for details.
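
    The R2 convention above can be sketched as a header-building helper. How you sign and send the PUT (for example with an S3-compatible client) is up to you, and the helper name is illustrative:

    ```typescript
    // Each custom metadata field becomes an x-amz-meta-* header on the
    // S3-compatible upload. Illustrative helper, not a Cloudflare API.
    function metadataHeaders(metadata: Record<string, string | number | boolean>): Record<string, string> {
      const headers: Record<string, string> = {};
      for (const [name, value] of Object.entries(metadata)) {
        // Header names are case-insensitive; lowercasing keeps them predictable.
        headers[`x-amz-meta-${name.toLowerCase()}`] = String(value);
      }
      return headers;
    }
    ```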

    Filter search results

    Use custom metadata fields in your search queries alongside built-in attributes like folder and timestamp.
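
    As a sketch, a filter combining a custom category field with a minimum version might be shaped as below. The { type, key, value } comparison shape follows the existing AutoRAG-style filter format; treat it, and the operator set, as assumptions to check against the metadata filtering documentation:

    ```typescript
    // Illustrative filter shapes; confirm the operators AI Search
    // actually accepts on custom fields in the docs.
    type Comparison = {
      type: "eq" | "ne" | "gt" | "gte" | "lt" | "lte";
      key: string;
      value: string | number | boolean;
    };
    type Compound = { type: "and" | "or"; filters: Comparison[] };

    function and(...filters: Comparison[]): Compound {
      return { type: "and", filters };
    }

    // Restrict results to the "docs" category at version 2 or later,
    // mirroring the example custom fields in this note.
    const filter = and(
      { type: "eq", key: "category", value: "docs" },
      { type: "gte", key: "version", value: 2 },
    );
    ```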

    Learn more in the metadata filtering documentation.

  • Mar 19, 2026
    • Date parsed from source:
      Mar 19, 2026
    • First seen by Releasebot:
      Mar 20, 2026
    • Modified by Releasebot:
      Mar 23, 2026

    Moonshot AI Kimi K2.5 now available on Workers AI

    Cloudflare AI adds Kimi K2.5 to Workers AI, bringing a frontier-scale open-source model with 256k context, multi-turn tool calling, vision inputs, structured outputs, and function calling. It also expands prefix caching with cached-token metrics and pricing discounts for faster, more efficient inference.

    Workers AI is officially in the big models game.

    @cf/moonshotai/kimi-k2.5 is the first frontier-scale open-source model on our AI inference platform — a large model with a full 256k context window, multi-turn tool calling, vision inputs, and structured outputs. By bringing a frontier-scale model directly onto the Cloudflare Developer Platform, you can now run the entire agent lifecycle on a single, unified platform.

    The model has proven to be a fast, efficient alternative to larger proprietary models without sacrificing quality. As AI adoption increases, the volume of inference is skyrocketing — now you can access frontier intelligence at a fraction of the cost.

    Key capabilities

    • 256,000 token context window for retaining full conversation history, tool definitions, and entire codebases across long-running agent sessions
    • Multi-turn tool calling for building agents that invoke tools across multiple conversation turns
    • Vision inputs for processing images alongside text
    • Structured outputs with JSON mode and JSON Schema support for reliable downstream parsing
    • Function calling for integrating external tools and APIs into agent workflows

    Prefix caching and session affinity

    When an agent sends a new prompt, it resends all previous prompts, tools, and context from the session. The delta between consecutive requests is usually just a few new lines of input. Prefix caching avoids reprocessing the shared context, saving time and compute from the prefill stage. This means faster Time to First Token (TTFT) and higher Tokens Per Second (TPS) throughput.
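
    The saving can be illustrated with a toy shared-prefix calculation. This shows why only the delta needs fresh prefill; it is not Workers AI's cache implementation:

    ```typescript
    // Consecutive agent requests share a long token prefix; only the
    // suffix after the shared prefix needs fresh prefill work.
    function sharedPrefixLength(a: string[], b: string[]): number {
      let n = 0;
      while (n < a.length && n < b.length && a[n] === b[n]) n++;
      return n;
    }

    // A follow-up request resends the whole session plus one new message.
    const turn1 = ["system-prompt", "tool-defs", "user-q1", "answer-1"];
    const turn2 = [...turn1, "user-q2"];
    const cached = sharedPrefixLength(turn1, turn2); // everything from turn 1
    const freshPrefill = turn2.length - cached;      // just the new message
    ```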

    Workers AI already performs prefix caching; we are now surfacing cached tokens as a usage metric and offering a discount on cached tokens compared to input tokens (pricing is listed on the model page).

    Use Kimi K2.5 through the Workers AI binding (env.AI.run()), the REST API at /run or /v1/chat/completions, or via the OpenAI-compatible endpoint.

    For more information, refer to the Kimi K2.5 model page, pricing, and prompt caching.

  • Mar 17, 2026
    • Date parsed from source:
      Mar 17, 2026
    • First seen by Releasebot:
      Mar 18, 2026
    • Modified by Releasebot:
      Mar 23, 2026

    @cloudflare/codemode v0.2.1: MCP barrel export, zero-dependency main entry point, and custom sandbox modules

    Cloudflare AI’s @cloudflare/codemode adds a new MCP export, trims main entry point peer dependencies, and gives more sandbox control. It also moves AI types to a separate path, adds custom sandbox modules, and simplifies code normalization and tool-name sanitization.

    The latest releases of @cloudflare/codemode add a new MCP barrel export, remove ai and zod as required peer dependencies from the main entry point, and give you more control over the sandbox.

    New @cloudflare/codemode/mcp export

    A new @cloudflare/codemode/mcp entry point provides two functions that wrap MCP servers with Code Mode:

    • codeMcpServer({ server, executor }) — wraps an existing MCP server with a single code tool where each upstream tool becomes a typed codemode.* method.
    • openApiMcpServer({ spec, executor, request }) — creates search and execute MCP tools from an OpenAPI spec with host-side request proxying and automatic $ref resolution.

    Zero-dependency main entry point

    Breaking change in v0.2.0: generateTypes and the ToolDescriptor / ToolDescriptors types have moved to @cloudflare/codemode/ai.

    The main entry point (@cloudflare/codemode) no longer requires the ai or zod peer dependencies. It now exports:

    • sanitizeToolName: Sanitize tool names into valid JS identifiers
    • normalizeCode: Normalize LLM-generated code into async arrow functions
    • generateTypesFromJsonSchema: Generate TypeScript type definitions from plain JSON Schema
    • jsonSchemaToType: Convert a single JSON Schema to a TypeScript type string
    • DynamicWorkerExecutor: Sandboxed code execution via Dynamic Worker Loader
    • ToolDispatcher: RPC target for dispatching tool calls from sandbox to host
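
    For a feel of what the name sanitizer does, here is a rough re-implementation. It mirrors the described behavior (valid JS identifiers out) but is not the library's actual code:

    ```typescript
    // Rough sketch in the spirit of sanitizeToolName: replace characters
    // that are invalid in JS identifiers and avoid a leading digit.
    // Not the @cloudflare/codemode implementation.
    function sanitizeToolNameSketch(name: string): string {
      let id = name.replace(/[^A-Za-z0-9_$]/g, "_");
      if (/^[0-9]/.test(id)) id = "_" + id; // identifiers cannot start with a digit
      return id;
    }
    ```

    Under this sketch, a tool named my-tool.v2 would surface as my_tool_v2 in the codemode.* namespace.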

    Custom sandbox modules

    DynamicWorkerExecutor now accepts an optional modules option to inject custom ES modules into the sandbox.

    Internal normalization and sanitization

    DynamicWorkerExecutor now normalizes code and sanitizes tool names internally. You no longer need to call normalizeCode() or sanitizeToolName() before passing code and functions to execute().

    Upgrade

    To update to the latest version:

    npm i @cloudflare/codemode@latest
    

    See the Code Mode documentation for the full API reference.

