Cloudflare AI Release Notes

Last updated: Feb 17, 2026

  • Feb 17, 2026
    • Date parsed from source:
      Feb 17, 2026
    • First seen by Releasebot:
      Feb 17, 2026

    Cloudflare AI by Cloudflare

    Agents SDK v0.5.0: Protocol message control, retry utilities, data parts, and @cloudflare/ai-chat v0.1.0

    The Agents SDK adds built-in retry utilities with exponential backoff and per-task options, plus per-connection protocol message control for WebSocket flows. @cloudflare/ai-chat v0.1.0 brings data parts, tool approval persistence, and smarter message persistence.

    The latest release

    The latest release of the Agents SDK adds built-in retry utilities, per-connection protocol message control, and a fully rewritten @cloudflare/ai-chat with data parts, tool approval persistence, and zero breaking changes.

    Retry utilities

    A new this.retry() method lets you retry any async operation with exponential backoff and jitter. You can pass an optional shouldRetry predicate to bail early on non-retryable errors.
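
    A minimal sketch, assuming a (fn, options) call shape; shouldRetry comes from the notes above, while maxAttempts is an illustrative option name:

    import { Agent } from "agents";

    type Env = Record<string, unknown>;

    export class UpstreamAgent extends Agent<Env> {
      async fetchUpstream() {
        // Retried with exponential backoff and jitter until success or give-up
        return this.retry(
          async () => {
            const res = await fetch("https://api.example.com/data");
            if (!res.ok) throw new Error(`upstream returned ${res.status}`);
            return res.json();
          },
          {
            maxAttempts: 5, // assumed option name
            // Bail early on errors a retry can never fix
            shouldRetry: (error: unknown) => !(error instanceof SyntaxError),
          },
        );
      }
    }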

    Retry options are also available per-task on queue(), schedule(), scheduleEvery(), and addMcpServer().
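
    A sketch of the per-task shape; the retry options bag shown here is an assumption:

    // Assumed: a `retry` bag on the task options. Values are validated eagerly,
    // so invalid options throw at schedule time, not when the task runs.
    await this.schedule(60, "syncUpstream", { job: "nightly" }, {
      retry: {
        maxAttempts: 3,
        shouldRetry: (error: unknown) => !(error instanceof RangeError),
      },
    });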

    Retry options are validated eagerly at enqueue/schedule time, and invalid values throw immediately. Internal retries have also been added for workflow operations (terminateWorkflow, pauseWorkflow, and others) with Durable Object-aware error detection.

    Per-connection protocol message control

    Agents automatically send JSON text frames (identity, state, MCP server lists) to every WebSocket connection. You can now suppress these per-connection for clients that cannot handle them — binary-only devices, MQTT clients, or lightweight embedded systems.

    Connections with protocol messages disabled still fully participate in RPC and regular messaging. Use isConnectionProtocolEnabled(connection) to check a connection's status at any time. The flag persists across Durable Object hibernation.
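
    A sketch of how this might look at connect time. isConnectionProtocolEnabled comes from the notes above; the setter used to disable the frames is an assumed name:

    import { Agent, type Connection, type ConnectionContext } from "agents";

    type Env = Record<string, unknown>;

    export class DeviceAgent extends Agent<Env> {
      async onConnect(connection: Connection, ctx: ConnectionContext) {
        // Binary-only clients opt out of identity/state/MCP JSON frames
        // (assumed setter name; the flag persists across hibernation)
        if (ctx.request.headers.get("x-client-kind") === "binary-only") {
          this.setConnectionProtocolEnabled(connection, false);
        }
        // RPC and regular messaging still work either way
        console.log(this.isConnectionProtocolEnabled(connection));
      }
    }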

    See Protocol messages for full documentation.

    @cloudflare/ai-chat v0.1.0

    The first stable release of @cloudflare/ai-chat ships alongside this release with a major refactor of AIChatAgent internals — new ResumableStream class, WebSocket ChatTransport, and simplified SSE parsing — with zero breaking changes. Existing code using AIChatAgent and useAgentChat works as-is.

    Key new features:

    • Data parts — Attach typed JSON blobs (data-*) to messages alongside text. Supports reconciliation (type+id updates in-place), append, and transient parts (ephemeral via onData callback). See Data parts.
    • Tool approval persistence — The needsApproval approval UI now survives page refresh and DO hibernation. The streaming message is persisted to SQLite when a tool enters approval-requested state.
    • maxPersistedMessages — Cap SQLite message storage with automatic oldest-message deletion.
    • body option on useAgentChat — Send custom data with every request, static or dynamic (see the sketch after this list).
    • Incremental persistence — Hash-based cache to skip redundant SQL writes.
    • Row size guard — Automatic two-pass compaction when messages approach the SQLite 2 MB limit.
    • autoContinueAfterToolResult defaults to true — Client-side tool results and tool approvals now automatically trigger a server continuation, matching server-executed tool behavior. Set autoContinueAfterToolResult: false in useAgentChat to restore the previous behavior.
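
    A client-side sketch of the new options; the useAgentChat import path is an assumption:

    import { useAgent } from "agents/react";
    import { useAgentChat } from "@cloudflare/ai-chat/react"; // assumed path

    function Chat() {
      const agent = useAgent({ agent: "my-chat-agent" });
      const chat = useAgentChat({
        agent,
        body: { userId: "u_123" },          // sent with every request
        autoContinueAfterToolResult: false, // restore the pre-0.1.0 behavior
      });
      // render chat.messages, send input via the returned helpers, etc.
      return null;
    }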

    Notable bug fixes:

    • Resolved stream resumption race conditions
    • Resolved an issue where setMessages functional updater sent empty arrays
    • Resolved an issue where client tool schemas were lost after DO hibernation
    • Resolved InvalidPromptError after tool approval (approval.id was dropped)
    • Resolved an issue where message metadata was not propagated on broadcast/resume paths
    • Resolved an issue where clearAll() did not clear in-memory chunk buffers
    • Resolved an issue where reasoning-delta silently dropped data when reasoning-start was missed during stream resumption

    Synchronous queue and schedule getters

    getQueue(), getQueues(), getSchedule(), dequeue(), dequeueAll(), and dequeueAllByCallback() were unnecessarily async despite only performing synchronous SQL operations. They now return values directly instead of wrapping them in Promises. This is backward compatible — existing code using await on these methods will continue to work.
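
    A small sketch inside an Agent method; the argument shapes are assumptions:

    // These now return values directly; `await` still works but is no longer needed.
    const task = this.getQueue("task-id");     // previously Promise-wrapped
    const pending = this.getQueues();
    this.dequeueAllByCallback("syncUpstream"); // synchronous removal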

    Other improvements

    • Fix TypeScript "excessively deep" error — A depth counter on CanSerialize and IsSerializableParam types bails out to true after 10 levels of recursion, preventing the "Type instantiation is excessively deep" error with deeply nested types like AI SDK CoreMessage[].
    • POST SSE keepalive — The POST SSE handler now sends event: ping every 30 seconds to keep the connection alive, matching the existing GET SSE handler behavior. This prevents POST response streams from being silently dropped by proxies during long-running tool calls.
    • Widened peer dependency ranges — Peer dependency ranges across packages have been widened to prevent cascading major bumps during 0.x minor releases. @cloudflare/ai-chat and @cloudflare/codemode are now marked as optional peer dependencies.

    Upgrade

    To update to the latest version:

    npm i agents@latest @cloudflare/ai-chat@latest

  • Feb 13, 2026
    • Date parsed from source:
      Feb 13, 2026
    • First seen by Releasebot:
      Feb 17, 2026

    Cloudflare AI by Cloudflare

    Introducing GLM-4.7-Flash on Workers AI, @cloudflare/tanstack-ai, and workers-ai-provider v3.1.1

    Cloudflare rolls out GLM-4.7-Flash on Workers AI for multilingual edge agents with a 131k-token context window. It ships alongside @cloudflare/tanstack-ai and workers-ai-provider v3.1.1, which bring tool calling, transcription, TTS, and streaming reliability upgrades.

    We're excited to announce GLM-4.7-Flash on Workers AI, a fast and efficient text generation model optimized for multilingual dialogue and instruction-following tasks, along with the brand-new @cloudflare/tanstack-ai package and workers-ai-provider v3.1.1.

    You can now run AI agents entirely on Cloudflare. With GLM-4.7-Flash's multi-turn tool calling support, plus full compatibility with TanStack AI and the Vercel AI SDK, you have everything you need to build agentic applications that run completely at the edge.

    GLM-4.7-Flash — Multilingual Text Generation Model

    @cf/zai-org/glm-4.7-flash is a multilingual model with a 131,072 token context window, making it ideal for long-form content generation, complex reasoning tasks, and multilingual applications.

    Key Features and Use Cases:

    • Multi-turn Tool Calling for Agents: Build AI agents that can call functions and tools across multiple conversation turns
    • Multilingual Support: Built to handle content generation in multiple languages effectively
    • Large Context Window: 131,072 tokens for long-form writing, complex reasoning, and processing long documents
    • Fast Inference: Optimized for low-latency responses in chatbots and virtual assistants
    • Instruction Following: Excellent at following complex instructions for code generation and structured tasks

    Use GLM-4.7-Flash through the Workers AI binding (env.AI.run()), the REST API at /run or /v1/chat/completions, AI Gateway, or via workers-ai-provider for the Vercel AI SDK.
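
    For example, through the Workers AI binding (the prompt is illustrative):

    const result = await env.AI.run("@cf/zai-org/glm-4.7-flash", {
      messages: [
        { role: "system", content: "You are a concise multilingual assistant." },
        { role: "user", content: "Summarize the plan in French and Japanese." },
      ],
    });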

    Pricing is available on the model page or pricing page.

    @cloudflare/tanstack-ai v0.1.1 — TanStack AI adapters for Workers AI and AI Gateway

    We've released @cloudflare/tanstack-ai, a new package that brings Workers AI and AI Gateway support to TanStack AI. This provides a framework-agnostic alternative for developers who prefer TanStack's approach to building AI applications.

    Workers AI adapters support four configuration modes — plain binding (env.AI), plain REST, AI Gateway binding (env.AI.gateway(id)), and AI Gateway REST — across all capabilities:

    • Chat (createWorkersAiChat) — Streaming chat completions with tool calling, structured output, and reasoning text streaming.
    • Image generation (createWorkersAiImage) — Text-to-image models.
    • Transcription (createWorkersAiTranscription) — Speech-to-text.
    • Text-to-speech (createWorkersAiTts) — Audio generation.
    • Summarization (createWorkersAiSummarize) — Text summarization.

    AI Gateway adapters route requests from third-party providers — OpenAI, Anthropic, Gemini, Grok, and OpenRouter — through Cloudflare AI Gateway for caching, rate limiting, and unified billing.
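
    A sketch of the plain-binding chat mode; createWorkersAiChat is named above, but the exact option names are assumptions:

    import { createWorkersAiChat } from "@cloudflare/tanstack-ai";

    // Plain binding mode (env.AI); gateway and REST modes take different options
    const chat = createWorkersAiChat({
      binding: env.AI,                    // assumed option name
      model: "@cf/zai-org/glm-4.7-flash",
    });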

    workers-ai-provider v3.1.1 — transcription, speech, reranking, and reliability

    The Workers AI provider for the Vercel AI SDK now supports three new capabilities beyond chat and image generation:

    • Transcription (provider.transcription(model)) — Speech-to-text with automatic handling of model-specific input formats across binding and REST paths (see the sketch after this list).
    • Text-to-speech (provider.speech(model)) — Audio generation with support for voice and speed options.
    • Reranking (provider.reranking(model)) — Document reranking for RAG pipelines and search result ordering.
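
    A transcription sketch with the Vercel AI SDK; the model id is an assumption:

    import { createWorkersAI } from "workers-ai-provider";
    import { experimental_transcribe as transcribe } from "ai";

    declare const audioBytes: Uint8Array; // the recording to transcribe

    const workersai = createWorkersAI({ binding: env.AI });

    const { text } = await transcribe({
      model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
      audio: audioBytes,
    });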

    This release also includes the comprehensive reliability overhaul that landed in v3.0.5:

    • Fixed streaming — Responses now stream token-by-token instead of buffering all chunks, using a proper TransformStream pipeline with backpressure.
    • Fixed tool calling — Resolved issues with tool call ID sanitization, conversation history preservation, and a heuristic that silently fell back to non-streaming mode when tools were defined.
    • Premature stream termination detection — Streams that end unexpectedly now report finishReason: "error" instead of silently reporting "stop".
    • AI Search support — Added createAISearch as the canonical export (renamed from AutoRAG). createAutoRAG still works with a deprecation warning.

    To upgrade:

    npm install workers-ai-provider@latest ai
    

    Resources

    • @cloudflare/tanstack-ai on npm
    • workers-ai-provider on npm
    • GitHub repository

  • Feb 9, 2026
    • Date parsed from source:
      Feb 9, 2026
    • First seen by Releasebot:
      Feb 12, 2026

    Cloudflare AI by Cloudflare

    Analytics enhancements

    AI Crawl Control adds a Patterns tab that groups crawled URIs, enhanced referral analytics, and new time-series charts. Bandwidth metrics surface as Bytes over time and per-crawler transfers, and charts and tables can now be exported as images for reports.

    Path pattern grouping

    In the Metrics tab > Most popular paths table, use the new Patterns tab, which groups requests by URI pattern (/blog/, /api/v1/, /docs/*) to identify which site areas crawlers target most.

    Enhanced referral analytics

    • Destination patterns show which site areas receive AI-driven referral traffic.
    • In the Metrics tab, a new Referrals over time chart shows trends by operator or source.

    Data transfer metrics

    • In the Metrics tab > Allowed requests over time chart, toggle Bytes to show bandwidth consumption.
    • In the Crawlers tab, a new Bytes Transferred column shows bandwidth per crawler.

    Image exports

    Export charts and tables as images for reports and presentations.

    Learn more about analyzing AI traffic.

  • Feb 9, 2026
    • Date parsed from source:
      Feb 9, 2026
    • First seen by Releasebot:
      Feb 10, 2026

    Cloudflare AI by Cloudflare

    Agents SDK v0.4.0: Readonly connections, MCP security improvements, x402 v2 migration, and custom MCP OAuth providers

    This Agents SDK release adds readonly WebSocket connections with new hooks, customizable MCP OAuth providers, an x402 v2 migration, and an MCP SDK 1.26.0 upgrade. It also enforces per-request McpServer instances and tighter OAuth callback security for safer, more flexible integrations.

    The latest release

    The latest release of the Agents SDK brings readonly connections, MCP protocol and security improvements, x402 payment protocol v2 migration, and the ability to customize OAuth for MCP server connections.

    Readonly connections

    Agents can now restrict WebSocket clients to read-only access, preventing them from modifying agent state. This is useful for dashboards, spectator views, or any scenario where clients should observe but not mutate.

    New hooks: shouldConnectionBeReadonly, setConnectionReadonly, isConnectionReadonly. Readonly connections block both client-side setState() and mutating @callable() methods, and the readonly flag survives hibernation.
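
    A sketch; the hook names come from the notes above, their signatures are assumptions:

    import { Agent, type Connection, type ConnectionContext } from "agents";

    type Env = Record<string, unknown>;
    type State = { count: number };

    export class BoardAgent extends Agent<Env, State> {
      // Decide at connect time whether a client should be read-only
      shouldConnectionBeReadonly(connection: Connection, ctx: ConnectionContext) {
        return new URL(ctx.request.url).searchParams.get("mode") === "spectator";
      }

      promote(connection: Connection) {
        // Flip the flag later; it survives hibernation
        this.setConnectionReadonly(connection, false);
        console.log(this.isConnectionReadonly(connection)); // false
      }
    }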

    Custom MCP OAuth providers

    The new createMcpOAuthProvider method on the Agent class allows subclasses to override the default OAuth provider used when connecting to MCP servers. This enables custom authentication strategies such as pre-registered client credentials or mTLS, beyond the built-in dynamic client registration.

    MCP SDK upgrade to 1.26.0

    Upgraded the MCP SDK to 1.26.0 to prevent cross-client response leakage. Stateless MCP servers should now create a new McpServer instance per request instead of sharing a single instance. This version of the MCP SDK adds a guard that prevents connecting to a server instance that is already attached to a transport, so code that declares its McpServer as a global variable will need to be updated.
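
    A minimal sketch of the per-request pattern (tool registration elided):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

    // Before: a module-level singleton shared by every request. The new guard
    // rejects connecting an already-connected instance to another transport.
    // const shared = new McpServer({ name: "demo", version: "1.0.0" });

    function createServer(): McpServer {
      // After: a fresh instance per request, so responses cannot leak across clients
      const server = new McpServer({ name: "demo", version: "1.0.0" });
      // ...register tools and resources, then connect it to a transport...
      return server;
    }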

    MCP OAuth callback URL security fix

    Added callbackPath option to addMcpServer to prevent instance name leakage in MCP OAuth callback URLs. When sendIdentityOnConnect is false, callbackPath is now required — the default callback URL would expose the instance name, undermining the security intent. Also fixes callback request detection to match via the state parameter instead of a loose /callback URL substring check, enabling custom callback paths.

    Deprecate onStateUpdate in favor of onStateChanged

    onStateChanged is a drop-in rename of onStateUpdate (same signature, same behavior). onStateUpdate still works but emits a one-time console warning per class. validateStateChange rejections now propagate a CF_AGENT_STATE_ERROR message back to the client.
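
    A sketch of the rename; the signature mirrors the documented onStateUpdate hook:

    import { Agent, type Connection } from "agents";

    type Env = Record<string, unknown>;
    type State = { count: number };

    export class CounterAgent extends Agent<Env, State> {
      // Previously named onStateUpdate; the old name still works with a one-time warning
      onStateChanged(state: State, source: Connection | "server") {
        console.log("state is now", state.count);
      }
    }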

    x402 v2 migration

    Migrated the x402 MCP payment integration from the legacy x402 package to @x402/core and @x402/evm v2.

    Breaking changes for x402 users:

    • Peer dependencies changed: replace x402 with @x402/core and @x402/evm
    • PaymentRequirements type now uses v2 fields (e.g. amount instead of maxAmountRequired)
    • X402ClientConfig.account type changed from viem.Account to ClientEvmSigner (structurally compatible with privateKeyToAccount(); see the sketch below)
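
    A sketch of that structural compatibility; config fields other than account are omitted:

    import { privateKeyToAccount } from "viem/accounts";
    import type { ClientEvmSigner } from "agents/x402";

    // A viem local account still satisfies the new signer type structurally
    const account: ClientEvmSigner = privateKeyToAccount("0x...");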

    Other x402 changes:

    • X402ClientConfig.network is now optional — the client auto-selects from available payment requirements
    • Server-side lazy initialization: facilitator connection is deferred until the first paid tool invocation
    • Payment tokens support both v2 (PAYMENT-SIGNATURE) and v1 (X-PAYMENT) HTTP headers
    • Added normalizeNetwork export for converting legacy network names to CAIP-2 format
    • Re-exports PaymentRequirements, PaymentRequired, Network, FacilitatorConfig, and ClientEvmSigner from agents/x402

    Other improvements

    • Fix useAgent and AgentClient crashing when using basePath routing
    • CORS handling delegated to partyserver's native support (simpler, more reliable)
    • Client-side onStateUpdateError callback for handling rejected state updates

    Upgrade

    To update to the latest version:

    npm i agents@latest
    
  • Feb 9, 2026
    • Date parsed from source:
      Feb 9, 2026
    • First seen by Releasebot:
      Feb 10, 2026

    Cloudflare AI by Cloudflare

    Interactive browser terminals in Sandboxes

    The Sandbox SDK adds PTY passthrough, letting browser terminal UIs connect to sandbox shells over WebSocket. Each session gets an isolated terminal with its own working directory, so a single sandbox can host multiple terminals side by side. A new xterm.js addon provides automatic reconnection, buffered replay, and resize support.

    PTY passthrough

    The Sandbox SDK now supports PTY (pseudo-terminal) passthrough, enabling browser-based terminal UIs to connect to sandbox shells via WebSocket.

    sandbox.terminal(request)
    

    The new terminal() method proxies a WebSocket upgrade to the container's PTY endpoint, with output buffering for replay on reconnect.
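
    A Worker-side sketch; the env.Sandbox binding name is an assumption:

    import { getSandbox } from "@cloudflare/sandbox";

    interface Env {
      Sandbox: DurableObjectNamespace; // assumed binding name
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        const sandbox = getSandbox(env.Sandbox, "demo");
        if (request.headers.get("Upgrade") === "websocket") {
          // Proxies the upgrade to the container's PTY endpoint; output is
          // buffered so the session can replay on reconnect
          return sandbox.terminal(request);
        }
        return new Response("expected a WebSocket upgrade", { status: 426 });
      },
    };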

    Multiple terminals per sandbox

    Each session can have its own terminal with an isolated working directory and environment, so users can run separate shells side-by-side in the same container.

    xterm.js addon

    The new @cloudflare/sandbox/xterm export provides a SandboxAddon for xterm.js with automatic reconnection (exponential backoff + jitter), buffered output replay, and resize forwarding.
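
    A browser-side sketch; the SandboxAddon constructor options are assumptions:

    import { Terminal } from "@xterm/xterm";
    import { SandboxAddon } from "@cloudflare/sandbox/xterm";

    const term = new Terminal();
    term.open(document.getElementById("terminal")!);

    // Reconnects automatically (exponential backoff + jitter), replays buffered
    // output, and forwards resizes; the URL option is an assumed shape
    term.loadAddon(new SandboxAddon({ url: "wss://sandbox.example.com/terminal" }));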

    Upgrade

    To update to the latest version:

    npm i @cloudflare/sandbox@latest
    
  • Feb 9, 2026
    • Date parsed from source:
      Feb 9, 2026
    • First seen by Releasebot:
      Feb 10, 2026

    Cloudflare AI by Cloudflare

    AI Search now with more granular controls over indexing

    AI Search gains targeted reindexing and selective crawling. Reindex individual files without a full sync and crawl only chosen sitemaps by URL, speeding updates and reducing scans.

    Get your content updates into AI Search faster and avoid a full rescan when you do not need it.

    Reindex individual files without a full sync

    Updated a file or need to retry one that errored? When you know exactly which file changed, you can now reindex it directly instead of rescanning your entire data source.

    Go to Overview > Indexed Items and select the sync icon next to any file to reindex it immediately.

    Crawl only the sitemap you need

    By default, AI Search crawls all sitemaps listed in your robots.txt, up to the maximum files per index limit. If your site has multiple sitemaps but you only want to index a specific set, you can now specify a single sitemap URL to limit what the crawler visits.

    For example, if your robots.txt lists both blog-sitemap.xml and docs-sitemap.xml, you can specify just https://example.com/docs-sitemap.xml to index only your documentation.

    Configure your selection anytime in Settings > Parsing options > Specific sitemaps, then trigger a sync to apply the changes.

    Learn more about indexing controls and website crawling configuration.

  • Feb 3, 2026
    • Date parsed from source:
      Feb 3, 2026
    • First seen by Releasebot:
      Feb 3, 2026
    • Modified by Releasebot:
      Feb 10, 2026

    Cloudflare AI by Cloudflare

    Agents SDK v0.3.7: Workflows integration, synchronous state, and scheduleEvery()

    The new Agents SDK release adds first-class Cloudflare Workflows integration, enabling Agents to manage WebSocket connections and state alongside durable workflows. It introduces AgentWorkflow, synchronous state validation, and fixed-interval scheduling with overlap prevention, plus enhanced RPC, streaming, and secure email routing.

    Cloudflare Workflows integration

    Agents excel at real-time communication and state management. Workflows excel at durable execution. Together, they enable powerful patterns where Agents handle WebSocket connections while Workflows handle long-running tasks, retries, and human-in-the-loop flows.

    Use the new AgentWorkflow class to define workflows with typed access to your Agent.
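
    A sketch, assuming AgentWorkflow follows the standard Workflows run(event, step) shape; the import path and generics are assumptions:

    import { AgentWorkflow } from "agents/workflows"; // assumed path
    import type { WorkflowEvent, WorkflowStep } from "cloudflare:workers";

    type Env = Record<string, unknown>;
    type Params = { userId: string };

    declare function lookupUser(id: string): Promise<{ email: string }>; // hypothetical
    declare function sendWelcome(user: { email: string }): Promise<void>; // hypothetical

    export class OnboardingWorkflow extends AgentWorkflow<Env, Params> {
      async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
        const user = await step.do("load user", () => lookupUser(event.payload.userId));
        await step.do("send welcome email", () => sendWelcome(user));
      }
    }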

    Start workflows from your Agent with runWorkflow() and handle lifecycle events.
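
    A sketch using the method names from the list below; the options bag and return shape are assumptions:

    // Inside an Agent method:
    const workflow = await this.runWorkflow("OnboardingWorkflow", { userId: "u_123" }, {
      metadata: { requestedBy: "dashboard" }, // assumed options shape
    });

    // Later: query it, or resolve a human-in-the-loop step
    const status = await this.getWorkflow(workflow.id); // assumed return shape
    await this.approveWorkflow(workflow.id);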

    Key workflow methods on your Agent:

    • runWorkflow(workflowName, params, options?) — Start a workflow with optional metadata
    • getWorkflow(workflowId) / getWorkflows(criteria?) — Query workflows with cursor-based pagination
    • approveWorkflow(workflowId) / rejectWorkflow(workflowId) — Human-in-the-loop approval flows
    • pauseWorkflow(), resumeWorkflow(), terminateWorkflow() — Workflow control

    Synchronous setState()

    State updates are now synchronous, with a new validateStateChange() validation hook.
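
    A sketch; the hook's exact signature is an assumption:

    import { Agent, type Connection } from "agents";

    type Env = Record<string, unknown>;
    type State = { count: number };

    export class CounterAgent extends Agent<Env, State> {
      // Throw to reject an invalid update (assumed signature)
      validateStateChange(next: State, source: Connection | "server") {
        if (next.count < 0) throw new Error("count must be non-negative");
      }

      increment() {
        // setState now applies synchronously
        this.setState({ count: this.state.count + 1 });
      }
    }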

    scheduleEvery() for recurring tasks

    The new scheduleEvery() method enables fixed-interval recurring tasks with built-in overlap prevention.
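
    A sketch, assuming scheduleEvery() mirrors the existing schedule(when, callback, payload) shape with the interval in seconds:

    import { Agent } from "agents";

    type Env = Record<string, unknown>;

    export class FeedAgent extends Agent<Env> {
      async onStart() {
        // Poll roughly every 5 minutes
        await this.scheduleEvery(300, "pollFeed", { source: "news" });
      }

      async pollFeed(payload: { source: string }) {
        // Overlap prevention: if the previous run is still in flight,
        // this tick is skipped rather than run concurrently.
      }
    }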

    Callable system improvements

    • Client-side RPC timeout — Set timeouts on callable method invocations
    • StreamingResponse.error(message) — Graceful stream error signaling
    • getCallableMethods() — Introspection API for discovering callable methods
    • Connection close handling — Pending calls are automatically rejected on disconnect

    Email and routing enhancements

    • Secure email reply routing — Email replies are now secured with HMAC-SHA256 signed headers, preventing unauthorized routing of emails to agent instances.
    • Routing improvements:
      • basePath option to bypass default URL construction for custom routing
      • Server-sent identity — Agents send name and agent type on connect
      • New onIdentity and onIdentityChange callbacks on the client

    Upgrade

    To update to the latest version:

    npm i agents@latest

    For the complete Workflows API reference and patterns, see Run Workflows.

  • Jan 28, 2026
    • Date parsed from source:
      Jan 28, 2026
    • First seen by Releasebot:
      Jan 29, 2026
    • Modified by Releasebot:
      Feb 3, 2026

    Cloudflare AI by Cloudflare

    Launching FLUX.2 [klein] 9B on Workers AI

    Workers AI partners with Black Forest Labs to bring FLUX.2 [klein] 9B to the platform, delivering fast, higher-quality inference for rapid prototyping and real-time apps. The distilled model runs a fixed 4-step inference process and accepts multipart form data via both the REST API and the Workers AI binding.

    We have partnered with Black Forest Labs (BFL) again to bring their optimized FLUX.2 [klein] 9B model to Workers AI. This distilled model offers enhanced quality compared to the 4B variant, while maintaining cost-effective pricing. With a fixed 4-step inference process, Klein 9B is ideal for rapid prototyping and real-time applications where both speed and quality matter.

    Read the BFL blog to learn more about the model itself, or try it out yourself in our multimodal playground.

    Pricing documentation is available on the model page or pricing page.

    Workers AI platform specifics

    The model hosted on Workers AI is optimized for speed with a fixed 4-step inference process and supports up to 4 image inputs. Since this is a distilled model, the steps parameter is fixed at 4 and cannot be adjusted. Like FLUX.2 [dev] and FLUX.2 [klein] 4B, this image model uses multipart form data inputs, even if you just have a prompt.

    With the REST API, the multipart form data input looks like this:

    curl --request POST \
      --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-2-klein-9b' \
      --header 'Authorization: Bearer {TOKEN}' \
      --header 'Content-Type: multipart/form-data' \
      --form 'prompt=a sunset at the alps' \
      --form width=1024 \
      --form height=1024
    

    With the Workers AI binding, you can use it as such:

    const form = new FormData();
    form.append("prompt", "a sunset with a dog");
    form.append("width", "1024");
    form.append("height", "1024");
    
    const resp = await env.AI.run("@cf/black-forest-labs/flux-2-klein-9b", {
      multipart: { body: form, contentType: "multipart/form-data" },
    });
    
  • Jan 28, 2026
    • Date parsed from source:
      Jan 28, 2026
    • First seen by Releasebot:
      Jan 16, 2026
    • Modified by Releasebot:
      Feb 10, 2026

    Cloudflare AI by Cloudflare

    Launching FLUX.2 [klein] 9B on Workers AI

    Workers AI teams up with Black Forest Labs to bring the FLUX.2 [klein] 9B model to the platform. This distilled 9B image model runs a fixed 4-step inference process for fast prototyping and supports up to 4 reference image inputs. Try it in the multimodal playground.

    We have partnered with Black Forest Labs (BFL) again to bring their optimized FLUX.2 [klein] 9B model to Workers AI. This distilled model offers enhanced quality compared to the 4B variant, while maintaining cost-effective pricing. With a fixed 4-step inference process, Klein 9B is ideal for rapid prototyping and real-time applications where both speed and quality matter.

    Read the BFL blog to learn more about the model itself, or try it out yourself in our multimodal playground.

    Pricing documentation is available on the model page or pricing page.

    Workers AI platform specifics

    The model hosted on Workers AI is optimized for speed with a fixed 4-step inference process and supports up to 4 image inputs. Since this is a distilled model, the steps parameter is fixed at 4 and cannot be adjusted. Like FLUX.2 [dev] and FLUX.2 [klein] 4B, this image model uses multipart form data inputs, even if you just have a prompt.

    With the REST API, the multipart form data input looks like this:
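
    curl --request POST \
      --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-2-klein-9b' \
      --header 'Authorization: Bearer {TOKEN}' \
      --header 'Content-Type: multipart/form-data' \
      --form 'prompt=a sunset at the alps' \
      --form width=1024 \
      --form height=1024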

    With the Workers AI binding, you can use it as such:
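
    const form = new FormData();
    form.append("prompt", "a sunset with a dog");
    form.append("width", "1024");
    form.append("height", "1024");

    const resp = await env.AI.run("@cf/black-forest-labs/flux-2-klein-9b", {
      multipart: { body: form, contentType: "multipart/form-data" },
    });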

    Multi-reference images

    The FLUX.2 klein-9b model supports generating images based on reference images, just like FLUX.2 [dev] and FLUX.2 [klein] 4B. You can use this feature to apply the style of one image to another, add a new character to an image, or iterate on past generated images. You would use it with the same multipart form data structure, with the input images in binary. The model supports up to 4 input images.

    For the prompt, you can reference the images by index, like "take the subject of image 1 and style it like image 0", or use natural language like "place the dog beside the woman".

    You must name the input parameter as input_image_0, input_image_1, input_image_2, input_image_3 for it to work correctly. All input images must be smaller than 512x512.

    Through the Workers AI binding:
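
    A sketch, assuming the reference images are available as Blobs (the image variable names are illustrative):

    const form = new FormData();
    form.append("prompt", "place the dog beside the woman");
    // Inputs must be named input_image_0 through input_image_3,
    // and each must be smaller than 512x512
    form.append("input_image_0", womanImage);
    form.append("input_image_1", dogImage);

    const resp = await env.AI.run("@cf/black-forest-labs/flux-2-klein-9b", {
      multipart: { body: form, contentType: "multipart/form-data" },
    });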

    The parameters you can send to the model are detailed on the model page.

  • Jan 23, 2026
    • Date parsed from source:
      Jan 23, 2026
    • First seen by Releasebot:
      Jan 23, 2026

    Cloudflare AI by Cloudflare

    Vectorize indexes now support up to 10 million vectors

    You can now store up to 10 million vectors in a single Vectorize index, doubling the previous limit of 5 million vectors. This enables larger-scale semantic search, recommendation systems, and retrieval-augmented generation (RAG) applications without splitting data across multiple indexes.

    Vectorize continues to support indexes with up to 1,536 dimensions per vector at 32-bit precision. Refer to the Vectorize limits documentation for complete details.
