OpenRouter Release Notes
110 release notes curated from 86 sources by the Releasebot Team. Last updated: Jun 11, 2026
- Jun 10, 2026
- Date parsed from source: Jun 10, 2026
- First seen by Releasebot: Jun 11, 2026
Advisor: Give Any Model a Lifeline to a Smarter One
OpenRouter adds the openrouter:advisor tool, letting a model call a stronger model mid-generation for help on hard decisions, sanity checks, and complex reasoning. It supports any executor and advisor pairing, named specialists, advisor tools, streaming advice, and cross-request memory.
Add openrouter:advisor to your tools array and your model can ask a stronger model for help mid-generation.
When the executor hits a hard decision, gets stuck, or wants a sanity check before finishing, it calls the advisor with a prompt. The advisor thinks, returns guidance as the tool result, and the executor keeps going with better information.
Both roles are open: any model on OpenRouter can be the executor, and any model from any provider can be the advisor. Run a Gemini executor that consults Claude, or a GPT executor that consults DeepSeek. You pick the pairing.
Try it in the chatroom or read the docs for the full API reference.
```json
{
  "model": "openai/gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Design a rate limiter for a distributed API gateway." }
  ],
  "tools": [
    {
      "type": "openrouter:advisor",
      "parameters": { "model": "anthropic/claude-fable-5" }
    }
  ]
}
```

67x price gap, selective consultation
Claude Fable 5 costs $10 per million input tokens. GPT-4o Mini costs $0.15 per million input tokens. That's roughly a 67x spread ($10 / $0.15 ≈ 66.7).
Most requests don't need frontier-level reasoning. A mid-tier model handles the bulk of a workload without issue. But the 10-20% of requests that involve architectural decisions, ambiguous edge cases, or multi-step reasoning chains are where cheaper models stumble.
The advisor tool covers that gap selectively. Your fast model runs the show. When it hits something genuinely hard, it calls for help. You pay frontier prices only for the moments that need frontier thinking.
In an agentic coding session with 50 tool calls, maybe 2-3 are advisor consultations. The rest run at mini prices. You cut your per-session cost while keeping the quality ceiling high.
Server-side execution, one tool call
The advisor runs server-side during generation. Your model calls it like any other tool: pass a prompt describing what it needs help with, get back the advisor’s text as the tool result. The model then writes the final answer itself, informed by the advice. The advisor is a consultant, not a ghostwriter.
Four things worth knowing:
- Any model, from any provider, can be the advisor. Pin it in the tool config with parameters.model (anything in the model catalog works), or let the executor pick per-call. Use ~anthropic/claude-fable-latest to always resolve to the newest Fable.
- The advisor gets its own tools. Give it openrouter:web_search and it’ll ground its advice in fresh sources before responding. It runs as a sub-agent with its own tool loop, then returns just the final guidance.
- Recursion is blocked. The advisor can’t call itself. A depth header and self-reference check prevent unbounded nesting, and consultations are capped per request to bound cost.
- The advisor remembers. Replay the conversation transcript in a follow-up request (with the advisor tool calls and results included) and each advisor reconstructs its prior consultations, so a follow-up question builds on what the advisor already said. Memory is per advisor (your security reviewer and your architect each keep their own thread) and works across Chat Completions, Responses, and Anthropic Messages.
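The transcript replay can be sketched as follows. This is a minimal sketch under stated assumptions. Messages are treated as opaque Chat Completions-style records, and `priorTranscript` already contains the advisor tool calls and results that the previous response returned. The sketch also assumes the same advisor entries should be sent again. `buildFollowUp` is a hypothetical helper and is not part of any OpenRouter SDK.

```typescript
// Assumption: Chat Completions-style messages. The exact shape of advisor
// tool-call entries is not documented here, so they are replayed verbatim.
type ChatMessage = Record<string, unknown> & { role: string };

interface FollowUpRequest {
  model: string;
  messages: ChatMessage[];
  tools: unknown[];
}

// Hypothetical helper: replay the full prior transcript, then append the new user turn.
function buildFollowUp(
  model: string,
  priorTranscript: ChatMessage[],
  newUserText: string,
  tools: unknown[],
): FollowUpRequest {
  return {
    model,
    // Copy the transcript unchanged (including advisor calls and results)
    // so each advisor can reconstruct its earlier consultations.
    messages: [...priorTranscript, { role: "user", content: newUserText }],
    // Assumption: re-send the same advisor entries used in the earlier request.
    tools,
  };
}
```

The helper returns a new array and leaves the caller's transcript untouched, so the same history can be reused for several follow-ups.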
Named advisors
For complex workflows, you can configure a roster of specialists. Add one openrouter:advisor entry per advisor, each with its own name, model, instructions, and tool set:
```json
{
  "tools": [
    {
      "type": "openrouter:advisor",
      "parameters": {
        "name": "security-reviewer",
        "model": "anthropic/claude-fable-5",
        "instructions": "You are a security engineer. Find vulnerabilities."
      }
    },
    {
      "type": "openrouter:advisor",
      "parameters": {
        "name": "architect",
        "model": "openai/gpt-5.5",
        "instructions": "You are a systems architect. Prioritize simplicity and scalability."
      }
    }
  ]
}
```

The executor sees a distinct tool for each advisor and calls whichever fits the task with just a prompt. An auth flow review routes to Claude Fable with the security persona; architecture questions go to GPT-5.5. Names can use letters, digits, spaces, underscores, and dashes ("Lead Architect" works), and must be unique across entries. One entry can omit name to act as the default advisor.
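The naming rules above can be checked on the client before a request is sent. This is a minimal sketch that makes three assumptions: "letters" means ASCII letters, names are compared case-sensitively, and at most one entry may omit `name`. `validateAdvisorRoster` is a hypothetical helper. The server remains the authority on what it accepts.

```typescript
interface AdvisorEntry {
  type: "openrouter:advisor";
  parameters: {
    name?: string;
    model?: string;
    instructions?: string;
  };
}

// Assumption: ASCII letters, digits, spaces, underscores, and dashes only.
const NAME_PATTERN = /^[A-Za-z0-9 _-]+$/;

// Returns a list of problems; an empty list means the roster passed these checks.
function validateAdvisorRoster(entries: AdvisorEntry[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  let unnamed = 0;

  for (const entry of entries) {
    const name = entry.parameters.name;
    if (name === undefined) {
      unnamed += 1; // candidate default advisor
      continue;
    }
    if (!NAME_PATTERN.test(name)) {
      problems.push(`invalid characters in name: "${name}"`);
    }
    if (seen.has(name)) {
      problems.push(`duplicate name: "${name}"`);
    }
    seen.add(name);
  }

  if (unnamed > 1) {
    problems.push(`only one entry may omit name (found ${unnamed})`);
  }
  return problems;
}
```

Collecting every problem, rather than stopping at the first one, lets a configuration UI report all roster mistakes in a single pass.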
Advice can also stream. Set "stream": true on an advisor entry to receive the advice incrementally as the advisor writes it. In the Responses API, that means response.output_text.delta events while the advice is in flight; the completed output item still carries the full text, so consumers that ignore deltas see no difference. (Chat Completions ignores the flag, and streaming for the Messages API is coming soon.)
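Streamed advice can be consumed by concatenating delta events, as in the minimal sketch below. It assumes the SSE stream has already been parsed into objects shaped like Responses API events (`{ type, item_id?, delta? }`). It also assumes the caller knows which `item_id` belongs to the advice rather than to the executor's own text; that mapping is not described here. `accumulateAdvice` is a hypothetical helper.

```typescript
// Assumption: events are already parsed from SSE into Responses-style objects.
interface StreamEvent {
  type: string;
  item_id?: string;
  delta?: unknown;
}

// Concatenate advice text deltas in arrival order. onDelta sees each chunk
// so a UI can render the advice while it is still being written.
function accumulateAdvice(
  events: Iterable<StreamEvent>,
  onDelta: (chunk: string) => void = () => {},
  adviceItemId?: string,
): string {
  let text = "";
  for (const event of events) {
    if (event.type !== "response.output_text.delta") continue; // skip other event types
    if (adviceItemId !== undefined && event.item_id !== adviceItemId) continue; // not the advice item
    const delta = event.delta;
    if (typeof delta !== "string") continue;
    text += delta;
    onDelta(delta);
  }
  return text;
}
```

Consumers that don't need incremental rendering can skip this entirely and read the completed output item, which carries the full text.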
How this compares to other advisor tools
Some providers ship a similar advisor concept in their own APIs, but it stays inside their model family: the executor and the advisor both have to come from the same vendor, often from a fixed pairing matrix, and sometimes behind a beta gate. OpenRouter’s advisor removes those constraints and adds a few things on top:
- Any model, any provider, on both sides. Both the executor and the advisor can be any of the hundreds of models in the catalog: a cheap open-weights executor consulting a frontier model, a Gemini executor consulting Claude, or a Claude executor getting a second opinion from GPT-5.5 outside its own model family.
- A roster of named advisors. Configure multiple specialists with their own models, instructions, and tool sets in a single request, and let the executor route each question to the right one. Single-vendor versions give you one unnamed advisor.
- Advisors with their own tools. Hand an advisor openrouter:web_search and it grounds its advice in fresh sources before responding.
- Works across API formats, no beta gate. The same tool works through Chat Completions, Responses, and Anthropic Messages (with cross-request memory in all three), and it’s generally available. No beta header, no account-team access request.
If you’re already using a provider-native advisor through one of our compatible API skins, swapping to openrouter:advisor opens up the full catalog without changing the rest of your request.
Billing
Advisor tokens bill at the advisor model’s rates, separate from the executor. If your executor is GPT-4o Mini ($0.15/$0.60 per M tokens) and the advisor is Claude Fable 5 ($10/$50 per M tokens), each model’s tokens bill at their own price. Both show up on your activity page.
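The billing rule above reduces to two separate line items that are summed. This sketch uses the per-million-token rates quoted in that paragraph. The token counts in the test are hypothetical. For comparison, the sketch also prices the executor's tokens as if they had run on the advisor model.

```typescript
// Per-million-token rates from the billing note above (USD).
interface Rates { inputPerM: number; outputPerM: number }
interface Usage { inputTokens: number; outputTokens: number }

const EXECUTOR_RATES: Rates = { inputPerM: 0.15, outputPerM: 0.6 }; // GPT-4o Mini
const ADVISOR_RATES: Rates = { inputPerM: 10, outputPerM: 50 };     // Claude Fable 5

function costUsd(rates: Rates, usage: Usage): number {
  return (usage.inputTokens * rates.inputPerM + usage.outputTokens * rates.outputPerM) / 1_000_000;
}

// Each model's tokens bill at that model's own rates; the total is the sum.
function sessionBill(executorUsage: Usage, advisorUsage: Usage) {
  const executor = costUsd(EXECUTOR_RATES, executorUsage);
  const advisor = costUsd(ADVISOR_RATES, advisorUsage);
  return {
    executor,
    advisor,
    total: executor + advisor,
    // What the executor's tokens alone would cost at frontier rates.
    executorAtAdvisorRates: costUsd(ADVISOR_RATES, executorUsage),
  };
}
```

With 400,000 input and 40,000 output executor tokens plus 20,000 input and 4,000 output advisor tokens, the session comes to about $0.484. The same executor tokens billed at Fable rates would cost about $6.00.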
Get started
One line in your tools array:
```json
{ "type": "openrouter:advisor", "parameters": { "model": "anthropic/claude-fable-5" } }
```

The model decides when to use it. Most requests won't trigger a consultation; the ones that do will be better for it.
Read the full docs for parameters, named advisors, sub-agent tools, and more.
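The one-line setup can be wrapped in a request builder, as in the minimal sketch below. It assumes the Chat Completions endpoint at OpenRouter's OpenAI-compatible base URL (`https://openrouter.ai/api/v1/chat/completions`). The body mirrors the JSON examples in this note. `buildAdvisorRequest` is a hypothetical helper. It only builds the request, so you decide how and when to send it.

```typescript
interface AdvisorTool {
  type: "openrouter:advisor";
  parameters: { model: string; name?: string; instructions?: string };
}

// Hypothetical helper: returns a URL and fetch-style init object for one request.
function buildAdvisorRequest(
  apiKey: string,
  executorModel: string,
  advisorModel: string,
  prompt: string,
) {
  const tools: AdvisorTool[] = [
    { type: "openrouter:advisor", parameters: { model: advisorModel } },
  ];
  const body = {
    model: executorModel,
    messages: [{ role: "user", content: prompt }],
    tools,
  };
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

To send it, call `fetch(url, init)` in any runtime with a global `fetch`. The executor then decides on its own whether to consult the advisor.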
- Jun 9, 2026
- Date parsed from source: Jun 9, 2026
- First seen by Releasebot: Jun 11, 2026
Gemini 2.5 Flash API - Pricing, Quickstart & Provider Comparison
OpenRouter adds support for Gemini 2.5 Flash, bringing Google’s reasoning-focused Flash model with built-in thinking, multimodal inputs, provider failover, and one-dashboard billing. The update highlights configurable thinking budgets, pricing details, and production-ready routing controls.
Gemini 2.5 Flash
Gemini 2.5 Flash is Google’s primary model for high-volume, latency-sensitive tasks that require reasoning. It’s the first Flash-class model with built-in thinking, a hybrid reasoning mode you can toggle on or off at will. That distinction makes it meaningfully different from 2.0 Flash and worth evaluating against models that cost significantly more.
Key Capabilities
Gemini 2.5 Flash accepts text, code, images, audio, video, and documents as input. Two constraints apply to document inputs in production: the maximum file size is 50MB per document (split larger files into sub-50MB chunks before submission), and the only supported document MIME types are application/pdf and text/plain.
What it does not support: audio generation, image generation, and the Live API. If you need image generation, use Gemini 2.5 Flash Image, which is a separate model.
What “Thinking” Means in Practice
The thinking budget is a parameter that controls how much internal reasoning the model performs before it generates a response. The reasoning happens at inference time, as part of the same generation. Setting the budget to 0 disables thinking entirely, producing the fastest and cheapest output. Setting it to -1 enables dynamic mode, where the model adjusts reasoning depth based on prompt complexity. On Google's direct API, -1 is the default. Via OpenRouter, thinking is off unless you explicitly request it (see Configuring via OpenRouter below). Higher fixed budgets improve output quality on complex tasks at the cost of additional latency and token spend; thinking tokens are billed at the output rate.
Gemini 2.5 Flash API Pricing
The rates below are per million tokens across the three access methods. Pricing data is sourced from ai.google.dev/gemini-api/docs/pricing and openrouter.ai/google/gemini-2.5-flash. Check the live OpenRouter and Vertex AI pages before relying on these numbers; rates can change without notice.
Verification date: May 2026
- Google AI Studio (paid): Input $0.30/1M, Output $2.50/1M (incl. thinking), Cache Read $0.03/1M, Cache Storage $1.00/1M tokens per hour, Audio Input $1.00/1M
- Vertex AI: see Vertex AI pricing
- OpenRouter: Input $0.30/1M, Output $2.50/1M (incl. thinking), Cache Read $0.03/1M, Cache Storage: verify on live page, Audio Input $1.00/1M
Google AI Studio’s paid tier and OpenRouter carry the same per-token rates for text input and output as of May 2026. Same price per token. What’s wrapped around the API call is where they split.
OpenRouter sits between your code and 3 Google providers (AI Studio, Vertex Global, Vertex). If one goes down, your requests reroute to a healthy one. No code changes.
Your integration isn’t welded to Gemini. Change the model string and you’re calling Claude, GPT-4o, Llama, or any of 300+ models. Same base URL, same SDK, same API key. Swap models in seconds without rewriting your client.
Billing collapses into one dashboard: one invoice, one API key, across every model and provider. No juggling separate accounts with Google, Anthropic, and OpenAI.
For teams shipping to production, OpenRouter layers on enterprise controls (provisioning, per-key spend limits, usage analytics, team management). Guardrails and content filtering are configurable per request, so you can enforce safety policies without building your own moderation stack. Prompt logging and observability come baked into the dashboard for debugging production traffic.
OpenRouter charges a 5.5% platform fee on pay-as-you-go (PAYG) credit purchases. That covers the failover, routing, billing, and tooling above. Google AI Studio is the direct path with no intermediary fee, but you’re on your own for failover, model portability, and cross-provider billing. Vertex AI pricing differs; check the Vertex AI pricing page for current rates before plugging them into production cost estimates.
For real-time Gemini 2.5 Flash pricing and uptime across providers, including live cache rates and effective pricing by provider, see the OpenRouter model page. For caching strategies that reduce repeated context costs, see cache pricing details.
Thinking Token Billing
Thinking tokens are billed at the same rate as output tokens. At budget 0, there is no thinking cost. At the maximum budget (24,576 tokens), thinking overhead can exceed the cost of the visible response itself. To estimate the cost for a given workload, multiply your expected thinking tokens by the output rate and add them to your standard output token cost.
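The estimation rule above can be written out directly. This sketch uses the $0.30/1M input and $2.50/1M output rates listed earlier, with thinking billed at the output rate. The token counts in the test are hypothetical.

```typescript
// Gemini 2.5 Flash rates listed above (USD per 1M tokens).
const INPUT_PER_M = 0.3;
const OUTPUT_PER_M = 2.5; // visible output and thinking tokens share this rate

function thinkingCostUsd(thinkingTokens: number): number {
  if (thinkingTokens < 0) throw new RangeError("thinking tokens cannot be negative");
  return (thinkingTokens * OUTPUT_PER_M) / 1_000_000;
}

// Total = input cost + visible output cost + thinking cost (at the output rate).
function requestCostUsd(inputTokens: number, visibleOutputTokens: number, thinkingTokens: number): number {
  const input = (inputTokens * INPUT_PER_M) / 1_000_000;
  const visible = (visibleOutputTokens * OUTPUT_PER_M) / 1_000_000;
  return input + visible + thinkingCostUsd(thinkingTokens);
}
```

Take a request with 2,000 input tokens and a 500-token visible answer. It costs about $0.00185 at budget 0. If the model spends the full 24,576-token budget, the cost rises to about $0.06329, and thinking alone (about $0.06144) far exceeds the visible output (about $0.00125).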
Free Access Options
Google AI Studio provides a free tier with rate limits. On the free tier, your prompts and responses are used to improve Google’s products; see the terms of service for the full data usage policy. If your use case involves user data or requires data not to be used for model training, you must use the paid tier.
OpenRouter does not include Gemini 2.5 Flash in its free tier. A minimum $5 credit balance is required.
Vertex AI provides $300 in trial credits for new Google Cloud accounts, which can be applied toward Gemini 2.5 Flash usage during evaluation.
API Quickstart: First Request in Under 5 Minutes
The OpenRouter path requires no Google Cloud account and works with any OpenAI-compatible SDK. The Google direct path requires a Google account and the google-genai SDK. For additional SDK examples and configuration options, see the OpenRouter quickstart.
Step 1: Get Your API Key
OpenRouter path: get your OpenRouter API key. No Google Cloud account required.
Google direct path: Get a key at aistudio.google.com/apikey.
Step 2: Set the Base URL (OpenRouter Path)
The OpenRouter base URL is https://openrouter.ai/api/v1. The cURL, Python, and TypeScript examples all use this endpoint.
Step 3: Make Your First Request
Code examples are provided for cURL, Python (OpenAI SDK), TypeScript (OpenAI SDK), and the Google direct path (Python with the google-genai SDK).
The direct path uses the google-genai SDK, which is not OpenAI-compatible. Switching from OpenRouter to the direct path requires changing both your client library and request structure. There is no provider failover on the direct path.
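The minimal sketch below shows a first request through the OpenRouter path without any SDK, using only OpenAI-compatible JSON shapes. It assumes the Chat Completions route under the base URL above and the response layout `choices[0].message.content`. `buildGeminiFlashRequest` and `firstChoiceText` are hypothetical helpers.

```typescript
const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

// Build (but do not send) a Chat Completions request for Gemini 2.5 Flash.
function buildGeminiFlashRequest(apiKey: string, prompt: string) {
  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "google/gemini-2.5-flash",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Pull the assistant text out of an OpenAI-compatible response; "" if absent.
function firstChoiceText(response: {
  choices?: { message?: { content?: string | null } }[];
}): string {
  return response.choices?.[0]?.message?.content ?? "";
}
```

To send the request, run `const res = await fetch(url, init)` and then `firstChoiceText(await res.json())`. Switching models means changing only the `model` string.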
Thinking Budget: Control Reasoning Quality and Cost
The thinking budget is the most important configuration decision you’ll make with this model. Set it wrong and you either overpay for reasoning you don’t need or leave accuracy on the table for tasks that require it. For the full parameter reference, see configure the thinking budget.
Budget Levels and Trade-offs
Set the thinkingBudget parameter in your request config. The range is 0 to 24,576 tokens.
- Budget 0: Thinking disabled. Fastest response, lowest cost, no reasoning overhead. Use for high-volume classification, extraction, and summarization where structured reasoning is unnecessary.
- Budget -1 (dynamic): The model auto-selects its reasoning depth based on prompt complexity. This is the default on Google's direct API. Via OpenRouter, you must explicitly set reasoning.max_tokens to -1 to get dynamic mode; omitting the reasoning config leaves thinking off. Recommended for most workloads that need reasoning: it avoids paying for heavy reasoning on simple prompts while engaging it when the task requires it.
- Budget 1,024 to 8,192: Moderate to heavy reasoning. Use for multi-step analysis, structured coding tasks, and research-style questions.
- Budget 24,576 (maximum): Maximum reasoning depth, maximum cost. Use for complex math, scientific problems, and difficult coding challenges where accuracy justifies the overhead.
Critical Constraints
Two constraints will produce errors in production if you aren’t aware of them before writing your first request:
- thinkingBudget and thinkingLevel cannot be used in the same request. thinkingBudget is for Gemini 2.5 series models. thinkingLevel is for Gemini 3 series models. Using both returns a 400 error.
- Structured JSON output and Search Grounding are mutually exclusive. You cannot enable both in the same request.
Configuring via OpenRouter
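With the OpenAI Python SDK, pass a reasoning object through the extra_body parameter to set the thinking budget through OpenRouter's API; in a raw JSON request body, reasoning is a top-level field.
To disable thinking entirely, set reasoning.max_tokens to 0. To use dynamic mode, set reasoning.max_tokens to -1.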
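The budget rules above map onto the request body as follows. In this minimal sketch, 0 disables thinking, -1 requests dynamic mode, and a positive integer up to 24,576 sets a fixed budget. Omitting the field leaves thinking off via OpenRouter. `withThinkingBudget` is a hypothetical helper, and its client-side range check is an assumption drawn from the budget range stated in this guide.

```typescript
const MAX_FLASH_BUDGET = 24_576;

interface ChatBody {
  model: string;
  messages: { role: string; content: string }[];
  reasoning?: { max_tokens: number };
}

// Return a copy of the body with the thinking budget applied (or removed).
function withThinkingBudget(body: ChatBody, budget?: number): ChatBody {
  if (budget === undefined) {
    const copy = { ...body };
    delete copy.reasoning; // no reasoning field: thinking stays off via OpenRouter
    return copy;
  }
  if (!Number.isInteger(budget) || budget < -1 || budget > MAX_FLASH_BUDGET) {
    throw new RangeError(`budget must be -1 or an integer in 0..${MAX_FLASH_BUDGET}, got ${budget}`);
  }
  // -1 = dynamic, 0 = disabled, otherwise a fixed budget in tokens.
  return { ...body, reasoning: { max_tokens: budget } };
}
```

The range check runs on the client so that an out-of-range budget fails before the request is sent, rather than coming back as a provider error.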
Cross-Provider Performance
OpenRouter routes Gemini 2.5 Flash through three Google providers and tracks real-time throughput, Time to First Token (TTFT), end-to-end latency, and uptime for each. The differences between providers are significant enough to affect the choice of provider for latency-sensitive workloads.
All numbers below require live verification against openrouter.ai/google/gemini-2.5-flash.
Performance by Provider
Source: OpenRouter live model page.
- Google Vertex (Global): Avg Throughput ~75 tok/s; Avg TTFT ~0.63s; Avg E2E Latency and Uptime: Verify on live page
- Google AI Studio: Verify on live page
- Google Vertex: Verify on live page
The Vertex Global provider shows the highest throughput in recent data. AI Studio historically shows the best uptime. Standard Vertex shows the highest latency of the three. When you route through OpenRouter without specifying a provider, it automatically distributes traffic to the healthiest option based on real-time signals.
For real-time Gemini 2.5 Flash pricing and uptime, see the OpenRouter model page.
Gemini 2.5 Flash vs Flash Lite vs Pro
Choose based on your workload requirements:
- Use Gemini 2.5 Flash for most agentic and reasoning workloads. It’s the default recommendation when you need thinking capability without incurring Pro-level costs.
- Use Gemini 2.5 Flash Lite for high-volume classification, extraction, or translation tasks where thinking isn’t required and cost per request is the primary constraint. Thinking is disabled by default on Flash Lite.
- Use Gemini 2.5 Pro for complex reasoning tasks where accuracy justifies a 5 to 10x cost premium over Flash: frontier mathematics, difficult coding challenges, and multi-step scientific analysis.
Technical Specifications
The list below is the canonical reference for Gemini 2.5 Flash. For the authoritative version, see the Google AI for Developers model page (updated 2026-04-01) and the Vertex AI docs (updated 2026-04-03).
- Model ID: gemini-2.5-flash
- OpenRouter model string: google/gemini-2.5-flash
- Context window: 1,048,576 tokens
- Max output: 65,536 tokens
- Input types: Text, images, video, audio, code, documents (PDF and text/plain only, 50MB max)
- Output types: Text
- Thinking budget range: 0 to 24,576 tokens, or -1 for dynamic (the default on Google's direct API; via OpenRouter, thinking is off unless requested)
- Knowledge cutoff: January 2025
- GA release: June 17, 2025
- Discontinuation: October 16, 2026
- Supported capabilities: Function calling, structured outputs, code execution, Search Grounding, Batch API, context caching (implicit and explicit), file search, URL context
- Not supported: Audio generation, image generation, Live API, thinkingLevel parameter
Deprecation notice:
Gemini 2.5 Flash is scheduled for discontinuation on October 16, 2026, on Vertex AI. If you’re building for production use cases that extend beyond that date, plan a migration to a successor model and monitor ai.google.dev/gemini-api/docs/models for updates.
Frequently Asked Questions
Is Gemini 2.5 Flash free to use?
Google AI Studio provides a free tier with rate limits. On the free tier, your prompts and responses are used to improve Google’s products; see the terms of service before using it with user data. OpenRouter does not include Gemini 2.5 Flash in its free tier; a minimum $5 credit balance is required. Vertex AI provides $300 in trial credits for new Google Cloud accounts.
What is the thinking budget in Gemini 2.5 Flash?
The thinkingBudget parameter (range: 0 to 24,576 tokens, or -1 for dynamic) controls how much internal reasoning the model performs before responding. Budget 0 disables thinking: fastest and cheapest. Budget -1 enables dynamic mode: the model auto-adjusts based on prompt complexity. On Google’s direct API, -1 is the default. Via OpenRouter, thinking is off unless you explicitly request it (e.g. extra_body={"reasoning": {"max_tokens": -1}} for dynamic, or any positive budget). Higher fixed budgets improve output quality on complex tasks but increase latency and cost, billed at the output token rate.
How does Gemini 2.5 Flash compare to GPT-4o?
Flash supports a 1M-token context window, versus 128K for GPT-4o, and includes configurable thinking not available in GPT-4o. Flash’s per-token pricing is lower. GPT-4o has broader third-party ecosystem support and a longer production track record. Direct benchmark comparisons on the same evaluations aren’t published across both models in this guide; use the OpenRouter rankings for current third-party evaluation data.
Can I use Gemini 2.5 Flash for image generation?
No. Gemini 2.5 Flash outputs text only. Image input is supported; the model can process and reason about images. For image generation, use Gemini 2.5 Flash Image, a separate model with its own pricing.
What providers serve Gemini 2.5 Flash on OpenRouter?
Three: Google AI Studio, Google Vertex Global, and Google Vertex. OpenRouter routes to the healthiest provider automatically based on real-time throughput and uptime data. You can pin to a specific provider using OpenRouter’s provider routing controls.
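Provider pinning can be sketched as follows. This sketch assumes OpenRouter's provider routing object, with `order` listing preferred providers and `allow_fallbacks: false` disabling fallback to other providers. The provider slugs in the test are assumptions; check the slugs listed on the model page. `pinProvider` is a hypothetical helper.

```typescript
interface ProviderPreferences {
  order: string[];          // providers to try, in order
  allow_fallbacks: boolean; // false: fail instead of routing elsewhere
}

// Hypothetical helper: add provider routing preferences to a request body.
function pinProvider(
  body: Record<string, unknown>,
  providerSlugs: string[],
  allowFallbacks = false,
) {
  if (providerSlugs.length === 0) {
    throw new Error("at least one provider slug is required");
  }
  const provider: ProviderPreferences = {
    order: [...providerSlugs],
    allow_fallbacks: allowFallbacks,
  };
  return { ...body, provider };
}
```

Pinning trades away the automatic failover described above. With `allow_fallbacks: false`, an outage at the pinned provider fails the request instead of rerouting it.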
What is the difference between Gemini 2.5 Flash and Flash Lite?
Flash includes configurable thinking (budget 0 to 24,576) and higher-quality output. Flash Lite is optimized for ultra-low latency and cost, with thinking disabled by default (though it can be enabled). Use Flash when reasoning capability matters; use Lite for high-volume tasks where cost per request is the primary constraint.
- Jun 1, 2026
- Date parsed from source: Jun 1, 2026
- First seen by Releasebot: Jun 11, 2026
May Release Spotlight
OpenRouter ships a major May update with Workspace Guardrails, new Speech and Transcription APIs, Model Fusion and Comparison, private models, stronger enterprise controls, preset and routing improvements, better logs and budgets, and 20 new models across text, speech, image, video, and coding.
We closed our $113M Series B, and we’re now routing 100 trillion tokens a month. Here’s everything else that shipped in May.
Workspace Guardrails
Centralized security and governance for every request routed through your workspace. Set per-member and per-key spend limits, lock traffic to a model and provider allowlist, enforce zero data retention, block prompt injection against 30+ OWASP-derived patterns, and redact PII before it reaches a provider. Layer the rules into one guardrail, or scope them to specific API keys and members, with no code changes.
Speech and Transcription APIs
Add voice to any application through the same API key you already use. Speech-to-text is live with Whisper, GPT-4o Mini Transcribe, and Voxtral; text-to-speech exposes supported_voices in the models API. Provider failover and upstream error passthrough are built into both.
Model Fusion
Route your prompt to multiple models in parallel and synthesize their responses into a single, higher-quality answer. Model Fusion is now available as an API plugin, a server tool, and in the chatroom composer. You get an ensemble of experts in a single call instead of relying on one model.
Model Comparison
Compare up to five models side by side on pricing, context length, and benchmark scores. The rebuilt comparison page includes a “Highlight best” toggle, provider-coded benchmark charts for Intelligence, Coding, and Agentic metrics, and interactive slot cards to quickly add models.
Private Models (Enterprise)
Route to your own custom, fine-tuned, or dedicated model endpoints through the standard completions and responses API. Your private models get the same guardrails, observability, and billing as any public model on the platform. Available exclusively on the Enterprise plan.
Pareto Code Router
Set min_coding_score and route to the cheapest code-capable model that clears your quality bar. Your coding agents stop overpaying for good-enough code. Configurable defaults per workspace in plugin settings.
Enterprise & Workspace Controls
A set of releases for teams running OpenRouter at scale:
- IP allowlist enforcement. API keys with an IP allowlist now actively block requests from unauthorized IPs with a 403, upgraded from observe-only mode.
- BYOK management API. Programmatically list, create, update, and delete bring-your-own-key credentials across workspaces. Keys are now grouped by priority with drag-and-drop reordering and a one-click “Test Key” for failed requests.
- Observability destinations API. CRUD endpoints for managing Datadog, Langfuse, LangSmith, and other observability integrations via management key.
- Per-provider ZDR controls. Separate Zero Data Retention toggles for non-frontier, Anthropic, OpenAI, and Google providers, so you can meet compliance requirements per provider without restricting your entire model catalog.
- Copy guardrails across workspaces. Standardize safety policies across all workspaces in a few clicks via the “Copy to…” menu.
Also shipped this month
- Presets API. Create or version a preset directly from an inference request body, now with Anthropic Messages and Responses skins, plus TypeScript and Python SDK support.
- Human-in-the-loop tools. A new SDK tool type that pauses execution and waits for human input before returning results, for agents that need human judgment mid-task.
- Session-id provider stickiness. Requests sharing a session_id now route to the same provider and pin to the same concrete model across turns, improving cache hit rates for multi-turn agentic workflows.
- Auto router cost_quality_tradeoff. A 0 to 10 integer replacing the old binary toggle for finer control over cost versus quality when using the auto router.
- Redesigned model pages. New model page header, step-by-step API tab with /responses and /messages endpoints, full-screen model selector, and playground side panel for inline testing.
- Requests tab in logs. Full request-level drill-down alongside generation logs, with request ID filtering and time picker shorthand (15min, 1h, 3d).
- Improved coding agent attribution. Cursor, GitHub Copilot, Cline, RooCode, Kilo Code, Zed, and OpenCode are now properly identified in activity logs so you can see which tools drive your usage.
- Usage & Budgets on API keys. Spend charts and budget progress by guardrail layer, directly on each API key.
- Rankings daily dataset. GET /api/v1/datasets/rankings-daily returns top-50 models by daily token volume for programmatic analysis.
New models
20 models launched in May, spanning text, speech, image, video, and coding:
- Anthropic Claude Opus 4.8: Anthropic’s latest Opus with mid-session system support, plus a fast variant
- Google Gemini 3.5 Flash: Google’s newest Flash model
- xAI Grok 4.3: xAI’s latest frontier model
- xAI Grok Imagine Video: Video generation from xAI
- xAI Grok Build 0.1: xAI’s code generation model
- Qwen Qwen3.7 Max: Qwen’s latest max-tier model
- Recraft V3, V4, V4 Pro: Three new image generation models
- Mistral Voxtral Mini Transcribe: Mistral’s speech-to-text model
Plus: Gemini 3.1 Flash Lite, GPT Chat Latest, CoBuddy (free), Ring-2.6-1T (free), Perceptron Mk1, and more.
Everything above is live now.
- May 29, 2026
- Date parsed from source: May 29, 2026
- First seen by Releasebot: Jun 11, 2026
Guardrails: Protect your Agents, Data, and Costs
OpenRouter adds workspace guardrails for budget enforcement, zero data retention, model and provider restrictions, prompt injection defense, and data loss prevention, with configurable rules for members, API keys, and the management API.
OpenRouter workspaces have guardrails
OpenRouter workspaces have guardrails: a set of configurable security and governance tools for budget enforcement, zero data retention (ZDR), model and provider restrictions, prompt injection defense, and data loss prevention. Layer each of these rules into a guardrail to govern your entire workspace, or create customized guardrails for team member groups or API keys, all without changing your code.
Go to Workspaces > Guardrails in your home dashboard or use the management API to create guardrails. Read the docs for more detail.
Budget Enforcement
Set spending limits with daily, weekly, or monthly reset windows. Requests that exceed the limit for the time period will fail with a 402 response. Use it to cap spending per member or per key so a single runaway script can’t burn the month’s budget.
Guardrail budgets are per-entity, not shared. Assign a guardrail with a $50/day limit to three team members, and each one gets their own $50 budget. API key budgets layer independently on top of member budgets. If Audrey has a $100/day member limit and her key has a $30/day limit, the key caps at $30 and Audrey’s total across all keys in the workspace caps at $100. Both are checked on every request.
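The layering rule in the Audrey example can be modeled as two independent checks. In this minimal model, a request must fit under the key's own limit and also under the member's limit, where the member's spend is the total across all of that member's keys. Each entity carries its own `Budget` record, so budgets are never shared. One assumption: the request is rejected when spend plus its estimated cost would exceed a limit. The real service may account for spend differently. `checkBudgets` is a hypothetical helper.

```typescript
interface Budget {
  limitUsd: number;
  spentUsd: number; // for a member: total across all of that member's keys
}

type BudgetDecision =
  | { allowed: true }
  | { allowed: false; status: 402; layer: "key" | "member" };

// Both layers are checked on every request; either one can reject it.
function checkBudgets(costUsd: number, key?: Budget, member?: Budget): BudgetDecision {
  const layers: Array<["key" | "member", Budget | undefined]> = [
    ["key", key],
    ["member", member],
  ];
  for (const [layer, budget] of layers) {
    if (budget !== undefined && budget.spentUsd + costUsd > budget.limitUsd) {
      return { allowed: false, status: 402, layer };
    }
  }
  return { allowed: true };
}
```

In the Audrey case, a key at $29.50 of its $30 limit rejects a $1 request even though her member budget still has room. A different key with plenty of headroom is still stopped once her combined spend would pass $100.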
Zero Data Retention (ZDR) and Model/Provider Restrictions
Disable all endpoints that retain or train on data in one click, block individual models or providers, or restrict the workspace to a model/provider allowlist. Disallowed requests fail with a 404 response. Use it to keep traffic on providers you've vetted, off providers that retain or train on inputs, and on the model price tier each project should use.
Your account-wide privacy policies and provider restrictions are inherited by default. Guardrails can only be more restrictive.
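The "only more restrictive" rule amounts to requiring every layer to allow the request. That makes the effective allowlist the intersection of the account policy and the guardrails. This is a minimal model with one assumption: an undefined allowlist means that layer places no restriction on models. The provider names in the test are hypothetical. `isRequestAllowed` is a hypothetical helper.

```typescript
interface Policy {
  allowedModels?: string[];    // undefined: this layer does not restrict models
  blockedProviders?: string[];
}

type AccessDecision = { allowed: true } | { allowed: false; status: 404 };

// Layers: account-wide policy first, then the workspace default guardrail, then
// any key- or member-specific guardrails. Any layer can deny; none can widen access.
function isRequestAllowed(model: string, provider: string, layers: Policy[]): AccessDecision {
  for (const layer of layers) {
    if (layer.allowedModels !== undefined && !layer.allowedModels.includes(model)) {
      return { allowed: false, status: 404 };
    }
    if (layer.blockedProviders?.includes(provider)) {
      return { allowed: false, status: 404 };
    }
  }
  return { allowed: true };
}
```

Because each layer can only deny, a guardrail cannot re-enable a provider or model that the account-wide policy already excludes.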
Prompt Injection Defense
Scan inputs against more than 30 regex patterns derived from the OWASP LLM Prompt Injection Prevention Cheat Sheet and other resources to identify prompt injection and jailbreak attempts. The detection system includes techniques to catch common evasion strategies: typoglycemia, encoding-based evasion, and character-spaced evasion. It's deterministic, and its latency overhead is negligible.
Detection runs before the request is sent to the model provider, so blocked traffic never leaves OpenRouter. Use it to catch common injection and jailbreak patterns, especially for agents that pass user input verbatim.
Choose the action you want taken when a pattern is detected:
- Flag: The request passes through unmodified; the detection is recorded for observability, but no enforcement is applied. Useful for evaluating the impact on your traffic before switching to redact or block.
- Redact: Matched parts of the input are replaced with [PROMPT_INJECTION] and the sanitized request is sent to the model.
- Block: The entire request is rejected with a 403 before it reaches the model. The 403 response includes metadata about the type of pattern detected.
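The three actions can be illustrated with the sketch below. The two regex patterns are hypothetical stand-ins and are not OpenRouter's rule set. In particular, they don't attempt the typoglycemia or encoding-based evasion handling described above. `applyInjectionPolicy` is a hypothetical helper that shows only how flag, redact, and block differ.

```typescript
// Hypothetical example patterns; NOT OpenRouter's actual detection rules.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?(previous|prior)\s+instructions/i,
  /you\s+are\s+now\s+in\s+developer\s+mode/i,
];

type Action = "flag" | "redact" | "block";

type ScanResult =
  | { outcome: "pass"; text: string; detected: boolean }
  | { outcome: "blocked"; status: 403; detected: true };

function applyInjectionPolicy(input: string, action: Action): ScanResult {
  // Non-global regexes keep .test() stateless across calls.
  const detected = INJECTION_PATTERNS.some((p) => p.test(input));
  if (!detected) return { outcome: "pass", text: input, detected: false };

  switch (action) {
    case "flag":
      // Recorded for observability only; the request passes through unmodified.
      return { outcome: "pass", text: input, detected: true };
    case "redact": {
      let text = input;
      for (const p of INJECTION_PATTERNS) {
        text = text.replace(new RegExp(p.source, "gi"), "[PROMPT_INJECTION]");
      }
      return { outcome: "pass", text, detected: true };
    }
    case "block":
      // Rejected before the request reaches the model.
      return { outcome: "blocked", status: 403, detected: true };
  }
}
```

Starting with "flag" and reviewing detections before switching to "redact" or "block" matches the rollout path suggested above.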
Read the prompt injection detection docs.
Data Loss Prevention (DLP)
Detect and handle PII and other sensitive information in requests. Seven sensitive info types are built in. You can also add your own custom regex patterns for domain-specific data (internal project codenames, proprietary IP). Configure each to Redact the sensitive info or Block the request entirely. Blocked requests return a 403 response with information about the type of content detected. Use it to keep PII and sensitive identifiers out of vendor logs and in compliance with your data handling commitments.
Most built-in patterns and all custom patterns use regular expression matching. This is deterministic and adds negligible latency to requests. Names and addresses use Natural Language Processing (NLP) via Presidio and add latency proportional to input size.
Built-in patterns and their redaction placeholders:
- Email address: [EMAIL]
- Phone number: [PHONE]
- Social Security number: [SSN]
- Credit card number: [CREDIT_CARD]
- IP address: [IP_ADDRESS]
- Person name: [PERSON_NAME]
- Address: [ADDRESS]
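The per-rule Redact and Block actions can be sketched as follows. The email and card regexes are deliberately simple hypothetical stand-ins, not OpenRouter's built-ins. The custom codename rule and its `[PROJECT_CODENAME]` placeholder are invented for illustration. `applyDlp` is a hypothetical helper.

```typescript
type DlpAction = "redact" | "block";

interface DlpRule {
  type: string;
  placeholder: string;
  pattern: RegExp; // non-global, so .test() is stateless
  action: DlpAction;
}

type DlpResult =
  | { outcome: "pass"; text: string }
  | { outcome: "blocked"; status: 403; type: string };

// Hypothetical rules for illustration only.
const RULES: DlpRule[] = [
  { type: "email", placeholder: "[EMAIL]", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, action: "redact" },
  { type: "credit_card", placeholder: "[CREDIT_CARD]", pattern: /\b(?:\d[ -]?){13,16}\b/, action: "block" },
  { type: "project_codename", placeholder: "[PROJECT_CODENAME]", pattern: /\bProject Falcon\b/, action: "redact" },
];

function applyDlp(input: string, rules: DlpRule[]): DlpResult {
  let text = input;
  for (const rule of rules) {
    if (!rule.pattern.test(text)) continue;
    // Block: reject the whole request and report which type was detected.
    if (rule.action === "block") return { outcome: "blocked", status: 403, type: rule.type };
    // Redact: replace every match with the rule's placeholder.
    text = text.replace(new RegExp(rule.pattern.source, rule.pattern.flags + "g"), rule.placeholder);
  }
  return { outcome: "pass", text };
}
```

Real detectors for names and addresses need NLP (the Presidio path mentioned above), which simple regexes like these cannot replace.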
Read the sensitive info protection docs.
Assign to API keys or org members
You can assign a guardrail to multiple API keys or members. When assigned to members, the guardrail applies to all of their keys in the workspace.
Each workspace has a default guardrail you can configure that applies to every API key and member in the workspace. You can create additional guardrails to further restrict specific API keys or members. The workspace default guardrail sets the baseline; any additional guardrails layer on top.
Start using guardrails
Go to Workspaces > Guardrails in your home dashboard to configure your workspace guardrail or create guardrails for specific API keys or members.
Configure programmatically. The Management API supports every guardrail operation, including create, update, delete, list, and assign to keys or members, so you can automate provisioning during team onboarding or key rotation.
curl https://openrouter.ai/api/v1/guardrails \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-safety",
    "limit_usd": 100,
    "reset_interval": "daily",
    "allowed_models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-3.1-pro-preview"
    ],
    "content_filter_builtins": [
      {"slug": "regex-prompt-injection", "action": "block"},
      {"slug": "email", "action": "redact"},
      {"slug": "credit-card", "action": "block"}
    ]
  }'
Guardrails docs overview · Guardrails API reference
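The same call can be made from TypeScript. This sketch assumes a runtime with a global fetch (Node 18+, Deno, or Bun). It reuses only the endpoint and body fields from the curl example. buildGuardrailRequest and createGuardrail are illustrative names and are not part of any OpenRouter SDK.

```typescript
interface GuardrailBody {
  name: string;
  limit_usd: number;
  reset_interval: string;
  allowed_models: string[];
  content_filter_builtins: Array<{ slug: string; action: "redact" | "block" }>;
}

// Build the URL and fetch options without sending anything, which keeps this part testable.
function buildGuardrailRequest(apiKey: string, body: GuardrailBody) {
  return {
    url: "https://openrouter.ai/api/v1/guardrails",
    init: {
      method: "POST", // curl -d implies POST
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the parsed JSON; throws on any non-2xx status.
async function createGuardrail(apiKey: string, body: GuardrailBody): Promise<unknown> {
  const { url, init } = buildGuardrailRequest(apiKey, body);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Guardrail creation failed: HTTP ${res.status}`);
  return res.json();
}
```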
Original source
- May 8, 2026
- Date parsed from source:May 8, 2026
- First seen by Releasebot:Jun 11, 2026
Human-in-the-Loop Tools for the Agent SDK
OpenRouter adds human-in-the-loop (HITL) tools to the Agent SDK, letting agents auto-handle routine calls, pause for human review on higher-stakes cases, and resume with a single hook. It also adds response post-processing for paused calls and full pause-resume state handling.
The Agent SDK now supports a fourth tool type: human-in-the-loop (HITL) tools. They let your agent handle routine calls automatically and pause for a human when stakes are high, all controlled by a single hook.
Install the SDK, define your HITL tool, and follow the cookbook recipe for a working implementation.
npm install @openrouter/agent
Auto-resolve or escalate, per call
Regular tools always execute. Manual tools always pause. HITL tools do both: your onToolCalled hook inspects the input and decides.
import { tool } from '@openrouter/agent/tool';
import { z } from 'zod';

const approvePayment = tool({
  name: 'approve_payment',
  description: 'Approve a payment, escalating large amounts to a human',
  inputSchema: z.object({
    amount: z.number(),
    recipient: z.string(),
  }),
  outputSchema: z.object({
    approved: z.boolean(),
    reviewedAt: z.number().optional(),
  }),
  onToolCalled: async (input) => {
    if (input.amount < 100) {
      return { approved: true };
    }
    // Pause for human review
    return null;
  },
});
Return a value and the agent keeps going (like a regular tool). Return null and the loop pauses with status: 'awaiting_hitl', surfacing the pending call to your application. You resume by calling callModel again with a function_call_output item containing the human’s decision.
This pattern fits anywhere the decision depends on data: dollar thresholds, risk scores, content policy flags, compliance checks. The branching logic lives in one function, not scattered across your application code.
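Because the branching lives in one function, you can unit-test it without running an agent. Below is a minimal sketch of a decision function that combines a dollar threshold with a risk score. It follows the contract described above: a value auto-resolves the call, and null escalates it. The riskScore field and both thresholds are hypothetical.

```typescript
interface PaymentInput {
  amount: number;
  recipient: string;
  riskScore: number; // hypothetical 0..1 score from your own fraud model
}

interface PaymentDecision {
  approved: boolean;
  reviewedAt?: number;
}

const AUTO_APPROVE_LIMIT = 100; // dollars
const MAX_AUTO_RISK = 0.3;

// A value resolves the call automatically; null means "pause for a human".
function decidePayment(input: PaymentInput): PaymentDecision | null {
  if (input.amount < AUTO_APPROVE_LIMIT && input.riskScore <= MAX_AUTO_RISK) {
    return { approved: true };
  }
  return null; // escalate to a human reviewer
}
```

To wire this in, the tool's inputSchema would need a riskScore field, or the hook would need to fetch the score itself. The hook then becomes onToolCalled: async (input) => decidePayment(input).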
Post-process human responses before the model sees them
An optional second hook, onResponseReceived, fires when a human supplies a result for a paused call. It transforms the raw input before passing it to the model.
onResponseReceived: async (raw) => {
  return { ...(raw as Record<string, unknown>), reviewedAt: Date.now() };
},
Use it to stamp metadata, normalize formats, validate against business rules, or enrich the response with context the human didn’t need to provide manually. If it throws, the error surfaces to the model as { error: ..., originalOutput: ... } so nothing gets silently swallowed.
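The stand-alone sketch below mimics the described error rule. It runs the post-processing hook, and if the hook throws, it passes { error, originalOutput } along instead of dropping the result. applyResponseHook and the validation rule are illustrative. In practice the SDK performs this step for you, and its exact error payload may differ.

```typescript
type ResponseHook = (raw: unknown) => Promise<unknown>;

// Hook output on success; { error, originalOutput } when the hook throws.
async function applyResponseHook(raw: unknown, hook?: ResponseHook): Promise<unknown> {
  if (!hook) return raw;
  try {
    return await hook(raw);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { error: message, originalOutput: raw };
  }
}

// Example hook: enforce a (hypothetical) business rule, then stamp a review time.
const stampAndValidate: ResponseHook = async (raw) => {
  const value = raw as Record<string, unknown>;
  if (typeof value.approved !== "boolean") {
    throw new Error("Reviewer response must include a boolean 'approved' field");
  }
  return { ...value, reviewedAt: Date.now() };
};
```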
How the pause and resume cycle works
Here’s the full lifecycle:
- The model calls your HITL tool during an agent loop.
- onToolCalled runs. If it returns a value, the agent continues. If it returns null, the loop pauses.
- Your application reads the pending calls via getToolCalls() and presents them to the user.
- The user makes a decision.
- You call callModel again with the decision as a function_call_output item.
- onResponseReceived (if defined) transforms the response.
- The model receives the result and the agent loop resumes.
import { OpenRouter } from '@openrouter/agent';

const openrouter = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

// approvePayment is the HITL tool defined above. state is the conversation state
// your application keeps between calls (its setup is not shown here; see the tools docs).
const result = openrouter.callModel({
  model: 'openai/gpt-4o',
  input: 'Pay $500 to Acme Corp for the May invoice',
  tools: [approvePayment] as const,
  state,
});

const response = await result.getResponse();

if (response.state?.status === 'awaiting_hitl') {
  const pending = response.state.pendingToolCalls ?? [];
  // Present pending[0] to your user, collect their decision, then resume:
  const resumed = openrouter.callModel({
    model: 'openai/gpt-4o',
    input: [{
      type: 'function_call_output' as const,
      callId: pending[0].id,
      output: JSON.stringify({ approved: true }),
    }],
    tools: [approvePayment] as const,
    state,
  });
  const resumedText = await resumed.getText();
}
The SDK handles all the state tracking, hook dispatch, and schema validation. You wrote zero loop code.
When to use HITL vs requireApproval
Both pause for human input. The difference is in the decision logic.
Use requireApproval when every invocation needs explicit human consent regardless of input (think: “delete this database,” “send this email”). Use HITL when some calls can proceed automatically and others need a human (think: “approve this payment if it’s under $100”).
Start building
The HITL tools cookbook recipe walks through a complete implementation: defining the tool, detecting pauses, collecting human input, and resuming the loop.
For the full type signatures and API surface, see the tools documentation and the API reference.
Get your API key and tell us what you’re building on Discord.
Original source
- May 7, 2026
- Date parsed from source:May 7, 2026
- First seen by Releasebot:Jun 11, 2026
Consistent Web Search and Fetch Across Every Model
OpenRouter adds server-side web_search and web_fetch tools that any tool-calling model can use during a request, giving consistent search and page retrieval across providers. The update brings configurable engines, domain controls, and a simpler migration path from the old web search plugin.
Introducing openrouter:web_search and openrouter:web_fetch, two new tools that any model can call during a request. When a model decides to use one, OpenRouter executes it server-side and returns the result to the model without requiring any client-side implementation.
- Web Search: a tool for agentic search that the model can call zero or more times per request, choosing its own queries and timing.
- Web Fetch: a tool for retrieving full page content from any URL. Commonly used for pages found during search.
Try it now in the chatroom by clicking the tool icon and read the docs for the API details.
Swap Models Without Swapping Tools
Each model provider has its own built-in web search tool with a different schema. Switch models or providers, and you’re stuck rewriting how you define, configure, and parse search results. You’re also not guaranteed to have the same behaviors available, which can be problematic if you need features like strictly enforced blocked domains.
These new server tools give you one consistent way of enabling search and fetch. Specify {"type": "openrouter:web_search"} once, and the tool definition, invocation, and result format stay identical across all tool-calling models. If you want identical search behavior as well, you can specify a provider like Exa or Parallel so the results coming back to the model are consistent regardless of whether the request routes to GPT-5.5, Claude, or Kimi.
Web Search
Web search supports four engines:
- Auto (default): Uses native if the provider supports it, otherwise Exa. Pricing varies.
- Native: The provider’s built-in search (OpenAI, Anthropic, Google, xAI, Perplexity), provider pricing.
- Exa: Passes the search to Exa and bills from your OpenRouter credits. $0.005 per request. Includes up to 10 results, then $0.001 per additional result.
- Parallel: Passes the search to Parallel and bills from your OpenRouter credits. $0.005 per request. Includes up to 10 results, then $0.001 per additional result.
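For Exa and Parallel, the cost of a search follows directly from the prices above. This sketch computes it in integer microdollars to avoid floating-point drift. It assumes the $0.005 base fee is charged once per search call, so a request that searches three times pays it three times.

```typescript
const BASE_FEE_MICRO_USD = 5000;     // $0.005 per search, covers the first 10 results
const INCLUDED_RESULTS = 10;
const EXTRA_RESULT_MICRO_USD = 1000; // $0.001 per result beyond the first 10

// Cost of one Exa or Parallel search that returns `results` results, in microdollars.
function searchCostMicroUsd(results: number): number {
  const extra = Math.max(0, results - INCLUDED_RESULTS);
  return BASE_FEE_MICRO_USD + extra * EXTRA_RESULT_MICRO_USD;
}

// Total in dollars for several searches (assumes the base fee applies per search call).
function totalCostUsd(resultsPerSearch: number[]): number {
  const micro = resultsPerSearch.reduce((sum, n) => sum + searchCostMicroUsd(n), 0);
  return micro / 1000000;
}

console.log(searchCostMicroUsd(15));     // 10000 (= $0.010)
console.log(totalCostUsd([10, 10, 15])); // 0.02
```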
Each engine has different strengths. Native search is tightly integrated with the provider’s model. Exa and Parallel add configurable result context size (search_context_size), which native engines ignore. Most engines support domain filtering (allowed_domains, excluded_domains).
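As a rough illustration, these options might be assembled into a tool entry like the one below. The parameters wrapper mirrors the openrouter:advisor example at the top of these notes. The engine key and its values are assumptions, so check the web search docs for the exact field names.

```typescript
// Assumed request shape; verify every key against the web search docs.
type SearchEngine = "auto" | "native" | "exa" | "parallel";

interface WebSearchParams {
  engine?: SearchEngine;
  allowed_domains?: string[];
  excluded_domains?: string[];
  search_context_size?: string; // honored by Exa and Parallel only
  max_total_results?: number;
}

// Build the tool entry, rejecting a nonsensical result cap early on the client side.
function webSearchTool(params: WebSearchParams = {}) {
  const cap = params.max_total_results;
  if (cap !== undefined && (!Number.isInteger(cap) || cap < 1)) {
    throw new RangeError("max_total_results must be a positive integer");
  }
  return { type: "openrouter:web_search" as const, parameters: params };
}

const requestBody = {
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Compare the pricing of the top 3 cloud GPU providers." }],
  tools: [webSearchTool({ engine: "exa", excluded_domains: ["reddit.com"], max_total_results: 15 })],
};
```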
You can configure this in the chatroom UI or via the API.
Parallel Searches in Agentic Loops
When a model needs to compare information across sources, it can fire multiple searches in a single request. A question like “compare the pricing of the top 3 cloud GPU providers” might trigger three separate searches, each with different queries, before the model synthesizes an answer.
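One way to picture a cumulative cap is a single result budget shared by every search in the request. This sketch is illustrative only. The ResultBudget class, the choice to truncate the search that crosses the cap, and the limit message are all assumptions, not OpenRouter's server-side logic.

```typescript
class ResultBudget {
  private used = 0;
  private readonly maxTotal: number;

  constructor(maxTotal: number) {
    this.maxTotal = maxTotal;
  }

  // Grant up to `requested` results from what is left of the budget.
  take(requested: number): number {
    const granted = Math.min(requested, this.maxTotal - this.used);
    this.used += granted;
    return granted;
  }

  get exhausted(): boolean {
    return this.used >= this.maxTotal;
  }
}

// Simulate the searches a model wants to run in one request.
function runSearches(maxTotal: number, requests: number[]): string[] {
  const budget = new ResultBudget(maxTotal);
  return requests.map((requested, i) => {
    if (budget.exhausted) return `search ${i + 1}: result limit reached, no search run`;
    return `search ${i + 1}: ${budget.take(requested)} results`;
  });
}

// With a cap of 15, the second search is truncated to 5 and the third gets the limit message.
console.log(runSearches(15, [10, 10, 10]));
```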
Use max_total_results to cap cumulative results across all searches in a request. This keeps costs and context usage predictable. Once the cap is hit, the model gets a message saying the limit was reached instead of running another search.
Web Fetch
Web fetch lets models retrieve full page content from URLs and comes with five supported engines:
- Auto (default): Uses native if supported, otherwise Exa, pricing varies.
- Native: The provider’s built-in fetch, provider pricing.
- OpenRouter: Direct HTTP fetch by OpenRouter, free.
- Exa: Content extraction and clean markdown output, $0.001 per fetch.
- Parallel: High-quality content extraction via Parallel’s extract API, $0.001 per fetch.
Specifying Exa, Parallel, or OpenRouter as the engine ensures consistent fetch behavior across all models, including the ability to restrict which URLs the model can fetch using allowed_domains and blocked_domains. Native provider fetch capabilities vary, so choose one of these engines if you need the parameters to be respected across models.
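Domain restrictions amount to a host check on each URL. This sketch shows one way allowed_domains and blocked_domains could be evaluated. It assumes subdomains inherit their parent domain's rule and that a block always wins. The engines' actual matching rules may differ.

```typescript
// True when `host` equals `domain` or is a subdomain of it (docs.example.com under example.com).
function hostMatches(host: string, domain: string): boolean {
  const d = domain.toLowerCase();
  return host === d || host.endsWith(`.${d}`);
}

// Extract a lowercase hostname, or null for anything that is not an absolute URL.
function hostOf(url: string): string | null {
  try {
    return new URL(url).hostname.toLowerCase();
  } catch {
    return null;
  }
}

function mayFetch(url: string, allowed: string[] = [], blocked: string[] = []): boolean {
  const host = hostOf(url);
  if (host === null) return false;
  if (blocked.some((d) => hostMatches(host, d))) return false; // a block always wins
  if (allowed.length === 0) return true;                       // no allow-list: anything else is fine
  return allowed.some((d) => hostMatches(host, d));
}
```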
Use max_content_tokens to cap how much content the model receives (useful for large pages that would eat your context window).
Migrating From the Web Search Plugin
Until now, models could only search through the web search plugin, which ran exactly one search per request regardless of what the model actually needed. The model had no say in when to search, what to search for, or whether to search at all.
To migrate, replace plugins with tools in your request body:
Before (plugin):
"plugins": [{"id": "web"}]
After (server tool):
"tools": [{"type": "openrouter:web_search"}]
Server tools let the model decide when and how often to search. One caveat: server tools require a model that supports tool calling. If your current model doesn’t support tools, you’ll need to switch to one that does or keep using the plugin.
We’ve created a migration guide with full details.
Original source
- May 1, 2026
- Date parsed from source:May 1, 2026
- First seen by Releasebot:Jun 11, 2026
New Audio APIs for Speech and Transcription
OpenRouter adds dedicated audio endpoints for text-to-speech and speech-to-text, with faster, cost-efficient models for speech and transcription. It now supports voices from OpenAI, Google, and Mistral, plus transcription with Whisper, all with familiar routing and billing.
Choosing a model: Audio vs. Speech vs. Transcription
OpenRouter now has two dedicated audio endpoints:
/api/v1/audio/speech for text-to-speech and /api/v1/audio/transcriptions for speech-to-text. These new endpoints deliver specialized models that are generally faster and more cost-efficient than the general audio models we already support, but are more narrowly useful for specific audio tasks.
You can now generate speech from text with OpenAI, Google, or Mistral voices and transcribe audio files with OpenAI Whisper. All with the same routing, billing, and key management you already use for text, video and image generation.
The choice of models is a balance of specialization, cost, and speed. We’ve enabled access to the breadth of options so you can choose the right path for each use case:
Comparison of audio, speech, and transcription models:
- Audio models
  - What it does: Understands audio input and reasons over it, like a voice-native LLM
  - Input → Output: Text/audio → text/audio
  - Best for: Voice agents, mixed-modality conversations, audio Q&A
  - Endpoint: /chat/completions
  - Trade-offs: More powerful but heavier and more expensive
- Speech models
  - What it does: Converts text into lifelike spoken audio
  - Input → Output: Text → audio
  - Best for: Reading text aloud with built-in voices and streaming
  - Endpoint: /audio/speech
  - Trade-offs: Simpler, faster, cheaper (no reasoning needed)
- Transcription models
  - What it does: Converts audio into text
  - Input → Output: Audio → text
  - Best for: Meeting notes, subtitles, feeding voice input into text pipelines
  - Endpoint: /audio/transcriptions
  - Trade-offs: Purpose-built for accuracy across languages and accents
Try it in the Playground
Both Speech and Transcription have dedicated Playground tabs on model pages (examples: GPT-4o Mini TTS’s Playground and GPT-4o Transcribe’s Playground). For speech models, pick a voice from the dropdown, type your text, and hear the result. For transcription models, drag and drop an audio file and see the transcription.
Each model page also shows quickstart code in Python, TypeScript, curl, and the OpenRouter SDK, so you can copy a working example and have audio running in your app in minutes.
Getting started with Speech models
Send text, get audio back. The response is a raw byte stream you can pipe straight to a file or audio player.
Speech providers currently include OpenAI (GPT-4o Mini TTS), Google (Gemini Flash TTS), and Mistral (Voxtral Mini TTS). Each model has its own voice set, browsable on model pages. Output is MP3 or PCM format.
Provider-specific options pass through cleanly. For example, OpenAI’s speech models accept an instructions field for tone control (e.g., “speak in a warm, friendly tone”).
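Here is a minimal sketch of calling the speech endpoint and saving the result. It assumes Node 18+ for the global fetch. The body fields (model, input, voice, response_format, instructions) follow OpenAI's speech API and are assumptions here. The model slug and voice name are also assumptions; the model pages list the real values.

```typescript
import { writeFile } from "node:fs/promises";

interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  response_format?: "mp3" | "pcm";
  instructions?: string; // OpenAI speech models: tone and delivery control
}

function speechInit(apiKey: string, req: SpeechRequest) {
  return {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(req),
  };
}

// The response body is raw audio bytes, so read it as an ArrayBuffer and write it to disk.
async function synthesize(apiKey: string, req: SpeechRequest, outPath: string): Promise<void> {
  const res = await fetch("https://openrouter.ai/api/v1/audio/speech", speechInit(apiKey, req));
  if (!res.ok) throw new Error(`Speech request failed: HTTP ${res.status}`);
  await writeFile(outPath, new Uint8Array(await res.arrayBuffer()));
}

const exampleRequest: SpeechRequest = {
  model: "openai/gpt-4o-mini-tts", // hypothetical slug
  input: "Your order has shipped.",
  voice: "alloy",                  // hypothetical; each model has its own voice set
  response_format: "mp3",
  instructions: "Speak in a warm, friendly tone.",
};
```

Calling synthesize(apiKey, exampleRequest, "shipped.mp3") would write the MP3 bytes to disk.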
Getting started with Transcription models
The transcription endpoint takes a base64-encoded audio file and returns text. It supports WAV, MP3, FLAC, and other common formats.
Transcription providers currently include OpenAI (Whisper, GPT-4o Transcribe, GPT-4o Mini Transcribe), Google (Chirp 3), and Groq (with their fast Whisper inference). You can optionally pass a language hint to improve accuracy for non-English audio.
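The endpoint expects base64 audio, so most of the client-side work is encoding the file. The sketch below builds the request body. The input_audio, format, and language field names are assumptions for illustration, so confirm them against the quickstart code on the model pages.

```typescript
import { readFile } from "node:fs/promises";
import { Buffer } from "node:buffer";

interface TranscriptionBody {
  model: string;
  input_audio: { data: string; format: string }; // hypothetical field names
  language?: string;                             // optional hint, e.g. "de" for German audio
}

// Base64-encode raw audio bytes and attach the optional language hint.
function transcriptionBody(model: string, audio: Uint8Array, format: string, language?: string): TranscriptionBody {
  const body: TranscriptionBody = {
    model,
    input_audio: { data: Buffer.from(audio).toString("base64"), format },
  };
  if (language) body.language = language;
  return body;
}

// Read a file from disk, using its extension as the format (falls back to "wav").
async function bodyFromFile(model: string, path: string, language?: string): Promise<TranscriptionBody> {
  const format = path.split(".").pop()?.toLowerCase() ?? "wav";
  return transcriptionBody(model, await readFile(path), format, language);
}
```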
What’s next
We’re actively adding more providers and voices. If there’s a speech or transcription model you want to see on OpenRouter, tell us on Discord.
Original source
- Apr 30, 2026
- Date parsed from source:Apr 30, 2026
- First seen by Releasebot:Jun 11, 2026
Response Caching: Zero Cost for Identical Requests
OpenRouter adds response caching for chat, responses, messages, and embeddings requests, turning identical repeats into instant, zero-cost hits. The beta feature speeds retries and test runs, supports streaming, and offers TTL controls plus cache status headers.
View the response caching docs
You can now add X-OpenRouter-Cache: true to your chat completions, responses, messages, or embeddings requests to start caching identical calls. The first call hits the provider and gets billed normally. Every identical call after that returns the same response in a tiny fraction of the time, with zero tokens billed.
What it does
Response caching sits in front of the model provider. When you send a request with caching enabled, OpenRouter hashes the request body, model, API key, and streaming mode into a cache key. If an identical request was made before and hasn’t expired, the cached response comes back immediately. No provider call, no token consumption, no charge.
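The cache-key idea fits in a few lines. This is not OpenRouter's implementation. It assumes SHA-256 over the four inputs listed above. It also canonicalizes JSON key order, a detail the docs don't specify; if the real key hashes the raw body, reordering fields would cause a miss.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize JSON with object keys sorted, so {a, b} and {b, a} produce the same string.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash API key, model, streaming mode, and body into one hex key.
function responseCacheKey(apiKey: string, model: string, stream: boolean, body: Json): string {
  const hash = createHash("sha256");
  for (const part of [apiKey, model, String(stream), canonicalJson(body)]) {
    hash.update(part);
    hash.update("\u0000"); // separator keeps "ab"+"c" distinct from "a"+"bc"
  }
  return hash.digest("hex");
}
```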
Both streaming and non-streaming requests work. Cached streaming responses replay through the same pipeline, so your client code doesn’t need to change. Text, images, audio, documents, and tool calls all cache normally. Multimodal inputs (base64 images, audio clips, file attachments) are included in the cache key hash. One caveat: very large multimodal payloads that get offloaded internally for processing aren’t eligible for caching. Standard-sized requests cache fine.
Response caching is separate from prompt caching. Prompt caching (which many providers offer natively) reduces the cost of the prompt portion when messages share a common prefix. Response caching skips the provider entirely and returns the full response from OpenRouter’s edge cache.
Reduces response times from seconds to milliseconds
Cached responses come back in 80-300ms, most of which is serialization and network. The cache lookup itself averages 4ms. For comparison, a typical uncached request to Gemini 2.5 Flash takes about 1.3 seconds, Kimi K2.6 takes 4.6 seconds, and GPT-5.5 takes 9.1 seconds. Cache hits are billed at zero: no prompt tokens, no completion tokens, no charge.
Enable it with a request header or with presets
Add the X-OpenRouter-Cache: true header to each API call you want to be eligible:
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-OpenRouter-Cache: true" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'
Presets. Enable caching for all requests using a specific preset by setting cache_enabled: true in the preset config. No header needed on individual requests.
You can control how long responses stay cached with X-OpenRouter-Cache-TTL (1 second to 24 hours, default 5 minutes). Need a fresh response? Send X-OpenRouter-Cache-Clear: true to bust the cache for that specific request.
Response headers tell you what happened: X-OpenRouter-Cache-Status: HIT or MISS, plus X-OpenRouter-Cache-Age and X-OpenRouter-Cache-TTL so you can see exactly how the cache is performing.
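The helper below builds the request headers and reads the response headers described above. It assumes the TTL header takes a whole number of seconds; the docs give only the 1 second to 24 hour range and the 5 minute default. It reads response headers from a plain object with lowercase keys, so it works with any HTTP client.

```typescript
const MIN_TTL_SECONDS = 1;
const MAX_TTL_SECONDS = 24 * 60 * 60; // 24 hours

// Build caching headers for one request. Omitting ttlSeconds keeps the 5-minute default.
function cacheHeaders(opts: { ttlSeconds?: number; clear?: boolean } = {}): Record<string, string> {
  const headers: Record<string, string> = { "X-OpenRouter-Cache": "true" };
  if (opts.ttlSeconds !== undefined) {
    const ttl = Math.round(opts.ttlSeconds);
    if (ttl < MIN_TTL_SECONDS || ttl > MAX_TTL_SECONDS) {
      throw new RangeError(`TTL must be between ${MIN_TTL_SECONDS} and ${MAX_TTL_SECONDS} seconds`);
    }
    headers["X-OpenRouter-Cache-TTL"] = String(ttl);
  }
  if (opts.clear) headers["X-OpenRouter-Cache-Clear"] = "true";
  return headers;
}

interface CacheInfo {
  hit: boolean;
  age?: number;
  ttl?: number;
}

// Interpret cache response headers (lowercase keys, as most HTTP clients expose them).
function readCacheInfo(headers: Record<string, string | undefined>): CacheInfo {
  const num = (v: string | undefined) => (v === undefined ? undefined : Number(v));
  return {
    hit: headers["x-openrouter-cache-status"] === "HIT",
    age: num(headers["x-openrouter-cache-age"]),
    ttl: num(headers["x-openrouter-cache-ttl"]),
  };
}
```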
Where it helps most
Agent retries. When an agent workflow fails partway through, you can retry from the top. Cached steps return instantly and for free, so you only pay for the new work.
Test suites. Run your LLM-backed tests repeatedly without burning tokens. After the first run populates the cache, subsequent runs within the cache TTL are deterministic and free.
Repeated context processing. If your app sends the same prompt to the same model (same system prompt, same user input, same parameters), only the first call costs anything.
Available now across most generation endpoints
The cache is scoped to your API key. Different keys (even under the same account) don’t share cache entries.
The feature works across /chat/completions, /responses, /messages, and /embeddings. Other endpoints — legacy /completions, /audio/speech (TTS), /audio/transcriptions (STT), /rerank, and video generation — are not yet supported. It’s currently in beta, and we’re watching how it performs before locking down the API surface.
Cache hits don’t count toward provider rate limits (since the request never reaches the provider), and they’re visible in your Activity log with a cache indicator for easy monitoring.
Full details in the docs.
Original source
- Apr 30, 2026
- Date parsed from source:Apr 30, 2026
- First seen by Releasebot:Jun 11, 2026
April Release Spotlight
OpenRouter ships major April updates with video generation, isolated workspaces, and a TypeScript Agent SDK that turns any model into an agent. It also adds rerankers, model fusion, prompt history, knowledge cutoff dates, typed tool context, and new frontier model launches.
April’s biggest releases: video generation, workspaces for multi-project isolation, and a TypeScript SDK that turns any model into an agent. Here’s what shipped.
Video Generation
Generate video from text or images through one unified API. We’re supporting Seedance 2.0, Veo 3.1, Wan 2.7, Sora 2 Pro, and more on day one, with normalized parameters, async job tracking, and capability discovery built in. Video sits alongside text, images, audio, embeddings, and rerankers under the same routing and billing layer.
Browse video models · API docs · Announcement
Workspaces
Organize your OpenRouter projects into separate environments, each with its own API keys, routing defaults, guardrails, and observability. Built for multi-project developers, enterprise teams, and agents that need staging vs. production isolation.
Your existing setup lives in a Default workspace. If you don’t need multiple environments, nothing changes.
Create a workspace · Docs · Announcement
Agent SDK with callModel
@openrouter/agent gives you callModel: one function that handles multi-turn tool calling, streaming, stop conditions, and cost tracking across 300+ models. Define tools with Zod schemas, set stop conditions like stepCountIs(10) or maxCost(1.00), and let the SDK run the agentic loop.
Two skills ship on top of it: create-agent-tui to scaffold a coding agent with a terminal UI, and create-headless-agent to build a multi-model CLI tool with any coding agent.
SDK docs · Announcement · Agent harness tutorial
Also shipped this month
Reranker models. A new modality for re-ordering search results and document chunks by relevance. Cohere Rerank and Fireworks rerankers are live with a new /rerank endpoint.
API docs · Models
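Rerankers return scores rather than text, so the client usually reorders its own documents. The sketch below assumes a Cohere-style result list of { index, relevance_score } entries. See the API docs for the actual /rerank response shape.

```typescript
interface RerankResult {
  index: number;           // position of the document in the request
  relevance_score: number; // higher means more relevant
}

// Reorder documents by score (highest first) and keep the top `topN`.
function applyRerank<T>(documents: T[], results: RerankResult[], topN = results.length): T[] {
  return [...results]
    .sort((a, b) => b.relevance_score - a.relevance_score)
    .slice(0, topN)
    .map((r) => documents[r.index]);
}
```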
Model Fusion. Fuse the results of multiple models into a single response.
Try it
Prompt history. View, replay, and remix previous prompts from OpenRouter. Audit output quality, iterate on prompts, and replay the same input across different models for comparison.
Benchmarks on model compare. Model comparison pages now show Design Arena ELO rankings with visualizations for 3D, website building, SVG, and more. Click “Compare” at the top of any model page.
Compare models
Knowledge cutoff dates. LLM training cutoff dates are now available in the /models API, so you can programmatically check how current a model’s knowledge is.
Stripe Projects. Run stripe projects add openrouter/api to get an OpenRouter account, an API key, and Stripe billing, all from the command line. Your agents can do it too.
Announcement
Typed tool context in the TypeScript SDK. Define a contextSchema on tools, pass context from callModel, mutate it with setContext(), and changes persist across turns, all Zod-validated.
Frontier model drops
Massive model launches this month:
- GPT-5.5 & GPT-5.5 Pro: OpenAI’s newest frontier models, state-of-the-art for long-running work across code, data, and tools
- DeepSeek V4 Pro & V4 Flash: Huge jump over V3.2, meeting or surpassing current state-of-the-art across benchmarks
- Claude Opus 4.7: Anthropic’s most capable Opus, built for long-running async agents
- GPT-5.4 Image 2: GPT-5.4 with state-of-the-art image generation
- Kimi K2.6: Moonshot AI’s long-horizon coding model built for sustained agentic work
Browse all models
Are we missing something you want to see?
Let us know on Discord.
Original source
- Apr 24, 2026
- Date parsed from source:Apr 24, 2026
- First seen by Releasebot:Jun 11, 2026
Agent SDK: Building Multi-turn Agent Workflows on OpenRouter
OpenRouter introduces @openrouter/agent, a model-agnostic TypeScript SDK that powers durable agent loops with tool execution, validation, streaming, cost tracking, stop conditions, and human approval across 300+ OpenRouter models.
Constructing an agent requires a layered set of behaviors beyond the chat completion: it needs to call a model, inspect the output for tool requests, execute those tools, feed the results back, and repeat until the task is done. Building a durable version of that loop means handling input validation, streaming, cost tracking, and knowing when to stop.
@openrouter/agent is a model-agnostic TypeScript SDK that packages all of that into a single function that can execute this agentic loop on any of the 300+ models on OpenRouter. To show off how powerful this agentic loop is, we published a tutorial on how a skill on top of Agent SDK enables you to build your own personal agent harness.
From chat completions to agentic behavior
A standard chat completion is stateless: you send messages, you get a response. Turning that into an agent requires bolting on several behaviors:
Tool execution
The model produces a structured tool call. Your code has to parse it, validate the arguments, run the function, and format the result for the next request. With callModel, you define tools using tool() and Zod schemas. Tools run separately from model calls, with clean boundaries and no tangled logic. The SDK validates inputs from the model and outputs from your function at runtime. If the model sends bad arguments, you get a clear error instead of a silent failure downstream.
Multi-turn loops
An agent rarely finishes in one step. It might search, read results, search again, then write a summary. That means looping: call the model, execute tools, call the model again, repeat. callModel handles the loop internally. You control it with stop conditions: custom functions that receive the full step history and decide whether to keep going.
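Here is the bare loop written by hand against a scripted fake model, to show what callModel takes off your hands. FakeModel, the turn shapes, and runLoop are hypothetical and deliberately tiny. A real loop also needs input validation, streaming, and error handling.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text?: string; toolCalls: ToolCall[] };
type Step = { turn: ModelTurn; toolResults: unknown[] };
type Tools = Record<string, (args: Record<string, unknown>) => unknown>;
type StopCondition = (steps: Step[]) => boolean;

// Scripted stand-in for a model: returns pre-baked turns in order, repeating the last one.
class FakeModel {
  private i = 0;
  private readonly turns: ModelTurn[];

  constructor(turns: ModelTurn[]) {
    this.turns = turns;
  }

  next(_history: Step[]): ModelTurn {
    return this.turns[Math.min(this.i++, this.turns.length - 1)];
  }
}

// Call model, run requested tools, feed results back; stop when no tools are requested
// or the stop condition (which sees the full step history) fires.
function runLoop(model: FakeModel, tools: Tools, stopWhen: StopCondition): { text: string; steps: Step[] } {
  const steps: Step[] = [];
  while (true) {
    const turn = model.next(steps);
    if (turn.toolCalls.length === 0) return { text: turn.text ?? "", steps };
    const toolResults = turn.toolCalls.map((call) => {
      const fn = tools[call.name];
      return fn ? fn(call.args) : { error: `Unknown tool: ${call.name}` };
    });
    steps.push({ turn, toolResults });
    if (stopWhen(steps)) return { text: "(stopped early)", steps };
  }
}
```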
Stop conditions
Without guardrails, an agent loop can run forever (or at least until your bill gets uncomfortable). callModel accepts composable stop conditions: stepCountIs(10) caps the loop at 10 turns, maxCost(1.00) sets a dollar limit, hasToolCall('done') stops when a specific tool gets called. Combine them, or write a custom function.
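Stop conditions compose because each one is a predicate over the step history. The self-contained sketch below implements the three mentioned above against a hypothetical StepInfo record. They get different names so they aren't mistaken for the SDK's exports. Treating a list of conditions as "stop when any fires" is an assumption of this sketch.

```typescript
interface StepInfo {
  costUsd: number;     // cost of this step
  toolNames: string[]; // tools the model called in this step
}

type StopCondition = (steps: StepInfo[]) => boolean;

// Analogous to stepCountIs(n): stop once n steps have run.
const stepLimit = (n: number): StopCondition => (steps) => steps.length >= n;

// Analogous to maxCost(usd): stop once cumulative cost reaches the limit.
const costLimit = (usd: number): StopCondition => (steps) =>
  steps.reduce((sum, s) => sum + s.costUsd, 0) >= usd;

// Analogous to hasToolCall(name): stop once a specific tool has been called.
const calledTool = (name: string): StopCondition => (steps) =>
  steps.some((s) => s.toolNames.includes(name));

// Combine conditions: stop as soon as any of them fires.
const anyOf = (...conds: StopCondition[]): StopCondition => (steps) => conds.some((c) => c(steps));
```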
Streaming
Agents that take multiple steps need to show progress. callModel gives you getTextStream(), getToolCallsStream(), and getReasoningStream(). You can stream, extract, and process the same response concurrently, with no need to choose upfront.
Cost tracking
Every response includes token counts and cost data via result.getResponse(), so you know exactly what each agent run costs.
Tool approval
For agents that take real-world actions, you can mark tools as requiring approval. When the model calls one, the SDK pauses execution, returns control to your code, and waits for you to collect a decision before resuming.
callModel handles all of these so you can focus on the tools and logic specific to your application. And because the SDK is model-agnostic, you can swap between any model on OpenRouter without changing your agent code.
Get Started
npm install @openrouter/agent
import { OpenRouter } from '@openrouter/agent';
import { tool } from '@openrouter/agent/tool';
import { stepCountIs } from '@openrouter/agent/stop-conditions';
import { z } from 'zod';

const client = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const result = client.callModel({
  model: 'anthropic/claude-sonnet-4',
  input: 'What time is it in Tokyo?',
  tools: [
    tool({
      name: 'get_time',
      description: 'Get current time in a timezone',
      inputSchema: z.object({ timezone: z.string() }),
      execute: async ({ timezone }) => {
        try {
          return { time: new Date().toLocaleString('en-US', { timeZone: timezone }) };
        } catch {
          return { error: `Invalid timezone: ${timezone}. Use IANA format like 'Asia/Tokyo'.` };
        }
      },
    }),
  ],
  stopWhen: [stepCountIs(5)],
});

const text = await result.getText();
The SDK calls the model, sees it wants to use get_time, validates the input against your Zod schema, executes the function, feeds the result back, and returns the final text. You wrote zero loop logic.
Get your API key and point your agent at the callModel docs.
Tell us what you’re building on Discord.
Original source
- Apr 24, 2026
- Date parsed from source:Apr 24, 2026
- First seen by Releasebot:Jun 11, 2026
Build Your Own Harness with the Agent SDK
OpenRouter adds two new agent-building skills, create-agent-tui and create-headless-agent, for scaffolding customizable terminal or headless TypeScript harnesses. They run on the Agent SDK, support OpenRouter models, and include config, tools, persistence, retries, and structured output.
We built two skills for building your own agent harness. The first, create-agent-tui, scaffolds a full terminal UI with customizable looks — banners, tool display styles, and input fields you can match to Codex’s style or Claude Code’s style. The second, create-headless-agent, scaffolds a headless agent for CLI tools, API servers, queue workers, and pipelines — no terminal UI, just structured input/output.
Point Claude Code, Codex, Cursor, or any skill-compatible agent at either skill, describe what you want, and it generates a complete, runnable TypeScript project. Both run on the recently launched Agent SDK and work with any model on OpenRouter.
Why do this when there are many great commercial harnesses out there?
- You want fine-grained control over the look, tools, or the loop
- You want a minimal harness you can ship as part of a product
- You want to learn how agents work to get better at using and debugging them
Try building your own now
- Get an OpenRouter API key if you don’t have one
- Install the skill you want in your coding agent:
- Agent TUI : gh skill install OpenRouterTeam/skills create-agent-tui
- Headless agent : gh skill install OpenRouterTeam/skills create-headless-agent
- Tell your agent to build you a coding assistant and what will make your assistant unique
- Run the generated project:
- Agent TUI : bun install && bun run start
- Headless agent : bun install && bun run src/cli.ts -m '~anthropic/claude-opus-latest' -p "What's in this repo?"
The skill presents an interactive checklist when invoked. You pick what you need: server tools (web search, datetime, image generation), local tools (file read/write/edit, grep, glob, shell, and more), harness modules (session persistence, context compaction, tool approval gates), and slash commands (/model to switch models on the fly, /new for fresh conversations, /export to save as Markdown). After you make your selections, it generates the full project and verifies types with tsc.
Every part of the terminal UI is customizable out of the box. Three tool display styles (emoji markers, grouped action labels, or minimal one-liners), three input styles (full-width block that adapts to your terminal theme, bordered lines, or plain readline), three loader animations (gradient shimmer, spinner, or trailing dots), and custom ASCII banners. You can also describe what you want directly and the skill will generate a custom style.
The generated project is yours to modify. Add domain-specific tools, wire up a different entry point (the skill includes templates for HTTP API servers), bolt on context compaction for long conversations, or strip it down to the bare minimum.
Both skills rely on the Agent SDK for a trustworthy inner loop
Both skills generate two layers of code. The inner layer is the Agent SDK: one callModel call that handles the entire agentic loop (model calls, tool execution, multi-turn cycling, stop conditions, streaming, cost tracking). The outer layer is everything the skill generates around it: configuration, tool definitions, session management, the entry point, and — in the TUI skill’s case — the terminal interface.
Here’s the generated src/agent.ts, stripped to the essentials:
import OpenRouter from '@openrouter/agent';
// ...
const client = new OpenRouter({ apiKey: config.apiKey });
// ...
const result = client.callModel({
  model: config.model,
  instructions: config.systemPrompt,
  input: userMessage,
  tools,
  stopWhen: [stepCountIs(config.maxSteps), maxCost(config.maxCost)],
});
That single callModel call is the entire agent loop. The SDK calls the model, inspects the output for tool requests, validates arguments against your Zod schemas, executes the tools, feeds results back, and repeats until a stop condition fires.
The skill wires up streaming on top of this by iterating over result.getItemsStream(). Each item is typed and carries the complete current state: message items carry the full assistant text so far, function_call items carry tool invocations, function_call_output items carry results, and reasoning items carry model thinking. The generated src/renderer.ts turns these into a clean terminal display with token counts and tool call summaries.
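At its core, the renderer is a switch over item types. This minimal sketch uses simplified, hypothetical item shapes. The SDK's real items carry more fields, so it would need adapting before being fed from getItemsStream(). Because each message item carries the full text so far, the renderer prints only the new suffix.

```typescript
type StreamItem =
  | { type: "message"; id: string; text: string } // full assistant text so far
  | { type: "function_call"; callId: string; name: string; arguments: string }
  | { type: "function_call_output"; callId: string; output: string }
  | { type: "reasoning"; id: string; summary: string };

// Tracks how much of each message was already printed, so repeated snapshots print only new text.
function createRenderer(write: (s: string) => void) {
  const printed = new Map<string, number>();
  return (item: StreamItem): void => {
    switch (item.type) {
      case "message": {
        const already = printed.get(item.id) ?? 0;
        write(item.text.slice(already));
        printed.set(item.id, item.text.length);
        break;
      }
      case "function_call":
        write(`\n-> ${item.name}(${item.arguments})\n`);
        break;
      case "function_call_output":
        write(`   ok: ${item.output.length} chars returned\n`);
        break;
      case "reasoning":
        write(`[thinking] ${item.summary}\n`);
        break;
    }
  };
}
```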
Tools live in src/tools/, one file per tool. Each tool uses the SDK’s tool() function with a Zod schema for input and a typed execute function. Server tools (web search, datetime) are even simpler: serverTool({ type: 'openrouter:web_search' }) and OpenRouter executes them server-side with zero client code.
Configuration flows through three layers: hardcoded defaults, an optional agent.config.json file, and environment variables. You can set your preferred model and cost limits in a config file and override them per-session with AGENT_MODEL=openai/gpt-5 npm start.
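The three-layer precedence maps naturally onto successive object spreads. This sketch uses hypothetical field names. AGENT_MODEL comes from the example above; AGENT_MAX_STEPS and AGENT_MAX_COST are invented for illustration.

```typescript
interface AgentConfig {
  model: string;
  maxSteps: number;
  maxCost: number;
}

const DEFAULTS: AgentConfig = { model: "openai/gpt-5", maxSteps: 20, maxCost: 1.0 };

// Later layers win: defaults < agent.config.json < environment variables.
function resolveConfig(
  fileConfig: Partial<AgentConfig>,
  env: Record<string, string | undefined>,
): AgentConfig {
  const fromEnv: Partial<AgentConfig> = {};
  if (env.AGENT_MODEL) fromEnv.model = env.AGENT_MODEL;
  if (env.AGENT_MAX_STEPS) fromEnv.maxSteps = Number(env.AGENT_MAX_STEPS);
  if (env.AGENT_MAX_COST) fromEnv.maxCost = Number(env.AGENT_MAX_COST);
  return { ...DEFAULTS, ...fileConfig, ...fromEnv };
}
```

In a harness, the first argument would be the parsed agent.config.json, or an empty object when the file is absent. The second would be process.env.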
Session persistence writes every message to a JSONL file. On the next run, the harness can reload conversation history and pass it back into callModel as an Item[] array, picking up where you left off.
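JSONL suits this because each turn is one appended line, so a crash mid-session loses at most the line being written. A minimal sketch of the serialize/reload pair, assuming items are plain JSON-serializable objects (`SessionItem` is a placeholder, not the SDK's `Item` type):

```typescript
// Placeholder for the SDK's Item type; any JSON-serializable object works here.
type SessionItem = Record<string, unknown>;

// One JSON document per line; JSON.stringify escapes embedded newlines.
function toJsonlLine(item: SessionItem): string {
  return JSON.stringify(item) + "\n";
}

// Rebuild history for the next run. Blank lines are skipped, and a truncated
// final line (e.g. from a crash mid-write) is dropped instead of failing the load.
function parseJsonl(contents: string): SessionItem[] {
  const items: SessionItem[] = [];
  const lines = contents.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (!line) continue;
    try {
      items.push(JSON.parse(line));
    } catch (err) {
      if (i === lines.length - 1) break; // partial last line: ignore it
      throw err; // a corrupt line in the middle is a real error
    }
  }
  return items;
}
```

A harness could append `toJsonlLine(item)` with Node's `fs.appendFileSync` and, on the next run, pass the parsed array back as conversation history.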
These patterns come from the top harnesses
The skill draws from three production agent architectures:
- pi-mono’s coding agent: three-layer separation (config, agent loop, tools), JSONL sessions, pluggable tool operations
- Claude Code: tool metadata with read-only and destructive flags, system prompt composition from static and dynamic context
- Codex CLI: layered configuration (defaults, config file, environment variables), approval flows with session caching
These patterns are baked into the generated code, but the Agent SDK is what makes the whole thing compact. Without callModel handling the agentic loop, tool validation, streaming, and cost tracking, you’d be writing hundreds of lines of loop management code yourself. The skill focuses entirely on the app-specific parts because the SDK handles everything else.
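The Claude Code and Codex CLI patterns listed above combine naturally: tool metadata decides whether to prompt, and an approval cache avoids re-prompting within a session. A hypothetical sketch (`ToolMeta`, `ApprovalGate`, and the decision values are illustrative, not the generated code's actual names); never caching approvals for destructive tools is an assumed design choice:

```typescript
// Hypothetical metadata modeled on read-only / destructive flags.
interface ToolMeta {
  name: string;
  readOnly: boolean;
  destructive: boolean;
}

type Decision = "allow_once" | "allow_session" | "deny";

class ApprovalGate {
  private ask: (tool: ToolMeta) => Decision;
  private sessionAllowed = new Set<string>(); // tool names approved for this session

  constructor(ask: (tool: ToolMeta) => Decision) {
    this.ask = ask;
  }

  // Returns true if the tool may run now.
  check(tool: ToolMeta): boolean {
    if (tool.readOnly) return true; // read-only tools never prompt
    if (!tool.destructive && this.sessionAllowed.has(tool.name)) return true;
    const decision = this.ask(tool);
    // Destructive tools are never cached: every run needs a fresh "yes".
    if (decision === "allow_session" && !tool.destructive) this.sessionAllowed.add(tool.name);
    return decision !== "deny";
  }
}
```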
The headless skill follows the same architecture but strips away the TUI layer entirely. Instead of a REPL, the generated CLI accepts prompts via --prompt, positional arguments, or piped stdin, and outputs plain text, NDJSON event streams, or just an exit code.
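A sketch of the input-resolution order, assuming `--prompt` wins over positional arguments, which win over piped stdin. The `--output` flag name and the `text` / `ndjson` / `quiet` mode values are hypothetical stand-ins for the three output styles described above:

```typescript
// Hypothetical output modes: plain text, NDJSON events, or exit code only.
type OutputMode = "text" | "ndjson" | "quiet";
const OUTPUT_MODES: OutputMode[] = ["text", "ndjson", "quiet"];

interface CliInput {
  prompt: string;
  output: OutputMode;
}

// Precedence: --prompt, then positional arguments, then piped stdin.
function parseCli(argv: string[], stdinText: string | null): CliInput {
  let prompt: string | undefined;
  let output: OutputMode = "text";
  const positional: string[] = [];

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--prompt") {
      prompt = argv[++i];
    } else if (arg === "--output") {
      const mode = argv[++i];
      if (OUTPUT_MODES.indexOf(mode as OutputMode) === -1) throw new Error(`unknown output mode: ${mode}`);
      output = mode as OutputMode;
    } else {
      positional.push(arg);
    }
  }

  if (prompt === undefined && positional.length > 0) prompt = positional.join(" ");
  if (prompt === undefined && stdinText) prompt = stdinText.trim();
  if (!prompt) throw new Error("no prompt: use --prompt, an argument, or pipe stdin");
  return { prompt, output };
}
```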
Two features stand out for production use.
Safe retry on 429/5xx: the generated runAgentWithRetry wrapper retries transient API errors with exponential backoff — but only if no tool calls have executed yet. Once a mutating tool like file_write or shell has run, replaying the agent from the initial prompt would double-execute side effects, so retries throw immediately instead.
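The key detail is that the retry decision depends on state outside the HTTP error: whether any tool has run. A minimal sketch of that rule, not the generated `runAgentWithRetry` itself; the `onToolExecuted` callback, the `status` property on thrown errors, and the backoff constants are assumptions:

```typescript
// Reads an HTTP status from whatever the API client threw (assumed shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

function isTransient(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 8 s.
function backoffMs(attempt: number): number {
  return Math.min(500 * Math.pow(2, attempt), 8000);
}

// `run` must call onToolExecuted() whenever a tool actually executes.
async function runAgentWithSafeRetry<T>(
  run: (onToolExecuted: () => void) => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 3,
): Promise<T> {
  let toolsExecuted = 0;
  for (let attempt = 0; ; attempt++) {
    try {
      return await run(() => {
        toolsExecuted++;
      });
    } catch (err) {
      const canRetry = toolsExecuted === 0 && isTransient(statusOf(err)) && attempt < maxRetries;
      // Once any tool has run, replaying from the initial prompt could repeat side effects.
      if (!canRetry) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}
```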
Structured output with --output-schema: pass a JSON Schema file and the CLI validates the agent’s final response against it with Ajv, exiting with code 2 on validation failure. The parser is tolerant of markdown fences, so it works even when models wrap JSON in code blocks.
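Ajv does the real schema validation in the generated CLI. The sketch below keeps only the fence-tolerant parsing and the exit-code convention, with a hypothetical `validateRequired` check of top-level `required` keys standing in for Ajv; treating unparseable output as exit code 2 is also an assumption:

```typescript
const FENCE = "`".repeat(3); // a markdown code fence, built to keep this block readable
// Matches the first fenced block, with or without a language tag such as "json".
const FENCED = new RegExp(FENCE + "[A-Za-z]*\\s*?\\n([\\s\\S]*?)" + FENCE);

function extractJson(text: string): unknown {
  const match = FENCED.exec(text);
  return JSON.parse((match ? match[1] : text).trim());
}

// Stand-in for Ajv: checks only the schema's top-level `required` keys.
function validateRequired(schema: { required?: string[] }, value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected a JSON object"];
  }
  const obj: object = value;
  return (schema.required ?? [])
    .filter((key) => !(key in obj))
    .map((key) => `missing required property "${key}"`);
}

// 0 on success, 2 when the final answer does not satisfy the schema.
function exitCodeFor(schema: { required?: string[] }, finalText: string): number {
  let value: unknown;
  try {
    value = extractJson(finalText);
  } catch {
    return 2; // unparseable output counted as a validation failure (assumption)
  }
  return validateRequired(schema, value).length === 0 ? 0 : 2;
}
```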
For the full callModel API reference, check the SDK docs. For detailed walkthroughs, see the Build Your Own Agent TUI and Build Your Own Headless Agent guides. To build your own skills on top of the Agent SDK, start with the skills repo.
- Apr 22, 2026
- Date parsed from source: Apr 22, 2026
- First seen by Releasebot: Jun 11, 2026
Introducing Workspaces
OpenRouter launches Workspaces to separate projects into isolated environments with their own API keys, routing defaults, guardrails, presets, plugins, observability, and members, while keeping billing and account-level controls shared.
Separate workspaces for projects, teams, or agents
We launched workspaces to organize your OpenRouter projects into separate environments, each with its own API keys, routing defaults, guardrails, and observability.
Workspaces give you organization, flexibility, and control. Consider them if you are:
- a developer building multiple distinct projects on one OpenRouter account
- an enterprise shipping across multiple teams
- an agent with multiple environments, e.g. staging and production
Each workspace has independent settings for:
- API keys: Create and manage keys scoped to a workspace.
- Guardrails: Set policies that govern all keys and members in the workspace, within your account-level restrictions.
- BYOK: Bring your own provider keys per workspace or share across workspaces.
- Routing: Optimize provider routing for cost, latency, throughput, or tool-calling quality.
- Presets: Save shortcuts for system prompts, model configs, and request parameters.
- Plugins: Configure different default plugin behavior for API requests in each workspace.
- Observability: Connect different integrations per workspace, or send all traces to the same integration.
- Members: Control which team members have access to each workspace.
Your existing OpenRouter setup lives in a Default workspace. If you don’t need multiple workspaces, just keep working as usual.
To create a new workspace, navigate to your home dashboard, click the workspace picker → Create Workspace → name it and add a description. You can also create and manage workspaces programmatically using the management API.
See more detail in our docs.
Account level visibility, billing, and controls
Some of your OpenRouter settings will still matter for all of your workspaces. Think of them like global account settings. Here’s what’s shared across workspaces:
- Activity & Logs: View everything, with optional workspace filtering.
- Credits & Billing: One bill across all workspaces.
- Organization: Manage roles and workspace assignments.
- Management Keys: API keys for cross-workspace configuration.
- Privacy: Top-level restrictions apply to all workspaces and can’t be overridden at the workspace level.
Frequently Asked Questions
- What can my workspace members see about a workspace?
Within a workspace, members can create and manage their own API keys, and view other members and their roles. Members can belong to multiple workspaces. All org members automatically have access to the Default workspace. At the account level, members can view Activity and Logs.
- What can my organization admins see? What can they edit?
Org admins have admin permissions across all workspaces: they can view and manage everything in every workspace, including API keys, guardrails, BYOK, routing, presets, plugins, observability, members, and settings. Only org admins can create or delete workspaces and control members’ access to each workspace. At the account level, org admins manage billing and credits, organization membership and roles, management API keys, and account-level data policies and allowed providers/models.
- Can management keys be used across workspaces?
Yes. Management keys operate at the account level and can be used to perform administrative actions across all workspaces via the management API.
- Can workspaces have different data policies?
Workspaces inherit account-level data policies and allowed providers/models. Within those constraints, each workspace can set more granular guardrails to further restrict API key and member activity. The account-level policy is the ceiling; individual workspaces can only be more restrictive.
- What happens when I remove someone from a workspace?
When a member is removed from a workspace, they lose access to it. Before removing them, you must first delete any API keys they created in that workspace. Their access to other workspaces is unaffected. Note: all org members retain access to the Default workspace as long as they remain in the org.
- Is my chatroom/fusion usage in a workspace?
Yes. All chatroom and fusion usage is in the Default workspace.
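The data-policy answer above describes a ceiling: the effective policy for a workspace is the intersection of the account and workspace restrictions. A sketch of that rule with a hypothetical `Policy` shape (a model allowlist where `null` means unrestricted, plus a zero-data-retention flag); it illustrates the semantics, not OpenRouter's implementation:

```typescript
// Hypothetical policy shape: null allowlist means "no restriction at this level".
interface Policy {
  allowedModels: Set<string> | null;
  requireZdr: boolean;
}

// The account policy is the ceiling: a workspace can narrow it, never widen it.
function effectivePolicy(account: Policy, workspace: Policy): Policy {
  const acct = account.allowedModels;
  const ws = workspace.allowedModels;
  let allowedModels: Set<string> | null;
  if (acct === null) allowedModels = ws;
  else if (ws === null) allowedModels = acct;
  else allowedModels = new Set(Array.from(ws).filter((m) => acct.has(m))); // intersection

  // Restrictive flags combine with OR: either level can switch ZDR on.
  return { allowedModels, requireZdr: account.requireZdr || workspace.requireZdr };
}
```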
Share feedback with us
Tell us what you’re building and any questions or feedback: we’re on X, Discord, and LinkedIn.
- Apr 15, 2026
- Date parsed from source: Apr 15, 2026
- First seen by Releasebot: Jun 11, 2026
Announcing Video Generation
OpenRouter launches video generation on its unified API, adding text-to-video and image-to-video support across leading models with async jobs, normalized parameters, capability discovery, passthrough features, and a new Playground for previews.
Video generation is now live on OpenRouter
One API gives you access to the top video models. Video now sits alongside text, images, audio, embeddings, and rerankers, under the same routing, governance, and billing layer.
On day one, we’re supporting text-to-video and image-to-video on Seedance 2.0 / 1.5, Veo 3.1, Wan 2.7 / 2.6, and Sora 2 Pro, with many more to come. Browse all supported models, learn more about the feature, or jump straight to the API docs for the new /api/v1/videos endpoint.
Video APIs are fragmented. Providers use different request shapes, parameter names, and billing units. We built the API around those differences:
- Asynchronous generation: These generations take minutes, so we track them as jobs. Submit a prompt, get a job ID, and retrieve the video when ready.
- Normalized parameters: We provide one schema that works across every model, including resolution, duration, aspect ratio, audio generation, frame images, and reference images.
- Capability discovery: Programmatically determine what each model supports before you call it.
- Passthrough parameters: Use model-specific features directly when you need them.
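Since generations are tracked as jobs, a client typically polls until the job finishes. A sketch of the polling side only, with `getJob` and `sleep` injected so the loop is testable; the `VideoJob` status values and `url` field are assumptions, so check the /api/v1/videos reference for the real response shape:

```typescript
// Hypothetical job shape; the real API's field names may differ.
interface VideoJob {
  id: string;
  status: "queued" | "in_progress" | "completed" | "failed";
  url?: string;
}

// Poll until the job reaches a terminal state, doubling the wait between checks.
async function waitForVideo(
  jobId: string,
  getJob: (id: string) => Promise<VideoJob>,
  sleep: (ms: number) => Promise<void>,
  { intervalMs = 5000, maxIntervalMs = 30000, timeoutMs = 15 * 60 * 1000 } = {},
): Promise<VideoJob> {
  let waited = 0;
  let delay = intervalMs;
  for (;;) {
    const job = await getJob(jobId);
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error(`video job ${jobId} failed`);
    if (waited >= timeoutMs) throw new Error(`video job ${jobId} timed out`);
    await sleep(delay);
    waited += delay;
    delay = Math.min(delay * 2, maxIntervalMs);
  }
}
```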
One Unified API
It’s typical for video models to be released as a family of endpoints. Each endpoint provides a specific capability, such as text-to-video, image-to-video, or reference-to-video. We simplify this by automatically routing you to the correct endpoint based on your parameters.
Video models vary in ways that aren’t always obvious. Even duration can easily break a request. Veo 3.1 supports 4, 6, or 8-second clips, while Wan 2.6 supports 5 or 10-second clips. We also standardize how you pass in reference images (e.g., characters) and frame images (e.g., the first and last frames).
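When a requested duration isn't supported, a client can snap it to the nearest allowed value before submitting. A sketch using the clip lengths quoted above; the map keys are illustrative labels rather than real model slugs, and a real client would read the lists from /api/v1/videos/models:

```typescript
// Clip lengths (seconds) quoted above, keyed by illustrative labels.
const SUPPORTED_DURATIONS: Record<string, number[]> = {
  "veo-3.1": [4, 6, 8],
  "wan-2.6": [5, 10],
};

// Nearest supported duration; ties go to the shorter clip because lists are ascending.
function snapDuration(model: string, requestedSeconds: number): number {
  const options = SUPPORTED_DURATIONS[model];
  if (!options || options.length === 0) throw new Error(`no duration data for ${model}`);
  return options.reduce((best, d) =>
    Math.abs(d - requestedSeconds) < Math.abs(best - requestedSeconds) ? d : best,
  );
}
```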
Each model can also expose its own features. Veo 3.1, for example, includes a unique personGeneration parameter that controls whether people appear in the output. The /api/v1/videos/models endpoint tells you exactly which model-specific parameters are available.
Instead of tracking all these differences yourself, you can call the video models endpoint, /api/v1/videos/models, to inspect supported resolutions, aspect ratios, pricing, input images, and durations in one place. It is also a useful endpoint to hand to a coding agent: it gives the agent the details it needs to adapt to each model without trial and error over which input parameters are accepted.
For visual previews, we also added a new Playground tab on model pages, so you can try each model and see what it can make.
Multimodal Workflows
We’ve been most excited by the results of combining video with other types of generation. An LLM can turn a rough idea into a detailed prompt, an image model can generate a main character, and a video model can turn that character into a scene, all through one API.
We quickly learned that these models reward specificity. Camera movement, lighting, texture, pacing, motion style: all of it matters. The more detail you provide, the more control you get. That makes video generation a natural fit for LLM-generated prompts.
We built an open-source demo app that shows this multimodal workflow in action: multimedia-explorer.openrouter.ai. All the code is on GitHub.
Tell us what you think and which models you want next in #video-feedback on Discord. Thanks to everyone who tested the alpha and helped shape the API. If you want early access to what’s next, join us on Discord.
- Jun 10, 2026
- Date parsed from source: Jun 10, 2026
- First seen by Releasebot: Jun 11, 2026
Advisor: Give Any Model a Lifeline to a Smarter One
OpenRouter adds openrouter:advisor, letting any model consult a stronger model mid-generation for selective help on hard decisions, with named advisors, advisor tools, streaming, cross-request memory, and billing that stays separate from the executor.
Add openrouter:advisor to your tools array and your model can ask a stronger model for help mid-generation.
When the executor hits a hard decision, gets stuck, or wants a sanity check before finishing, it calls the advisor with a prompt. The advisor thinks, returns guidance as the tool result, and the executor keeps going with better information.
Both roles are open: any model on OpenRouter can be the executor, and any model from any provider can be the advisor. Run a Gemini executor that consults Claude, or a GPT executor that consults DeepSeek. You pick the pairing.
Try it in the chatroom or read the docs for the full API reference.
67x price gap, selective consultation
Claude Fable 5 costs $10 per million input tokens. GPT-4o Mini costs $0.15 per million input tokens. That’s a roughly 67x spread.
Most requests don’t need frontier-level reasoning. A mid-tier model handles the bulk of a workload without issue. But the 10-20% that involves architectural decisions, ambiguous edge cases, or multi-step reasoning chains is where cheaper models stumble.
The advisor tool covers that gap selectively. Your fast model runs the show. When it hits something genuinely hard, it calls for help. You pay frontier prices only for the moments that need frontier thinking.
In an agentic coding session with 50 tool calls, perhaps 2-3 are advisor consultations. The rest run at mini prices, so you cut your per-session cost while keeping the quality ceiling high.
Server-side execution, one tool call
The advisor runs server-side during generation. Your model calls it like any other tool: pass a prompt describing what it needs help with, get back the advisor’s text as the tool result. The model then writes the final answer itself, informed by the advice. The advisor is a consultant, not a ghostwriter.
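From the caller's side, the advisor is just one more entry in `tools`; the consultation happens server-side and the response is an ordinary completion. A sketch against the Chat Completions endpoint, assuming the standard OpenAI-compatible response shape (`choices[0].message.content`) and a runtime with global `fetch`:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Request body mirroring this post's JSON: a cheap executor plus one pinned advisor.
function buildAdvisorRequest(executor: string, advisor: string, messages: ChatMessage[]) {
  return {
    model: executor,
    messages,
    tools: [{ type: "openrouter:advisor", parameters: { model: advisor } }],
  };
}

async function ask(apiKey: string, prompt: string): Promise<string> {
  const body = buildAdvisorRequest("openai/gpt-4o-mini", "anthropic/claude-fable-5", [
    { role: "user", content: prompt },
  ]);
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`OpenRouter returned ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  // The executor writes the final answer itself, informed by any advice it requested.
  return data.choices[0].message.content;
}
```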
Four things worth knowing:
- Any model, from any provider, can be the advisor. Pin it in the tool config with parameters.model (anything in the model catalog works), or let the executor pick per-call. Use ~anthropic/claude-fable-latest to always resolve to the newest Fable.
- The advisor gets its own tools. Give it openrouter:web_search and it’ll ground its advice in fresh sources before responding. It runs as a sub-agent with its own tool loop, then returns just the final guidance.
- Recursion is blocked. The advisor can’t call itself. A depth header and self-reference check prevent unbounded nesting, and consultations are capped per request to bound cost.
- The advisor remembers. Replay the conversation transcript in a follow-up request (with the advisor tool calls and results included) and each advisor reconstructs its prior consultations, so a follow-up question builds on what the advisor already said. Memory is per advisor (your security reviewer and your architect each keep their own thread) and works across Chat Completions, Responses, and Anthropic Messages.
Named advisors
For complex workflows, you can configure a roster of specialists. Add one openrouter:advisor entry per advisor, each with its own name, model, instructions, and tool set:
```json
{
  "tools": [
    {
      "type": "openrouter:advisor",
      "parameters": {
        "name": "security-reviewer",
        "model": "anthropic/claude-fable-5",
        "instructions": "You are a security engineer. Find vulnerabilities."
      }
    },
    {
      "type": "openrouter:advisor",
      "parameters": {
        "name": "architect",
        "model": "openai/gpt-5.5",
        "instructions": "You are a systems architect. Prioritize simplicity and scalability."
      }
    }
  ]
}
```
The executor sees a distinct tool for each advisor and calls whichever fits the task with just a prompt. An auth flow review routes to Claude Fable with the security persona; architecture questions go to GPT-5.5. Names can use letters, digits, spaces, underscores, and dashes (“Lead Architect” works), and must be unique across entries. One entry can omit name to act as the default advisor.
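Because a bad roster only fails once the request is sent, it can be worth checking the naming rules locally first. A sketch of those rules as stated above (allowed characters, uniqueness, at most one unnamed default); treating uniqueness as case-sensitive is an assumption:

```typescript
interface AdvisorEntry {
  type: "openrouter:advisor";
  parameters: { name?: string; model: string; instructions?: string };
}

// Letters, digits, spaces, underscores, and dashes.
const NAME_PATTERN = /^[A-Za-z0-9 _-]+$/;

function validateAdvisors(tools: AdvisorEntry[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  let unnamed = 0;
  for (const { parameters: { name } } of tools) {
    if (name === undefined) {
      unnamed++; // candidate default advisor
      continue;
    }
    if (!NAME_PATTERN.test(name)) errors.push(`invalid advisor name: "${name}"`);
    if (seen.has(name)) errors.push(`duplicate advisor name: "${name}"`);
    seen.add(name);
  }
  if (unnamed > 1) errors.push("only one advisor entry may omit name");
  return errors;
}
```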
Advice can also stream. Set "stream": true on an advisor entry and you get the advice incrementally as the advisor writes it. In the Responses API that means response.output_text.delta events while the advice is in flight; the completed output item still carries the full text, so consumers that ignore deltas see no difference. (Chat Completions ignores the flag, and streaming support for the Messages API is coming soon.)
How this compares to other advisor tools
Some providers ship a similar advisor concept in their own APIs, but it stays inside their model family: the executor and the advisor both have to come from the same vendor, often from a fixed pairing matrix, and sometimes behind a beta gate. OpenRouter’s advisor removes those constraints and adds a few things on top:
- Any model, any provider, on both sides. Both the executor and the advisor can be any of the hundreds of models in the catalog: a cheap open-weights executor consulting a frontier model, a Gemini executor consulting Claude, or a Claude executor getting a second opinion from GPT-5.5 outside its own model family.
- A roster of named advisors. Configure multiple specialists with their own models, instructions, and tool sets in a single request, and let the executor route each question to the right one. Single-vendor versions give you one unnamed advisor.
- Advisors with their own tools. Hand an advisor openrouter:web_search and it grounds its advice in fresh sources before responding.
- Works across API formats, no beta gate. The same tool works through Chat Completions, Responses, and Anthropic Messages (with cross-request memory in all three), and it’s generally available. No beta header, no account-team access request.
If you’re already using a provider-native advisor through one of our compatible API skins, swapping to openrouter:advisor opens up the full catalog without changing the rest of your request.
Billing
Advisor tokens bill at the advisor model’s rates, separate from the executor. If your executor is GPT-4o Mini ($0.15/$0.60 per M input/output tokens) and the advisor is Claude Fable 5 ($10/$50 per M input/output tokens), each model’s tokens bill at that model’s own price. Both show up on your activity page.
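The billing rule reduces to pricing each model's tokens separately and adding them. A worked sketch using the per-million prices quoted in this post; the token counts for the session are hypothetical:

```typescript
// USD per million tokens, as quoted in this post.
interface Price {
  input: number;
  output: number;
}

const GPT_4O_MINI: Price = { input: 0.15, output: 0.6 };
const CLAUDE_FABLE_5: Price = { input: 10, output: 50 };

function costUsd(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// Input-price spread from earlier in the post: 10 / 0.15 is about 66.7.
const inputSpread = CLAUDE_FABLE_5.input / GPT_4O_MINI.input;

// Hypothetical session: executor traffic plus three advisor consultations.
const executorCost = costUsd(GPT_4O_MINI, 200_000, 20_000);
const advisorCost = 3 * costUsd(CLAUDE_FABLE_5, 10_000, 1_000);
const sessionCost = executorCost + advisorCost; // each model billed at its own rate
```

With these made-up counts the advisor accounts for most of the bill, yet the whole session stays far below what running every turn on the advisor model would cost.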
Get started
One line in your tools array:
```json
{ "type": "openrouter:advisor", "parameters": { "model": "anthropic/claude-fable-5" } }
```
The model decides when to use it. Most requests won’t trigger a consultation; the ones that do will be better for it.
Read the full docs for parameters, named advisors, sub-agent tools, and more.
- Jun 1, 2026
- Date parsed from source: Jun 1, 2026
- First seen by Releasebot: Jun 10, 2026
May Release Spotlight
OpenRouter ships a major May update with Workspace Guardrails, speech and transcription APIs, Model Fusion, a rebuilt model comparison page, private models for Enterprise, and stronger workspace controls. It also adds new routing, observability, presets, logs, and 20 new models.
We closed our $113M Series B, and we’re now routing 100 trillion tokens a month. Here’s everything else that shipped in May.
Workspace Guardrails
Centralized security and governance for every request routed through your workspace. Set per-member and per-key spend limits, lock traffic to a model and provider allowlist, enforce zero data retention, block prompt injection against 30+ OWASP-derived patterns, and redact PII before it reaches a provider. Layer the rules into one guardrail, or scope them to specific API keys and members, with no code changes.
Speech and Transcription APIs
Add voice to any application through the same API key you already use. Speech-to-text is live with Whisper, GPT-4o Mini Transcribe, and Voxtral; text-to-speech exposes supported_voices in the models API. Provider failover and upstream error passthrough are built into both.
Model Fusion
Route your prompt to multiple models in parallel and synthesize their responses into a single, higher-quality answer. Model Fusion is now available as an API plugin, a server tool, and in the chatroom composer. You get an ensemble of experts in a single call instead of relying on one model.
Model Comparison
Compare up to five models side by side on pricing, context length, and benchmark scores. The rebuilt comparison page includes a “Highlight best” toggle, provider-coded benchmark charts for Intelligence, Coding, and Agentic metrics, and interactive slot cards to quickly add models.
Private Models (Enterprise)
Route to your own custom, fine-tuned, or dedicated model endpoints through the standard completions and responses API. Your private models get the same guardrails, observability, and billing as any public model on the platform. Available exclusively on the Enterprise plan.
Pareto Code Router
Set min_coding_score and route to the cheapest code-capable model that clears your quality bar. Your coding agents stop overpaying for good-enough code. Configurable defaults per workspace in plugin settings.
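The selection rule described for min_coding_score is a filter followed by a minimum on price. A sketch with a hypothetical candidate list; the score scale, field names, and prices are illustrative, and the actual router uses OpenRouter's own scores and pricing:

```typescript
// Hypothetical candidate data for illustration only.
interface Candidate {
  model: string;
  codingScore: number;
  pricePerMTok: number;
}

// Cheapest model whose coding score clears the bar; undefined if none qualifies.
function pickCheapestAbove(candidates: Candidate[], minCodingScore: number): Candidate | undefined {
  return candidates
    .filter((c) => c.codingScore >= minCodingScore)
    .reduce<Candidate | undefined>(
      (best, c) => (best === undefined || c.pricePerMTok < best.pricePerMTok ? c : best),
      undefined,
    );
}
```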
Enterprise & Workspace Controls
A set of releases for teams running OpenRouter at scale:
- IP allowlist enforcement. API keys with an IP allowlist now actively block requests from unauthorized IPs with a 403, upgraded from observe-only mode.
- BYOK management API. Programmatically list, create, update, and delete bring-your-own-key credentials across workspaces. Keys are now grouped by priority with drag-and-drop reordering and a one-click “Test Key” for failed requests.
- Observability destinations API. CRUD endpoints for managing Datadog, Langfuse, LangSmith, and other observability integrations via management key.
- Per-provider ZDR controls. Separate Zero Data Retention toggles for non-frontier, Anthropic, OpenAI, and Google providers, so you can meet compliance requirements per provider without restricting your entire model catalog.
- Copy guardrails across workspaces. Standardize safety policies across all workspaces in a few clicks via the “Copy to…” menu.
Also shipped this month
- Presets API. Create or version a preset directly from an inference request body, now with Anthropic Messages and Responses skins, plus TypeScript and Python SDK support.
- Human-in-the-loop tools. A new SDK tool type that pauses execution and waits for human input before returning results, for agents that need human judgment mid-task.
- Session-id provider stickiness. Requests sharing a session_id now route to the same provider and pin to the same concrete model across turns, improving cache hit rates for multi-turn agentic workflows.
- Auto router cost_quality_tradeoff. A 0 to 10 integer replacing the old binary toggle for finer control over cost versus quality when using the auto router.
- Redesigned model pages. New model page header, step-by-step API tab with /responses and /messages endpoints, full-screen model selector, and playground side panel for inline testing.
- Requests tab in logs. Full request-level drill-down alongside generation logs, with request ID filtering and time picker shorthand (15min, 1h, 3d).
- Improved coding agent attribution. Cursor, GitHub Copilot, Cline, RooCode, Kilo Code, Zed, and OpenCode are now properly identified in activity logs so you can see which tools drive your usage.
- Usage & Budgets on API keys. Spend charts and budget progress by guardrail layer, directly on each API key.
- Rankings daily dataset. GET /api/v1/datasets/rankings-daily returns top-50 models by daily token volume for programmatic analysis.
New models
20 models launched in May, spanning text, speech, image, video, and coding:
- Anthropic Claude Opus 4.8: Anthropic’s latest Opus with mid-session system support, plus a fast variant
- Google Gemini 3.5 Flash: Google’s newest Flash model
- xAI Grok 4.3: xAI’s latest frontier model
- xAI Grok Imagine Video: Video generation from xAI
- xAI Grok Build 0.1: xAI’s code generation model
- Qwen Qwen3.7 Max: Qwen’s latest max-tier model
- Recraft V3, V4, V4 Pro: Three new image generation models
- Mistral Voxtral Mini Transcribe: Mistral’s speech-to-text model
Plus: Gemini 3.1 Flash Lite, GPT Chat Latest, CoBuddy (free), Ring-2.6-1T (free), Perceptron Mk1, and more.
Everything above is live now.
Browse the full model catalog, or tell us what’s missing on Discord.
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.