liteLLM Release Notes

Name: liteLLM
Brand: liteLLM

Follow liteLLM to add their release notes to your feed!

21 release notes curated from 22 sources by the Releasebot Team. Last updated: Jun 6, 2026

Get this feed:

Jun 4, 2026
Date parsed from source:
Jun 4, 2026

First seen by Releasebot:
Jun 6, 2026
liteLLM

v1.88.0rc3 - Claude Opus 4.8, MCP Access-Group Authorization & Typed OpenTelemetry

liteLLM releases v1.88.0rc3 with Claude Opus 4.8 support across major providers, a reworked MCP access-group system, typed OpenTelemetry spans, cheaper streaming, and new A2A discovery and LangGraph Platform mode.

v1.88.0rc3 is the current release candidate for 1.88.0.

New Models / Updated Models

• Claude Opus 4.8 is supported across Anthropic, Bedrock (including global/us/eu/au regional routes), Azure AI, and Vertex, at 1M-token context with adaptive thinking and output_config goal mode.
• MCP access-group authorization was reworked end to end: key and team access groups now resolve to MCP servers, grants are additive with opt-in member assignment, and clients can route through stateful or stateless sessions by session id.
• Typed OpenTelemetry instrumentation lands a semconv-aligned span model that carries team_metadata, http.route, and model names on inference spans.
• Streaming is ~30% cheaper per chunk on the Anthropic and Bedrock hot path.
• Agent-to-agent (A2A) gains well-known agent-card discovery and a LangGraph Platform mode.

New Model Support (Claude Opus 4.8 across 9 provider routes) includes Anthropic claude-opus-4-8, Vertex AI vertex_ai/claude-opus-4-8, Azure AI azure_ai/claude-opus-4-8, Bedrock anthropic.claude-opus-4-8 (+ global./us./eu./au. routes) with context windows up to 1,000,000 tokens (200,000 for Azure AI), input $5.00/1M tokens, output $25.00/1M tokens, and features like vision, function calling, prompt caching, reasoning (adaptive + max/xhigh effort), PDF input, computer use, response schema, tool choice, output_config, and native structured output for Bedrock.

Additional updates include reasoning-effort flag cleanup across existing Claude catalog entries, removal of unsupported supports_minimal_reasoning_effort, normalization of supports_max_reasoning_effort, and a new bedrock_output_config_effort_ceiling (high/xhigh/max) field on Bedrock entries (PR #29238).

Features

• Anthropic: Add Claude Opus 4.8 and prune stale reasoning-effort flags (PR #29238).
• Bedrock: Claude Code goal mode via output_config for Bedrock Opus (PR #28898), support tool search results and chat annotations (PR #29120).

Bug Fixes

• Anthropic: Stop injecting unsupported output_config.effort=xhigh for Claude Code on Sonnet/Opus 4.6 (PR #29304).
• Vertex AI: Strip output_config.effort for Vertex Claude models that reject it (Haiku 4.5) (PR #29585).
• Bedrock: Align toolUse/toolSpec names and allow hyphens (PR #28874).
• Azure: Preserve AD token refresh in the v1 OpenAI client path (PR #28627).
• OpenAI: Fix the double provider-prefix bug on model names (PR #28661).
• General: Hydrate wildcard model-discovery credentials (PR #28284).

LLM API Endpoints Features

• Realtime API: Tool calling for the Gemini and Vertex AI live API (PR #26590).
• A2A: Well-known agent-card discovery and LangGraph Platform mode (PR #28860).
• Context Management: compact_20260112 polyfill so non-Anthropic providers get context compaction (PR #28868).
• Video: Vertex Veo video edit, using DB credentials in the video handlers (PR #29098).
• Pass-through: Extend passthrough_managed_object_ids to Azure (PR #29160).

Bugs

• Realtime API: Send TEXT frames and a valid guardrail session.update (PR #28848).
• Moderations: Wire streaming flags through to the unified dispatcher (PR #27324).
• Batches: Strip LiteLLM policy tracking from OpenAI batch metadata (PR #28425), map the stripped batch body.model back to the proxy alias for auth (PR #29264).
• Vector Stores: Restrict vector store index create/delete to proxy admins (PR #29202).
• Video: Resolve managed video model ids for auth (PR #29545).
• Pass-through: Bedrock Knowledge Base pass-through: preserve SigV4 headers and the signed request body (PR #27526), enforce allowed_passthrough_routes for auth=true pass-through (PR #29256), de-duplicate pass-through endpoint logs (PR #29598), match pass-through registry routes bare-to-bare when SERVER_ROOT_PATH is set, fixing pass-through 404s (PR #29658).

Management Endpoints / UI Features

• Virtual Keys & Teams: Expose keys_count on /v2/team/list and wire the UI Resources badge (PR #28502), allow team members to create keys on org-scoped teams (PR #29310), exempt UI and CLI session tokens from team-key budget ceilings, hardened so custom default_key_generate_params cannot re-impose them (PR #29612, PR #29639), record ownership for service-account keys, plus a Prisma JSON serialization fix (PR #28990).
• Deployment: Helm: split per-component ServiceAccounts for gateway, backend, and UI (PR #28712), Enterprise: RESEND_FROM_EMAIL for self-hosted Resend sends (PR #28830).

Management Endpoints / UI Bugs

• Virtual Keys & Teams: Refresh the team cache on team_model_add/team_model_delete (PR #28683), keep the team_alias cache in sync on _cache_team_object writes (PR #28737), fix spend-logs v2 route permissions (PR #28705), normalize the Bearer prefix in the safe-hash helper (PR #29343).
• UI: Allow clearing custom pricing on wildcard models (PR #28719), stop vertex_ai-anthropic_models from leaking into the Anthropic dropdown (PR #28723), route API Reference back to the query-param page (PR #28726), show 2-decimal precision for max_budget on the key overview (PR #28809), break the logout redirect loop across dev and proxy origins (PR #29360), internal refactors: extract auth state into AuthContext, remove dead App Router scaffolding (PR #28910, PR #28891).

AI Integrations Logging

• DataDog: Drain the cost-management queue and add an opt-in FinOps tag allowlist (PR #28487).
• Galileo: Support the hosted v2 spans API and string output extraction (PR #28771).
• OpenTelemetry: Typed, semconv-aligned instrumentation (PR #28909), add team_metadata, http.route, and model names to inference spans (PR #29319), export the SERVER span on management-endpoint success without an http_request (PR #28794), link pass-through success spans to the SERVER root span (PR #29315).
• General: Exclude proxy_server_request from its own body snapshot (PR #28618), fix duplicate Claude Code traces (PR #29311).

Guardrails

• General: Return HTTP 400 for LiteLLM content-filter blocks (PR #28418), wire apply_guardrail into proxy logging callbacks (PR #28970), persist disable_global_guardrails on keys (PR #29233).

Spend Tracking, Budgets and Rate Limiting

• Cost Tracking — OpenAI regional-processing cost uplift for EU/US data residency (PR #28626).
• Rate Limiting — Cap the no-max_tokens TPM floor at the smallest configured limit (v3 limiter) (PR #28805).
• Budgets — Enforce tag budgets for key-level tags (PR #29108), enforce deployment budgets for dynamically added models (PR #29273), reset_budget writes only {spend, budget_reset_at} and stops pre-zeroing the counter (PR #29358).

MCP Gateway

• Session Routing — Stateless and stateful clients via session-id routing (PR #26857).
• Access Groups — Additive key access-group grants with opt-in member assignment (PR #29313), resolve team access_group_ids to MCP servers (PR #28997), resolve key access_group_ids to MCP servers (ungated) (PR #29195), extend the key access-group union to MCP servers (PR #28890).
• Discovery — Allow llm_api_routes virtual keys to list MCP servers (PR #28442).
• Server CRUD — Preserve source_url on GET /v1/mcp/server list responses (PR #29249), preserve omitted fields on PUT /v1/mcp/server partial updates (PR #29253).
• Virtual Keys — Ignore stale ids on key save (PR #29128).

Performance / Loadbalancing / Reliability improvements

• Streaming hot path — ~30% lower per-chunk overhead on the Anthropic and Bedrock streaming path (PR #28720).
• Docker — Use system Node in the componentized builders and retry apk add (PR #28888).
• Dependencies — Routine dependency bumps, including a Starlette bad-host fix (PR #29208, PR #29373).

Documentation Updates

• Hand-written CLAUDE.md; remove AGENTS.md and point GEMINI.md at it (PR #29252).
• Agent guidance: require consent before writing new third-party names (PR #28908).
• Cookbook: bump the Go directive to 1.26.3 in the gollem example (PR #29234).

General Proxy Improvements

Testing, CI & build hardening:
• UI e2e coverage across roles and flows — Team-BYOK add-model, Router fallback, MCP add-server, AI Hub make-public, Team Admin, Internal User / Viewer, logout and navbar identity (PR #29068, #29069, #29070, #29071, #29072, #29074, #29075, #29076, #29077, #29080, #29083, #28652).
• Pass-through SERVER_ROOT_PATH login-redirect trailing-slash e2e (PR #29369).
• Behavior-pinning harnesses for proxy_server.py (PR #28827, #29309).
• Deterministic Redis cassette replay and live Google OAuth token minting for VCR (PR #28826, #29229).
• Reasoning-effort grid test covering Claude Opus 4.8 across provider routes (PR #29327).
• Bedrock CI account moves and restore (PR #28728, #29326, #29245).
• Keep litellm_internal_staging green (PR #29344).
• Regenerate the admin-ui static export with trailingSlash: true (PR #28112).

PR roll-up by ownership area (total: 97):
• Other (CI / tests / build hardening): 23
• UI / Auth & Management: 18
• LLM API Endpoints: 15
• MCP: 9
• Models & Providers: 9
• Logging: 8
• Spend / Budgets / Rate Limits: 5
• Performance: 4
• Documentation: 3
• Guardrails: 3

Release candidate changelog (rc.1 → rc.2 → rc.3)

Almost everything above shipped in rc.1. The later candidates are small, targeted patches cut by cherry-pick.

rc.2 added six fixes:
• Resolve managed video model ids for auth (PR #29545).
• Allow team members to create keys on org-scoped teams (PR #29310).
• Strip output_config.effort for Vertex Claude Haiku 4.5 (PR #29585).
• De-duplicate pass-through endpoint logs (PR #29598).
• Exempt UI/CLI session tokens from team-key budget ceilings (PR #29612).
• Harden that exemption against custom default_key_generate_params (PR #29639).
rc.3 added one fix:
• Match pass-through registry routes bare-to-bare when SERVER_ROOT_PATH is set, fixing pass-through 404s (PR #29658).

New Contributors

No new contributors this release; all 11 authors are returning contributors.

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.87.0-rc.1...v1.88.0-rc.3

06/04/2026 (v1.88.0rc3)

• New Models / Updated Models: 9
• LLM API Endpoints: 15
• Management Endpoints / UI: 18
• AI Integrations (Logging / Guardrails): 11
• Spend Tracking, Budgets and Rate Limiting: 5
• MCP Gateway: 9
• Performance / Loadbalancing / Reliability improvements: 4
• General Proxy Improvements (testing / CI / build): 23
• Documentation Updates: 3
Total: 97 PRs
Original source
May 23, 2026
Date parsed from source:
May 23, 2026

First seen by Releasebot:
Jun 2, 2026
liteLLM

v1.87.0 - OCI Generative AI Provider, Gemini 3.5 Flash Day-0, MCP UI for OAuth Servers

liteLLM adds OCI Generative AI as a first-class provider and expands Gemini day-0 support, while improving MCP OAuth tooling, Codex CLI auth, and Anthropic streaming performance. The release also brings new models, better logging and guardrails, and broader proxy reliability.
Key Highlights

OCI Generative AI as a first-class provider — production-ready chat, embeddings, streaming, reasoning and tool use across Cohere Command-A, Meta Llama 3.1/3.2/3.3/4, xAI Grok 3/4, Google Gemini 2.5, and OpenAI GPT-5 hosted on OCI; full model-pricing catalog included.

Gemini 3.5 Flash Day-0 support — gemini-3.5-flash and gemini-3.1-flash-lite ship on Vertex AI, Google AI Studio, and OpenRouter with full pricing, function calling, web search, code execution, and managed-agents support.

MCP UI for OAuth tool calls — the dashboard now resolves tool list and tool call against OAuth-protected MCP servers directly, plus native MCP OAuth support for Cursor and clearer OAuth error messages.

Codex CLI auth hardening — JWT-derived team aliases and SSO form-URL flow for the OpenAI Codex CLI, plus allowlisted OIDC-claim persistence across the CLI SSO poll.

Anthropic streaming hot-path perf — ~90% lower TTFT overhead and higher sustained throughput on the proxy's Anthropic /v1/messages SSE path, measured on a real 4-pod deployment against both Anthropic and Bedrock Invoke (wire output is parity-tested); plus lazy-loaded response streaming for Bedrock SageMaker.

New Providers and Endpoints

New Providers (1 new provider)

Provider: OCI Generative AI
Supported LiteLLM Endpoints: /v1/chat/completions, /v1/embeddings
Description: Official Oracle Cloud Infrastructure Generative AI integration. Production-ready support for chat, streaming, reasoning, tool calling, and embeddings across Cohere Command-A (incl. Reasoning + Vision), Meta Llama 3.1 / 3.2 / 3.3 / 4, xAI Grok 3 / 4, Google Gemini 2.5, and OpenAI GPT-5. Includes full model-pricing catalog. - PR #28223

New Models / Updated Models

New Model Support (22 new models) including Gemini gemini-3.5-flash and gemini-3.1-flash-lite with extensive features such as audio input, function calling, PDF input, vision, web search, and more.

Features

Gemini

Day-0 support for gemini-3.5-flash - PR #28268

Add gemini-3.1-flash-lite model cost map - PR #28320

Additional gemini-3.1-flash-lite pricing entry - PR #27933

Gemini managed-agents support - PR #28270

Azure

Add Azure Speech STT config support - PR #27482

OpenRouter

Add Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro model entries - PR #27700

Add openrouter/google/gemini-3.1-flash-lite pricing entry - PR #28280

Bug Fixes

Vertex AI

Omit function_call.id on Vertex Gemini 3.5+ tool turns (the field is rejected by the new schema) - PR #28324

vertex_gemma: strip context_management from the request body - PR #28438

Bedrock

bedrock/cohere: send embedding_types as a JSON array, not a string - PR #28172

Sanitize batch metadata to prevent Pydantic ValidationError - PR #28202

Decouple STS region from Bedrock aws_region_name - PR #28245

SageMaker

Send the native Cohere embed payload to Cohere SageMaker endpoints - PR #28613

DeepSeek

Use the native /anthropic/v1/messages endpoint and sanitize tools - PR #28200

Azure

Decouple Azure OpenAI deployment ID from model name via base_model so GPT-5 model routing works on custom deployment names - PR #28490

Router: use the forwarded model_id for native Azure container IDs - PR #27921

vLLM

Fix Anthropic tool-call transformation on vLLM deployments - PR #28549

LLM API Endpoints

Interactions API

Migrate to the Google Interactions API steps schema (May 2026 revision) - PR #28153

Google-native passthrough

Decode bytes and pass through SSE for Google-native streamGenerateContent (no more b'...' literals on the wire) - PR #28213

Responses API

Forward timeout on the completion-transformation path for Anthropic, Bedrock, and Vertex - PR #28133

Accept dict-shape reasoning_effort from the Anthropic Responses bridge - PR #28201

Wrap aresponses streaming iterator for mid-stream router fallbacks - PR #28215

Unblock staging — mypy + coverage for aresponses streaming fallback - PR #28318

Strip Anthropic cache_control from OpenAI Responses API requests - PR #28431

Use the OpenAI SSEDecoder for Responses API streaming - PR #28566

Replay openai/responses bridge cache hits as chat streams - PR #28158

Interactions API

Never drop streamed text deltas; always emit the terminal completion - PR #28394

Batch API

Normalize batch file IDs before the ManagedObjectTable write - PR #28339

Management Endpoints / UI

Models + Endpoints

Add a pause/resume Switch on the models table - PR #28151

Spend Logs

Consolidate filter state and extract components in the UI - PR #25847

Playground

Interactions API endpoint in the Playground with SSE streaming - PR #28156

Passthrough Routes

Team passthrough routes — create parity + edit-load fix - PR #28098

Gate team.allowed_passthrough_routes writes to proxy admins - PR #28097

Auth / Codex CLI

Codex CLI JWT team alias propagation - PR #28621

Codex CLI SSO form-URL flow - PR #28271

Persist allowlisted OIDC claims in the CLI SSO poll - PR #28463

Virtual Keys

Encrypt callback_vars in key/team metadata at rest in the DB - PR #27141

AI Integrations

Logging

Prometheus

Emit per-token-type detail metrics — five sparse counters that break out usage.prompt_tokens_details / usage.completion_tokens_details fields providers already report (LIT-3220) - PR #28372

Add user_email and user_alias labels to user budget metrics - PR #28155

OpenTelemetry

Propagate team_id and team_alias to all child OTEL spans - PR #28273

Emit a guardrail span on violations and surface status + categories - PR #28364

Serialize guardrail_response to JSON in OTEL traces - PR #28362

Stamp http.response.status_code on all error responses - PR #28405

Guardrails

Microsoft Purview DLP

New guardrail integration for Microsoft Purview DLP - PR #24966

Spend Tracking, Budgets and Rate Limiting

Spend Counter — Seed the Redis counter via SET NX to prevent cross-pod double-seed on cold start - PR #27854

Cost Tracking — Recalculate cost after router retry failures so the logged cost reflects the actual attempt that succeeded - PR #28476

Cost Tracking — Treat litellm_provider=None as a wildcard in _check_provider_match so cost lookup works for catalog entries that omit the provider field - PR #28523

MCP Gateway

OAuth in the UI — Add tool-call and tool-list support via the dashboard for OAuth-protected MCP servers - PR #28454

Cursor OAuth — Allow native MCP OAuth support for Cursor - PR #28327

Auth Resolution — JWT on tools/list and REST tools/call server resolution - PR #28227

Cold-Start Init — Forward upstream initialize instructions on cold gateway init - PR #28231

OAuth Errors — Add error_description and hint to OAuth flow error responses - PR #28471

Inspector — Trim whitespace from MCP inspector tool-call inputs - PR #28203

Performance / Loadbalancing / Reliability improvements

Anthropic /v1/messages streaming hot path — cut per-request and per-chunk overhead on the proxy's Anthropic streaming path, with byte-identical wire output guaranteed by parity tests that diff the logged and billed payloads between the fast and legacy paths. Measured on a real 4-pod m7i.xlarge deployment (no HPA) streaming 256 text_delta chunks per request, against both Anthropic and Bedrock Invoke — TTFT overhead ~90% lower with higher sustained throughput (full numbers below) - PR #28289

Skip work that's a no-op in the default config: the per-chunk Datadog span when tracing is off, the per-chunk streaming hook when no callback / guardrail / cost-injection is active, and the agentic post-processing wrapper when no callback overrides its hook (it otherwise buffers every chunk and rebuilds the response from SSE just to call hooks that all return (False, {})).

Stop doing the same work twice per request: serialize the request body once and reuse it for the pre-call log and the wire, memoize the optional-params type-hint resolution (~80µs/request), and skip the redundant strip_empty_text_blocks scan when the async wrapper already sanitized.

Cheaper end-of-stream reconstruction: collapse the homogeneous run of content_block_delta text events into a single equivalent SSE event before stream_chunk_builder, removing O(output-token) ModelResponseStream constructions; tool-use / thinking / citations streams fall back to the unchanged legacy path.

Cheaper hot-path logging: gate debug f-string evaluation behind isEnabledFor(DEBUG), hoist cost_injection_active out of the per-chunk loop, and drop one async-generator layer per chunk in async_sse_data_generator.

Bedrock / SageMaker — Switch to lazy loading for response streaming - PR #28189

Granian ASGI — Add Granian as a supported ASGI server for better throughput stability - PR #26027

Prisma — Expose Prisma idle/connect timeout + extra DB URL params so production deployments can tune connection pools - PR #28395

Proxy auth — Strict media-type match for form bodies (defensive against ambiguous Content-Type) - PR #27939

Proxy auth — Carry the ASGI path into the WebSocket auth synthetic Request so auth resolves the right route - PR #27940

Docker — Restore npm to the non-root builder image so UI builds run there - PR #28519

Helm — Drop the main- prefix from the default image tag - PR #28710

License check — Read PEP 639 license-expression metadata in check_licenses - PR #28529

Documentation Updates

Fix the incorrect /v1/agents request example - PR #28131

Fix misleading credential-passing examples in Gemini-agents GET/DELETE docstrings - PR #28293

General Proxy Improvements

Testing, CI & build hardening:

Behavior-pinning harness + Key Tier-1 matrix (and tier-2/3 + team management endpoints + phase-4 payload matrix) - PR #28321, PR #28441, PR #28620, PR #28681

Stabilize image-edit VCR cassettes to stop live gpt-image-1 spend - PR #28110

Migrate realtime + rerank tests off shut-down upstream models; replace gpt-4o-audio-preview with gpt-audio-1.5; expect session.created as xAI realtime initial event - PR #28191, PR #28281, PR #28424

Harden the flaky proxy callback-leak detector - PR #28195

E2E runner migrated to uv; add an "All Proxy Models" key test - PR #28313

UI-e2e: admin key creation with a specific proxy model; forward LITELLM_LICENSE to the UI e2e proxy - PR #28365, PR #28398

Vertex AI grounding test tolerates transient 500; streaming test tolerates Vertex 429 wrapped in MidStreamFallbackError - PR #28503, PR #28669

Bump black to 26.3.1 and reapply formatting; one-shot lint fix - PR #28525, PR #28639

Allow audio_transcription_config in the model-prices schema - PR #28708

Remove the dead old Playwright e2e suite - PR #28632

Routine dependency/CI bumps - PR #28287, PR #28524, PR #28528, PR #27665, PR #28296, PR #28303, PR #28707

PR roll-up by ownership area

PRs by ownership area (total: 93)

Other (CI / tests / build hardening): 25

Models & Providers (incl. new provider): 18

UI / Auth & Management: 12

LLM API Endpoints: 11

Performance: 9

Logging: 6

MCP: 6

Spend / Budgets / Rate Limits: 3

Docs: 2

Guardrails: 1

New Contributors

@IshaMeera made their first contribution in #28131

@TorvaldUtne made their first contribution in #27700

@adityasingh2400 made their first contribution in #28523

@cwang-otto made their first contribution in #28133

@ro31337 made their first contribution in #28280

@withomasmicrosoft made their first contribution in #28490

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.86.0...v1.87.0
Original source
All of your release notes in one feed

Join Releasebot and get updates from liteLLM and hundreds of other software products.

Create account
Get updates with:
May 23, 2026
Date parsed from source:
May 23, 2026

First seen by Releasebot:
May 31, 2026
liteLLM

v1.87.0rc1 - OCI Generative AI Provider, Gemini 3.5 Flash Day-0, MCP UI for OAuth Servers

liteLLM adds OCI Generative AI as a first-class provider, ships day-0 Gemini 3.5 Flash support, and expands MCP OAuth, Codex CLI auth, and performance on Anthropic streaming. It also brings new guardrails, logging, budgets, and broader model and endpoint support.
Key Highlights

OCI Generative AI as a first-class provider — production-ready chat, embeddings, streaming, reasoning and tool use across Cohere Command-A, Meta Llama 3.1/3.2/3.3/4, xAI Grok 3/4, Google Gemini 2.5, and OpenAI GPT-5 hosted on OCI; full model-pricing catalog included.

Gemini 3.5 Flash Day-0 support — gemini-3.5-flash and gemini-3.1-flash-lite ship on Vertex AI, Google AI Studio, and OpenRouter with full pricing, function calling, web search, code execution, and managed-agents support.

MCP UI for OAuth tool calls — the dashboard now resolves tool list and tool call against OAuth-protected MCP servers directly, plus native MCP OAuth support for Cursor and clearer OAuth error messages.

Codex CLI auth hardening — JWT-derived team aliases and SSO form-URL flow for the OpenAI Codex CLI, plus allowlisted OIDC-claim persistence across the CLI SSO poll.

Anthropic streaming hot-path perf — ~90% lower TTFT overhead and higher sustained throughput on the proxy's Anthropic /v1/messages SSE path, measured on a real 4-pod deployment against both Anthropic and Bedrock Invoke (wire output is parity-tested); plus lazy-loaded response streaming for Bedrock SageMaker.

New Providers and Endpoints

New Providers (1 new provider)

Provider: OCI Generative AI
Supported LiteLLM Endpoints: /v1/chat/completions, /v1/embeddings
Description: Official Oracle Cloud Infrastructure Generative AI integration. Production-ready support for chat, streaming, reasoning, tool calling, and embeddings across Cohere Command-A (incl. Reasoning + Vision), Meta Llama 3.1 / 3.2 / 3.3 / 4, xAI Grok 3 / 4, Google Gemini 2.5, and OpenAI GPT-5. Includes full model-pricing catalog. - PR #28223

Features

Gemini

Day-0 support for gemini-3.5-flash - PR #28268

Add gemini-3.1-flash-lite model cost map - PR #28320

Additional gemini-3.1-flash-lite pricing entry - PR #27933

Gemini managed-agents support - PR #28270

Azure

Add Azure Speech STT config support - PR #27482

OpenRouter

Add Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro model entries - PR #27700

Add openrouter/google/gemini-3.1-flash-lite pricing entry - PR #28280

Bug Fixes

Vertex AI

Omit function_call.id on Vertex Gemini 3.5+ tool turns (the field is rejected by the new schema) - PR #28324

vertex_gemma: strip context_management from the request body - PR #28438

Bedrock

bedrock/cohere: send embedding_types as a JSON array, not a string - PR #28172

Sanitize batch metadata to prevent Pydantic ValidationError - PR #28202

Decouple STS region from Bedrock aws_region_name - PR #28245

SageMaker

Send the native Cohere embed payload to Cohere SageMaker endpoints - PR #28613

DeepSeek

Use the native /anthropic/v1/messages endpoint and sanitize tools - PR #28200

Azure

Decouple Azure OpenAI deployment ID from model name via base_model so GPT-5 model routing works on custom deployment names - PR #28490

Router: use the forwarded model_id for native Azure container IDs - PR #27921

vLLM

Fix Anthropic tool-call transformation on vLLM deployments - PR #28549

LLM API Endpoints

Interactions API

Migrate to the Google Interactions API steps schema (May 2026 revision) - PR #28153

Google-native passthrough

Decode bytes and pass through SSE for Google-native streamGenerateContent (no more b'...' literals on the wire) - PR #28213

Bugs

Responses API

Forward timeout on the completion-transformation path for Anthropic, Bedrock, and Vertex - PR #28133

Accept dict-shape reasoning_effort from the Anthropic Responses bridge - PR #28201

Wrap aresponses streaming iterator for mid-stream router fallbacks - PR #28215

Unblock staging — mypy + coverage for aresponses streaming fallback - PR #28318

Strip Anthropic cache_control from OpenAI Responses API requests - PR #28431

Use the OpenAI SSEDecoder for Responses API streaming - PR #28566

Replay openai/responses bridge cache hits as chat streams - PR #28158

Interactions API

Never drop streamed text deltas; always emit the terminal completion - PR #28394

Batch API

Normalize batch file IDs before the ManagedObjectTable write - PR #28339

Management Endpoints / UI

Features

Models + Endpoints

Add a pause/resume Switch on the models table - PR #28151

Spend Logs

Consolidate filter state and extract components in the UI - PR #25847

Playground

Interactions API endpoint in the Playground with SSE streaming - PR #28156

Passthrough Routes

Team passthrough routes — create parity + edit-load fix - PR #28098

Gate team.allowed_passthrough_routes writes to proxy admins - PR #28097

Auth / Codex CLI

Codex CLI JWT team alias propagation - PR #28621

Codex CLI SSO form-URL flow - PR #28271

Persist allowlisted OIDC claims in the CLI SSO poll - PR #28463

Virtual Keys

Encrypt callback_vars in key/team metadata at rest in the DB - PR #27141

Bugs

Auth / Discovery

Hydrate wildcard discovery credentials so OIDC discovery works against wildcarded providers - PR #28284

Spend Logs

Restore the log-filter loading indicator - PR #28282

End-User Logs

Fix end-user logs surfacing - PR #27758

AI Integrations

Logging

Prometheus

Emit per-token-type detail metrics — five sparse counters that break out usage.prompt_tokens_details / usage.completion_tokens_details fields providers already report (LIT-3220) - PR #28372

Add user_email and user_alias labels to user budget metrics - PR #28155

OpenTelemetry

Propagate team_id and team_alias to all child OTEL spans - PR #28273

Emit a guardrail span on violations and surface status + categories - PR #28364

Serialize guardrail_response to JSON in OTEL traces - PR #28362

Stamp http.response.status_code on all error responses - PR #28405

Guardrails

Microsoft Purview DLP

New guardrail integration for Microsoft Purview DLP - PR #24966

Spend Tracking, Budgets and Rate Limiting

Spend Counter — Seed the Redis counter via SET NX to prevent cross-pod double-seed on cold start - PR #27854

Cost Tracking — Recalculate cost after router retry failures so the logged cost reflects the actual attempt that succeeded - PR #28476

Cost Tracking — Treat litellm_provider=None as a wildcard in _check_provider_match so cost lookup works for catalog entries that omit the provider field - PR #28523

MCP Gateway

OAuth in the UI — Add tool-call and tool-list support via the dashboard for OAuth-protected MCP servers - PR #28454

Cursor OAuth — Allow native MCP OAuth support for Cursor - PR #28327

Auth Resolution — JWT on tools/list and REST tools/call server resolution - PR #28227

Cold-Start Init — Forward upstream initialize instructions on cold gateway init - PR #28231

OAuth Errors — Add error_description and hint to OAuth flow error responses - PR #28471

Inspector — Trim whitespace from MCP inspector tool-call inputs - PR #28203

Performance / Loadbalancing / Reliability improvements

Anthropic /v1/messages streaming hot path — cut per-request and per-chunk overhead on the proxy's Anthropic streaming path, with byte-identical wire output guaranteed by parity tests that diff the logged and billed payloads between the fast and legacy paths. Measured on a real 4-pod m7i.xlarge deployment (no HPA) streaming 256 text_delta chunks per request, against both Anthropic and Bedrock Invoke — TTFT overhead ~90% lower with higher sustained throughput (full numbers below) - PR #28289

Bedrock / SageMaker — Switch to lazy loading for response streaming - PR #28189

Granian ASGI — Add Granian as a supported ASGI server for better throughput stability - PR #26027

Prisma — Expose Prisma idle/connect timeout + extra DB URL params so production deployments can tune connection pools - PR #28395

Proxy auth — Strict media-type match for form bodies (defensive against ambiguous Content-Type) - PR #27939

Proxy auth — Carry the ASGI path into the WebSocket auth synthetic Request so auth resolves the right route - PR #27940

Docker — Restore npm to the non-root builder image so UI builds run there - PR #28519

Helm — Drop the main- prefix from the default image tag - PR #28710

License check — Read PEP 639 license-expression metadata in check_licenses - PR #28529

Documentation Updates

Fix the incorrect /v1/agents request example - PR #28131

Fix misleading credential-passing examples in Gemini-agents GET/DELETE docstrings - PR #28293

General Proxy Improvements

Testing, CI & build hardening:

Behavior-pinning harness + Key Tier-1 matrix (and tier-2/3 + team management endpoints + phase-4 payload matrix) - PR #28321, PR #28441, PR #28620, PR #28681

Stabilize image-edit VCR cassettes to stop live gpt-image-1 spend - PR #28110

Migrate realtime + rerank tests off shut-down upstream models; replace gpt-4o-audio-preview with gpt-audio-1.5; expect session.created as xAI realtime initial event - PR #28191, PR #28281, PR #28424

Harden the flaky proxy callback-leak detector - PR #28195

E2E runner migrated to uv; add an "All Proxy Models" key test - PR #28313

UI-e2e: admin key creation with a specific proxy model; forward LITELLM_LICENSE to the UI e2e proxy - PR #28365, PR #28398

Vertex AI grounding test tolerates transient 500; streaming test tolerates Vertex 429 wrapped in MidStreamFallbackError - PR #28503, PR #28669

Bump black to 26.3.1 and reapply formatting; one-shot lint fix - PR #28525, PR #28639

Allow audio_transcription_config in the model-prices schema - PR #28708

Remove the dead old Playwright e2e suite - PR #28632

Routine dependency/CI bumps - PR #28287, PR #28524, PR #28528, PR #27665, PR #28296, PR #28303, PR #28707

PR roll-up by ownership area

PRs by ownership area (total: 93)

Other (CI / tests / build hardening): 25

Models & Providers (incl. new provider): 18

UI / Auth & Management: 12

LLM API Endpoints: 11

Performance: 9

Logging: 6

MCP: 6

Spend / Budgets / Rate Limits: 3

Docs: 2

Guardrails: 1

New Contributors

@IshaMeera made their first contribution in #28131

@TorvaldUtne made their first contribution in #27700

@adityasingh2400 made their first contribution in #28523

@cwang-otto made their first contribution in #28133

@ro31337 made their first contribution in #28280

@withomasmicrosoft made their first contribution in #28490

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.86.0-rc.1...v1.87.0-rc.1
Original source
May 16, 2026
Date parsed from source:
May 16, 2026

First seen by Releasebot:
May 31, 2026
liteLLM

v1.86.0 - Weighted-Routing Failover, Native Web-Search Citations & OTel-Standard Tracing

liteLLM ships weighted-routing failover, native Anthropic web-search citations, richer OpenTelemetry server spans, and a componentized gateway-backend-ui deployment. It also adds new Azure AI GPT-5.4 models and fixes a critical rate-limit regression.
Key Highlights

Weighted-Routing Failover — on a deployment failure, the router now retries the same model group on a different deployment (e.g. another Azure region) while the initial pick still respects configured weights, behind a router-level flag.

Native web-search citations for Anthropic clients — LiteLLM now emits native web_search_tool_result blocks so Claude Desktop / Cowork render web-search citations correctly.

OTel-standard server-span attributes — the proxy SERVER span now carries http.response.status_code, http.route, url.path, and litellm.preprocessing.duration_ms, plus an opt-in for the experimental OTEL GenAI semantic conventions.

Componentized deployment — additive scaffold + Helm chart to split the monolithic proxy into independently scalable gateway, backend, and ui services.

Critical rate-limit regression fixed — the v3 limiter was leaking internal reservation keys into the upstream provider body, breaking every virtual key with a tpm_limit / rpm_limit set.

Claude Code compatibility coverage

We expanded the set of Claude Code features that LiteLLM automatically tests against daily, and added a Known Issues section to the Claude Code compatibility doc so customers can see which combinations are red, and why, before hitting them in production.

This is a direct response to customer feedback on stability and regressions. The matrix is backed by a rigorous end-to-end suite that hits real provider endpoints with no mocking. The suite re-runs every day and the doc renders the latest LiteLLM stable against the latest Claude Code version.

Today's coverage sits at 76% across Anthropic, Bedrock Invoke, Bedrock Converse, Vertex AI, and Azure Foundry. Over the next week we plan to bring this to 90%. Coming soon, the same suite will gate PRs: any cell flipping green to red will fail the check and block merges into staging, making it much harder for code that breaks Claude Code to land in the next release.

New Models / Updated Models

New Model Support

Provider: Bedrock
Model: jp.anthropic.claude-sonnet-4-6
Context Window: 1,000,000
Input ($/1M tokens): $3.30
Output ($/1M tokens): $16.50
Features: Prompt caching, reasoning, vision, function calling, PDF input, computer use

Provider: Azure AI
Model: azure_ai/gpt-5.4
Context Window: 1,050,000
Input ($/1M tokens): $2.50
Output ($/1M tokens): $15.00
Features: Reasoning, vision, web search, function calling, prompt caching, service tier

Provider: Azure AI
Model: azure_ai/gpt-5.4-pro
Context Window: 1,050,000
Input ($/1M tokens): $30.00
Output ($/1M tokens): $180.00
Features: Responses-mode, reasoning, vision, web search, prompt caching

Provider: Azure AI
Model: azure_ai/gpt-5.4-mini
Context Window: 400,000
Input ($/1M tokens): $0.75
Output ($/1M tokens): $4.50
Features: Reasoning, vision, web search, function calling, prompt caching

Provider: Azure AI
Model: azure_ai/gpt-5.4-nano
Context Window: 400,000
Input ($/1M tokens): $0.20
Output ($/1M tokens): $1.25
Features: Reasoning, vision, web search, function calling, prompt caching

Each Azure AI GPT-5.4 model also ships a dated snapshot alias (gpt-5.4-2026-03-05, gpt-5.4-pro-2026-03-05, gpt-5.4-mini-2026-03-17, gpt-5.4-nano-2026-03-17) — 9 catalog entries total. All GPT-5.4 entries include tiered (>272k) and priority pricing.

Features

Azure AI

Add Azure AI Foundry GPT-5.4 model metadata (gpt-5.4 / pro / mini / nano + dated aliases) - PR #28030

Bedrock

Add jp.cross-region inference profile for claude-sonnet-4-6 - PR #27976

Bug Fixes

Bedrock

bedrock-mantle: use /anthropic/v1/messages path for Mantle (Claude Mythos Preview) endpoint — /v1/messages was 404ing every Mantle request - PR #27976

LLM API Endpoints

Features

Anthropic Messages API (/v1/messages)

Emit native web_search_tool_result blocks for Anthropic clients (Claude Desktop / Cowork citations) - PR #27886

Vector Stores

Fix vector store retrieve/list/update/delete when no completion model is set; merge URL query params into request data on those routes - PR #27929

Bugs

Batch API

Managed batches: convert raw provider output_file_id to managed ID in the CheckBatchCost poller so GET /files/{id}/content resolves routing - PR #27984

Management Endpoints / UI

Bugs

Auth / OAuth

Allow allowlisted redirect URIs in OAuth setup - PR #27761

Config

Make /config/update env-var encryption idempotent (fixes double-encryption on repeated updates) + endpoint-level regression test - PR #28022

Models + Endpoints

Sort BYOK models by their displayed name in /v2/model/info - PR #28079

AI Integrations

Logging

OpenTelemetry

OTel-standard attributes on the proxy SERVER span: http.response.status_code, http.route, url.path, litellm.preprocessing.duration_ms - PR #28040

Set http.response.status_code on the success SERVER span (not just error spans) - PR #28090

Opt-in support for the experimental OTEL GenAI semantic conventions (OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental); default behavior unchanged - PR #27418

Guardrails

Lasso

Add tool-calling support to LassoGuardrail (expands tool_calls / role=tool into Lasso tool_use / tool_result blocks; maps tool definitions) - PR #27648

CrowdStrike AIDR

Improve CrowdStrike AIDR input handling - PR #26658

Secret Managers

General

Import get_secret at runtime to avoid an import-time ordering bug - PR #28014

Spend Tracking, Budgets and Rate Limiting

Rate Limiting — Stop the v3 limiter from leaking internal reservation keys (_litellm_rate_limit_descriptors, litellm_tpm_reserved*) into the upstream provider body; this regression broke every virtual key with a tpm_limit / rpm_limit - PR #27913

Budgets — Tighten budget field validation and add missing authorization checks on user self-update / key-generation paths - PR #27897

Cost Tracking — Fix zero cost/usage on completed Vertex AI batch jobs (file content is now OpenAI-shaped post-#25627; old code read stale usageMetadata.*) - PR #27912

MCP Gateway

Delegate-auth PKCE bypass for internal (available_on_public_internet: false) oauth2 interactive MCP servers — same anonymous PKCE path as public servers; client_credentials exclusion unchanged - PR #27977

Expose delegate_auth_to_upstream in the GET /v1/mcp/server list API (_build_mcp_server_table was dropping it, so the dashboard always showed false) - PR #27936

Performance / Loadbalancing / Reliability improvements

Weighted-Routing Failover — on failure, retry the same model group on a different deployment while the initial pick respects configured weights; behind a router-level flag - PR #27980

Chat-completions fast path — cache callback capabilities once instead of re-scanning litellm.callbacks per request; skip streaming-iterator wrapping when no callback needs it - PR #27858

Componentized deployment — additive gateway/, backend/, ui/ Dockerfiles + Helm chart (per-component Deployment/Service/HPA, no edits to existing modules) - PR #27557

Terraform stacks — AWS ECS + GCP Cloud Run stacks for deploying the componentized gateway - PR #27673

General Proxy Improvements

Testing, CI & build hardening:

VCR cache observability: classify cache verdicts, detect live calls, surface cost leaks, aggregate xdist worker stats; Bedrock hostname / RFC1918 fixes - PR #27795

Reasoning-effort grid e2e regression suite (status classified by exception status_code); Fireworks / Gemini tests mocked instead of live - PR #28036

Modernize model references in CI tests and configs - PR #27856

Codecov: flag uploads, enable carryforward, close coverage gaps; --cov=./litellm path resolution - PR #28028, PR #27960

mutmut: enable mutate_only_covered_lines to fit CI budget - PR #27910

Remove unused GitHub Actions workflows and orphan files - PR #27957

Preserve global Button/Tooltip mocks in per-file vi.mock (UI tests) - PR #27958

Isolate run_server CLI tests from the Prisma DB-setup path - PR #28029

Validate response fields against the Interaction schema - PR #28037

De-flake test_gemini_image_size_limit_exceeded - PR #28039

Pin openai==2.33.0 in uv.lock - PR #28088

New Contributors

@vladpolevoi made their first contribution in #27648

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.85.0...v1.86.0

Summary 05/16/2026 (v1.86.0):

New Models / Updated Models: 2

LLM API Endpoints: 3

Management Endpoints / UI: 3

AI Integrations (Logging / Guardrails / Secret Managers): 6

Spend Tracking, Budgets and Rate Limiting: 3

MCP Gateway: 3

Performance / Loadbalancing / Reliability improvements: 4

General Proxy Improvements (testing / CI / build): 12

Documentation Updates: 0

Total: 36 PRs
Original source
Jan 1, 2025
Date parsed from source:
Jan 1, 2025

First seen by Releasebot:
May 31, 2026
liteLLM

v1.80.5-stable - Gemini 3.0 Support

liteLLM adds Prompt Studio, Gemini 3 support, and a new Model Compare UI, while also expanding MCP Hub, batch routing and spend tracking, AWS Secret Manager IAM auth, and realtime performance with major latency gains.
Key Highlights

Gemini 3 - Day-0 support for Gemini 3 models with thought signatures

Prompt Management - Full prompt versioning support with UI for editing, testing, and version history

MCP Hub - Publish and discover MCP servers within your organization

Model Compare UI - Side-by-side model comparison interface for testing

Batch API Spend Tracking - Granular spend tracking with custom metadata for batch and file creation requests

AWS IAM Secret Manager - IAM role authentication support for AWS Secret Manager

Logging Callback Controls - Admin-level controls to prevent callers from disabling logging callbacks in compliance environments

Proxy CLI JWT Authentication - Enable developers to authenticate to LiteLLM AI Gateway using the Proxy CLI

Batch API Routing - Route batch operations to different provider accounts using model-specific credentials from your config.yaml

Prompt Management

This release introduces LiteLLM Prompt Studio - a comprehensive prompt management solution built directly into the LiteLLM UI. Create, test, and version your prompts without leaving your browser.

You can now do the following on LiteLLM Prompt Studio:

Create & Test Prompts: Build prompts with developer messages (system instructions) and test them in real-time with an interactive chat interface

Dynamic Variables: Use {{variable_name}} syntax to create reusable prompt templates with automatic variable detection

Version Control: Automatic versioning for every prompt update with complete version history tracking and rollback capabilities

Prompt Studio: Edit prompts in a dedicated studio environment with live testing and preview

API Integration:
Use your prompts in any application with simple API calls:

response = client.chat.completions.create( model = "gpt-4", extra_body = { "prompt_id": "your-prompt-id", "prompt_version": 2, # Optional: specify version "prompt_variables": { "name": "value" } # Optional: pass variables } )

Performance – /realtime 182× Lower p99 Latency

This update reduces /realtime latency by removing redundant encodings on the hot path, reusing shared SSL contexts, and caching formatting strings that were being regenerated twice per request despite rarely changing.

Results

Metric | Before | After | Improvement
Median latency | 2,200 ms | 59 ms | −97% (37× faster)
p95 latency | 8,500 ms | 67 ms | −99% (127× faster)
p99 latency | 18,000 ms | 99 ms | −99% (182× faster)
Average latency | 3,214 ms | 63 ms | −98% (51× faster)
RPS | 165 | 1,207 | +631% (~7.3× increase)

Test Setup

Category | Specification
Load Testing | Locust: 1,000 concurrent users, 500 ramp-up
System | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances
Database | PostgreSQL (Redis unused)
Configuration | config.yaml
Load Script | no_cache_hits.py

Model Compare UI

New interactive playground UI enables side-by-side comparison of multiple LLM models, making it easy to evaluate and compare model responses.

Features:

Compare responses from multiple models in real-time

Side-by-side view with synchronized scrolling

Support for all LiteLLM-supported models

Cost tracking per model

Response time comparison

Pre-configured prompts for quick and easy testing

Details:

Parameterization: Configure API keys, endpoints, models, and model parameters, as well as interaction types (chat completions, embeddings, etc.)

Model Comparison: Compare up to 3 different models simultaneously with side-by-side response views

Comparison Metrics: View detailed comparison information including:

Time To First Token

Input / Output / Reasoning Tokens

Total Latency

Cost (if enabled in config)

Safety Filters: Configure and test guardrails (safety filters) directly in the playground interface

New Providers and Endpoints

New Providers

Docker Model Runner: /v1/chat/completions - Run LLM models in Docker containers

New Models / Updated Models

New Model Support

Features

Gemini (Google AI Studio + Vertex AI)

Add Day 0 gemini-3-pro-preview support

Add support for Gemini 3 Pro Image model

Add reasoning_content to streaming responses with tools enabled

Add includeThoughts=True for Gemini 3 reasoning_effort

Support thought signatures for Gemini 3 in responses API

Correct wrong system message handling for gemma

Gemini 3 Pro Image: capture image_tokens and support cost_per_output_image

Fix missing costs for gemini-2.5-flash-image

Gemini 3 thought signatures in tool call id

Azure

Add azure gpt-5.1 models

Add Azure models 2025 11 to cost maps

Update Azure Pricing

Add SSML Support for Azure Text-to-Speech (AVA)

OpenAI

Support GPT-5.1 reasoning.effort='none' in proxy

Add gpt-5.1-codex and gpt-5.1-codex-mini models to documentation

Inherit BaseVideoConfig to enable async content response for OpenAI video

Anthropic

Add support for strict parameter in Anthropic tool schemas

Add image as url support to anthropic

Add thought signature support to v1/messages api

Anthropic - support Structured Outputs output_format for Claude 4.5 sonnet and Opus 4.1

Bedrock

Haiku 4.5 correct Bedrock configs

Ensure consistent chunk IDs in Bedrock streaming responses

Add Claude 4.5 to US Gov Cloud

Fix images being dropped from tool results for bedrock

Vertex AI

Add Vertex AI Image Edit Support

Update veo 3 pricing and add prod models

Fix Video download for veo3

Snowflake

Snowflake provider support: added embeddings, PAT, account_id

OCI

Add oci_endpoint_id Parameter for OCI Dedicated Endpoints

XAI

Add support for Grok 4.1 Fast models

Together AI

Add GLM 4.6 from together.ai

Cerebras

Fix Cerebras GPT-OSS-120B model name

Bug Fixes

OpenAI

Fix for 16863 - openai conversion from responses to completions

Revert "Make all gpt-5 and reasoning models to responses by default"

General

Get custom_llm_provider from query param

Fix optional param mapping

Add None check for litellm_params

LLM API Endpoints

Features

Responses API

Add Responses API support for gpt-5.1-codex model

Add managed files support for responses API

Add extra_body support for response supported api params from chat completion

Batch API

Support /delete for files + support /cancel for batches

Add config based routing support for batches and files

Populate spend_logs_metadata in batch and files endpoints

Search APIs

Search APIs - error in firecrawl-search "Invalid request body"

Vector Stores

Fix vector store create issue

Team vector-store permissions now respected for key access

Audio Transcription

Fix audio transcription cost tracking

Add missing shared_sessions to audio/transcriptions

Video Generation API

Fix videos tagging

Bugs

General

Responses API cost tracking with custom deployment names

Trim logged response strings in spend-logs

SSO

Ensure role from SSO provider is used when a user is inserted onto LiteLLM

Docs - SSO - Manage User Roles via Azure App Roles

Auth

Ensure Team Tags works when using JWT Auth

Fix key never expires

Swagger UI

Fixes Swagger UI resolver errors for chat completion endpoints caused by Pydantic v2 $defs not being properly exposed in the OpenAPI schema

AI Integrations

Logging

Arize Phoenix

Fix arize phoenix logging

Arize Phoenix - root span logging

Langfuse

Filter secret fields form Langfuse

General

Exclude litellm_credential_name from Sensitive Data Masker (Updated)

Allow admins to disable, dynamic callback controls

Guardrails

IBM Guardrails

Fix IBM Guardrails optional params, add extra_headers field

Noma Guardrail

Use LiteLLM key alias as fallback Noma applicationId in NomaGuardrail

Allow custom violation message for tool-permission guardrail

Grayswan Guardrail

Grayswan guardrail passthrough on flagged

General Guardrails

Fix prompt injection not working

Prompt Management

Allow specifying just prompt_id in a request to a model

Add support for versioning prompts

Allow storing prompt version in DB

Add UI for editing the prompts

Allow testing prompts with Chat UI

Allow viewing version history

Allow specifying prompt version in code

UI, allow seeing model, prompt id for Prompt

Show "get code" section for prompt management + minor polish of showing version history

Secret Managers

AWS Secrets Manager

Adds IAM role assumption support for AWS Secret Manager

MCP Gateway

MCP Hub - Publish/discover MCP Servers within a company

MCP Resources - MCP resources support

MCP OAuth - Docs - mcp oauth flow details

MCP Lifecycle - Drop MCPClient.connect and use run_with_session lifecycle

MCP Server IDs - Add mcp server ids

MCP URL Format - Fix mcp url format

Performance / Loadbalancing / Reliability improvements

Realtime Endpoint Performance - Fix bottlenecks degrading realtime endpoint performance

SSL Context Caching - Cache SSL contexts to prevent excessive memory allocation

Cache Optimization - Fix cache cooldown key generation

Router Cache - Fix routing for requests with same cacheable prefix but different user messages

Redis Event Loop - Fix redis event loop closed at first call

Dependency Management - Upgrade pydantic to version 2.11.0

Documentation Updates

Provider Documentation

Add missing details to benchmark comparison

Fix anthropic pass-through endpoint

Cleanup repo and improve AI docs

API Documentation

Add docs related to openai metadata

Update docs with all supported endpoints and cost tracking

General Documentation

Add mini-swe-agent to Projects built on LiteLLM

Infrastructure / CI/CD

UI Testing

Break e2e_ui_testing into build, unit, and e2e steps

Building UI for Testing

CI/CD Fixes

Dependency Management

Bump js-yaml from 3.14.1 to 3.14.2 in /tests/proxy_admin_ui_tests/ui_unit_tests

Bump js-yaml from 3.14.1 to 3.14.2

Migration

Migration job labels

Config

This yaml actually works

Release Notes

Add perf improvements on embeddings to release notes

Docs - v1.80.0

Investigation

Investigate issue root cause

New Contributors

@mattmorgis made their first contribution in PR #16371

@mmandic-coatue made their first contribution in PR #16732

@Bradley-Butcher made their first contribution in PR #16725

@BenjaminLevy made their first contribution in PR #16757

@CatBraaain made their first contribution in PR #16767

@tushar8408 made their first contribution in PR #16831

@nbsp1221 made their first contribution in PR #16845

@idola9 made their first contribution in PR #16832

@nkukard made their first contribution in PR #16864

@alhuang10 made their first contribution in PR #16852

@sebslight made their first contribution in PR #16838

@TsurumaruTsuyoshi made their first contribution in PR #16905

@cyberjunk made their first contribution in PR #16492

@colinlin-stripe made their first contribution in PR #16895

@sureshdsk made their first contribution in PR #16883

@eiliyaabedini made their first contribution in PR #16875

@justin-tahara made their first contribution in PR #16957

@wangsoft made their first contribution in PR #16913

@dsduenas made their first contribution in PR #16891

Known Issues

/audit and /user/available_users routes return 404. Fixed in PR #17337

Full Changelog

View complete changelog on GitHub
Original source
May 2026
No date parsed from source.

First seen by Releasebot:
May 23, 2026
liteLLM

v1.85.1 - Gemini 3.5 Flash & Reliability Fixes

liteLLM ships a patch release with day-0 support for Gemini 3.5 Flash and reliability fixes for cross-pod spend accuracy and Vertex AI tool calling. The update broadens model support while improving budget tracking and request handling.
v1.85.1 is a patch release on top of v1.85.0. It adds day-0 support for Gemini 3.5 Flash and ships two reliability fixes — cross-pod spend accuracy and Vertex AI tool calling.

New Models / Updated Models

New Model Support (1 new model)
Provider Model Context Window Input ($/1M tokens) Output ($/1M tokens) Features Gemini / Vertex AI gemini/gemini-3.5-flash, vertex_ai/gemini-3.5-flash 1M $1.50 $9.00 Reasoning, vision, audio input, PDF input, prompt caching, web search, function calling, response schema
Features

Gemini / Vertex AI

Day-0 support for Gemini 3.5 Flash on both Google AI Studio and Vertex AI - PR #28268

Bug Fixes

Vertex AI

Omit the function_call / function_response id on Vertex Gemini 3.5+ tool turns, fixing HTTP 400 Unknown name "id" errors. Google AI Studio (gemini provider) still forwards the id on Gemini 3.5+ for strict tool-call matching - PR #28324

Spend Tracking, Budgets and Rate Limiting

Seed the Redis spend counter via SET NX instead of INCRBYFLOAT to prevent cross-pod double-seeding. On multi-pod deployments this previously caused team spend to jump to ~Nx the pod count after a Redis cache miss / TTL expiry, triggering false "Budget Crossed" alerts - PR #27854

Full Changelog

https://github.com/BerriAI/litellm/compare/v1.85.0...v1.85.1
Original source
May 2026
No date parsed from source.

First seen by Releasebot:
May 23, 2026
liteLLM

v1.84.1 - Gemini 3.5 Flash & Reliability Fixes

liteLLM ships v1.84.1 with day-0 support for Gemini 3.5 Flash on Google AI Studio and Vertex AI, plus reliability fixes for cross-pod spend accuracy and Vertex AI tool calling.
v1.84.1 is a patch release on top of v1.84.0. It adds day-0 support for Gemini 3.5 Flash and ships two reliability fixes — cross-pod spend accuracy and Vertex AI tool calling.

New Model Support (1 new model)

Provider: Gemini / Vertex AI
Model: gemini/gemini-3.5-flash, vertex_ai/gemini-3.5-flash
Context Window: 1M
Input ($/1M tokens): $1.50
Output ($/1M tokens): $9.00
Features: Reasoning, vision, audio input, PDF input, prompt caching, web search, function calling, response schema

Features

Gemini / Vertex AI

Day-0 support for Gemini 3.5 Flash on both Google AI Studio and Vertex AI - PR #28268

Bug Fixes

Vertex AI

Omit the function_call/function_response id on Vertex Gemini 3.5+ tool turns, fixing HTTP 400 Unknown name "id" errors. Google AI Studio (gemini provider) still forwards the id on Gemini 3.5+ for strict tool-call matching - PR #28324

Spend Tracking, Budgets and Rate Limiting

Seed the Redis spend counter via SET NX instead of INCRBYFLOAT to prevent cross-pod double-seeding. On multi-pod deployments this previously caused team spend to jump to ~Nx the pod count after a Redis cache miss / TTL expiry, triggering false "Budget Crossed" alerts - PR #27854

Full Changelog

https://github.com/BerriAI/litellm/compare/v1.84.0...v1.84.1
Original source
May 16, 2026
Date parsed from source:
May 16, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.85.0 - Realtime GA, MCP Gateway Expansion & Hardened Multi-Tenancy

liteLLM ships a major release with OpenAI Realtime GA support, stronger multi-tenant isolation, expanded MCP Gateway permissions and auth, a broad observability overhaul, and new model support across OpenAI, xAI, OpenRouter, SambaNova, and Bedrock.
Key Highlights

OpenAI Realtime GA — first-class support for the GA OpenAI Realtime API (plus beta compatibility), including gpt-realtime-2 pricing and /openai/v1/realtime logging.

Hardened multi-tenancy — a large sweep of per-tenant scoping fixes across keys, projects, batches, files, MCP servers, and analytics endpoints (project-hijack/key-org isolation, service-account resource isolation, per-entity team/agent activity scoping).

MCP Gateway expansion — org-level MCP server/toolset permissions, OBO (on-behalf-of) MCP auth, delegate_auth_to_upstream PKCE passthrough, and MCP access-group name namespacing.

Observability overhaul — broad Prometheus fixes (label-count correctness, end-user cardinality cap, PromQL escaping), OTEL handler isolation + GenAI message-content capture, and decoupled S3 audit-log config.

New models — xAI grok-4.3 / grok-4.3-latest, OpenAI gpt-realtime-2, OpenRouter qwen/qwen3.6-plus, SambaNova MiniMax-M2.7, and Bedrock Z.AI GLM-5.

New Models / Updated Models

New Model Support (5 new models)

Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features
OpenAI | gpt-realtime-2 | 32K | $4.00 (audio in $32.00) | $16.00 (audio out $64.00) | Realtime (/v1/realtime), audio in/out, function calling, parallel tool calls
xAI | xai/grok-4.3 | 1M | $1.25 (>200K: $2.50) | $2.50 (>200K: $5.00) | Reasoning, vision, prompt caching, response schema, web search, tool calling
xAI | xai/grok-4.3-latest | 1M | $1.25 (>200K: $2.50) | $2.50 (>200K: $5.00) | Reasoning, vision, prompt caching, response schema, web search, tool calling
OpenRouter | openrouter/qwen/qwen3.6-plus | 1M | $0.325 | $1.95 | Reasoning, vision, function calling, tool choice
SambaNova | sambanova/MiniMax-M2.7 | 204.8K | $0.30 | $1.20 | Reasoning, function calling, tool choice

Pricing/metadata also updated for existing entries: Gemini multimodal-embedding pricing repointed to the Vertex pricing source with image/audio/video per-unit costs, audio-token cost reductions on realtime/Gemini entries, and a gemini-embedding-2-preview cost alignment.

Features

Anthropic

Forward output_config.effort, reject garbage reasoning_effort with 400, and omit thinking/output_config when reasoning_effort="none" - PR #27074, PR #27039

Add Bedrock Claude Platform route - PR #27678

Inject dummy tool without modify_params - PR #27620

Bedrock

Add Z.AI GLM-5 model support - PR #24338

Handle document content blocks in Converse API message conversion - PR #24644

Refactor response stream shape handling - PR #27257

Vertex AI

Model Garden OpenAPI support for publisher model IDs - PR #26076

Omit system_instruction / tools / toolConfig when cachedContent set - PR #26077

Gemini

Follow provider defaults for Gemini 3 thinking - PR #25764

Handle Gemini Files API URIs without fetching - PR #24922

Normalize response_schema on native generateContent - PR #27775

xAI

Add parallel_tool_calls to supported params - PR #25106

Azure

Authenticate to Azure with a token - PR #27556

Azure Sentinel audit-log support - PR #27280

General

gpt-5.5 reasoning-effort capability flags + supports_low_reasoning_effort - PR #26456

Match litellm.completion supported params with proxy model info - PR #27720

Bug Fixes

OpenRouter

Strip openrouter/ prefix from model names - PR #24282

Azure

Forward api_version to aembedding() for Azure AI Foundry v1 endpoints - PR #24911

Route Azure container file requests by decoded deployment - PR #26402

Anthropic / Vertex

Fix Vertex Anthropic streaming status-error hangs - PR #27310

Fix Anthropic streaming reasoning token usage - PR #27319

Fireworks AI

Strip thinking_blocks from chat messages before the Fireworks API call - PR #27881

hosted vLLM

Normalize custom tools for chat completions - PR #25763

General

Decode unified file_id when model_file_id_mapping is unavailable - PR #27406

Pass output_config through to backends that accept it - PR #26439

Resolve provider from deployment for multi-provider default config - PR #27517

Return 503 from /health when the targeted model is unhealthy or DB is disconnected - PR #27003

Guard URL-valued model destinations and align resource-model auth checks - PR #26915, PR #26963

LLM API Endpoints

Features

Realtime API

OpenAI Realtime GA support and beta compatibility - PR #27110

Add /openai/v1/realtime to routes for logging - PR #27323

Responses API

Persist and replay streamed Responses API requests from cache - PR #24580

Route gpt-5.4+ chat-without-tools to the Responses API - PR #27618

Preserve cache_control in Responses → Chat Completion transformation - PR #27727

Normalize chat tool_choice for the completions→responses bridge - PR #27634

Batches

Bedrock batch model-invocation job retrieval - PR #26834

Transform Vertex AI batch prediction outputs to OpenAI format - PR #25627

Set response=null on batch error entries per OpenAI spec - PR #27041

Embeddings

Default OpenAI-path encoding_format to float - PR #26976

Separate embeddings for multimodal inputs + combined multimodal embeddings via nested input - PR #24337, PR #24341

Audio Transcription

Add NVIDIA Riva STT provider - PR #27185

Vector Stores

Resolve embedding config at request time, never persist credentials - PR #27082

Tighten managed-store access - PR #26930

Bugs

General

Preserve compact_20260112 context management on Bedrock /v1/messages - PR #27534

Fix managed file model_mappings when the router resolves a single deployment dict - PR #26950

Omit model from Azure deployment image-gen / image-edit bodies - PR #27103

Fix Bedrock passthrough call-ID headers - PR #27412

Pin Responses API affinity to the Azure resource on model-group switch - PR #27703

Align vertex_ai/gemini-embedding-2-preview cost with Vertex multimodal pricing - PR #27848

Consolidate batch + dynamic limiter check/increment - PR #26954

Authorization hardening

Block missing write routes for proxy admin viewers; restore admin-viewer read parity on Logs + Settings - PR #27007, PR #26846

Encode upstream URL path identifiers; require a trusted proxy for header-identity auth - PR #26860, PR #26825

Bind generic SSO state to a session cookie; allow non-admin compliance-path reads - PR #26944, PR #27234

Keys / Teams / SCIM

Honor key access_group_ids when a team restricts models; resolve access-group names in team filtering and same-name deployment routing - PR #26275, PR #25224, PR #26161

Revoke virtual keys when SCIM deprovisions a user; fix SCIM user-lookup filters - PR #26861, PR #27308

Key-rotation bug fix; honor team_member_permissions on /key/list - PR #27756, PR #27026

/config/update targeted per-section writes (drop store_model_in_db gate) - PR #26643

Scope CLI stored token to base_url; redact Gemini API key from URL query params in error traces - PR #26945, PR #24943

UI fixes

Remove the insecure ?token= URL handler from the login page; clear admin session cookies before establishing an invited user's session; URL-encode team_id in teamInfoCall - PR #26924, PR #27227, PR #27466

Project dropdown empty for internal users (3 bugs); remove blank leading entry from access-group model dropdown; omit allowed_routes from key edit save when unchanged - PR #26664, PR #27521, PR #27553

Member/team access-group fix; team model test-connection authorization - PR #27317, PR #27487

AI Integrations

Logging

Prometheus

Fix custom-metadata label counts, cap end-user metric cardinality, fix remaining-metric zero values, escape api_key for PromQL string literals, emit litellm_remaining_tokens_metric for Bedrock & Vertex - PR #27268, PR #27272, PR #27348, PR #27013, PR #27705

Fix /metrics hang when require_auth_for_metrics_endpoint is true and auth succeeds; point /metrics 401 at the opt-out flag; fix metric labels for litellm-side rejects - PR #25980, PR #27502, PR #26947

OpenTelemetry

Isolate dual OTEL handlers; honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT; fix proxy-integration tracing bugs - PR #27018, PR #27403, PR #27757

Arize / LangSmith

Arize _set_usage_outputs handles raw OpenAI Pydantic CompletionUsage; remove unwanted metadata info from LangSmith - PR #26506, PR #26894

General

Decouple S3 audit-log config via s3_audit_callback_params - PR #27222

Set verbose_logger level when LITELLM_LOG=INFO; require a team-management role on /team/{id}/callback; close callback-config and observability-credential side channels; guard dynamic integration hosts - PR #26401, PR #26819, PR #27081, PR #26921

Guardrails

General

Add Qohash Nexus guardrail hook - PR #24927

Run model-level post_call guardrails on streaming requests; ensure post-call guardrail fires exactly once - PR #26922, PR #27012, PR #26109

Preserve Responses event streams in Presidio output masking - PR #26878

Cover multimodal + Responses-API content shapes; tighten tool-permission checks; optional skip of tool message in unified guardrail inputs - PR #26957, PR #26969, PR #27441

Handle legacy dict shape for metadata.guardrails in the Team UI - PR #27224

Prompt Management

General

Block path-traversal in BitBucket / Arize Phoenix / AssemblyAI clients; sandbox jinja2 in the GitLab/Arize/BitBucket prompt managers - PR #26943, PR #27043

Secret Managers

General

Audit-log /cache/settings and /config_overrides/hashicorp_vault mutations - PR #26953

Spend Tracking, Budgets and Rate Limiting

Rate Limiting

Atomic TPM rate limit; include model name + configured TPM/RPM in priority rate-limit 429 errors - PR #27001, PR #27216

Load team-member RPM/TPM from membership budget in the combined view - PR #24925

Budgets

Skip the personal-budget hook when a reservation covers the counter - PR #27021

Treat 0 team_member_budget as no cap; enforce team-member budget without a user row; reset org/tag/proxy budgets correctly - PR #27133, PR #27273, PR #27326, PR #27488

Flush virtual-key model_max budget spend to Redis after success logging; tighten budget spend admission - PR #27334, PR #26845

Tag Budgets & Routing

Enforce tag budgets on x-litellm-tags header requests; tag-budget reset drops stale management-cache entries; union x-litellm-tags with static team/key tags; fix internal tag-usage scoping; always merge caller-supplied tags into request metadata - PR #27573, PR #27568, PR #27247, PR #27315, PR #27784

Tag-routing test preventing header-regex bypass for strict plain-text tags - PR #26805

Spend Logs / Cost

Pass service_tier through Azure and Azure AI cost calculation - PR #24926

Opt-in suppression of stack traces in spend-tracking error logs; keep spend-log cleanup running after batch failures; redact echoed prompts in error_information; prevent secret_fields from leaking into spend logs; drop client-supplied pricing fields from request bodies - PR #26899, PR #27303, PR #27689, PR #27143, PR #27071

MCP Gateway

Features

Org-level MCP server and toolset permissions - PR #26960

OBO (on-behalf-of) MCP auth - PR #27421

delegate_auth_to_upstream flag for PKCE passthrough - PR #27834

Support MCP access-group names in URL-based namespacing - PR #27726

Bugs

Sanitize tool names to Anthropic's [a-zA-Z0-9_-]{1,128} pattern - PR #26788

Require a trusted-proxy gate before honoring X-Forwarded-* on OAuth discovery; preserve oauth2 m2m auth for tools routes; run pre_call_tool_check on the OpenAPI/local-registry path - PR #26841, PR #26871, PR #27016

Redact MCP server URL/headers for non-admin viewers; replace user-API-key auth with authorization-or-cookie for MCP server creation - PR #27027, PR #27190

Fix MCP DB reload partial failures; surface upstream 401 for token-forwarding MCP servers - PR #27314, PR #27847

Performance / Loadbalancing / Reliability improvements

Routing & Reliability

Trigger fallbacks on mid-stream httpx.TimeoutException - PR #26998

Register cooldowns on failure + fail fast on stale encrypted_content (Responses) - PR #27820

Register model info under the responses/-stripped variant - PR #27531

Fix Redis Sentinel client handling for authenticated Sentinel setups - PR #26302

Proxy hot path

Token-verification query optimization - PR #26202

Run daily activity aggregation off the event loop - PR #27264

Shared IAM cache + static credentials in BaseAWSLLM - PR #27125

Isolate semantic cache entries; stable Redis key generation across working directories; remove a duplicate in-memory cache-size constant - PR #26990, PR #27025, PR #26385

Early proxy request-size enforcement; coerce non-str x-litellm-* header values to avoid an httpx TypeError - PR #27311, PR #27504

Separate DB read and write endpoints - PR #27493

Health checks

Shared health-check polling; health_check_reasoning_effort for model health checks; skip disable_background_health_check models on GET /health; scope /health response to the caller's models; remove the separate health app - PR #26434, PR #27115, PR #27716, PR #26935, PR #27430

Config / startup robustness

Hot-reload config YAML when --reload is set; break the managed-resources import cycle on Python 3.13; reject bare-str file-input sinks (local-file read hardening) - PR #27274, PR #27160, PR #27762

Packaging / Docker / Helm / CI

Pin Wolfi & uv to multi-arch index digests; remove the hardcoded Prisma binary target for multi-arch builds; clear flagged OS-package advisories on the Docker image; refresh dependency locks - PR #27123, PR #27170, PR #27225, PR #27126

Helm: skip startup prisma db push when a migrations Job is enabled; increase default probe timeouts, disable debug logging by default - PR #27200, PR #27237

CI: Rerun Failed Tests for all pytest jobs, block PRs that drop coverage, Redis-backed VCR replay caches, reduce cassette bloat, mutation-testing workflow, dev-tag detection in the release workflow, Playwright apt-install skip - PR #27155, PR #27340, PR #26838, PR #27159, PR #27409, PR #27576, PR #26966, PR #27169

Remove legacy deployment artifacts and litellm-js packages; remove a redundant backup pricing file; misc test/import cleanup - PR #27541, PR #16590, PR #27699, PR #27633

Tighten router-settings-override and mock-testing trust; drop blank-text fallback for empty Bedrock Converse thinking blocks - PR #26968, PR #27850

Documentation Updates

Update the Greptile README logo to a higher-quality image - PR #25385

Add a BudgetManager.reset_cost docstring - PR #27867

Add a _LoopWrapper class docstring - PR #27870

New Contributors

@kimimgo made their first contribution in #24282

@shubham-arora-clear made their first contribution in #24644

@ohnoah made their first contribution in #24580

@ushiromiya-lion made their first contribution in #25106

@gowtham2809 made their first contribution in #25224

@he-yufeng made their first contribution in #26401

@MackDing made their first contribution in #26419

@dgu1-godaddy made their first contribution in #26834

@Vedanshu7 made their first contribution in #24943

@dennishenry made their first contribution in #27190

@SHARP155 made their first contribution in #27466

@mats852 made their first contribution in #24927

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.84.0...v1.85.0

Counts cover PRs new in v1.85.0 relative to v1.84.0 stable. 14 PRs that were backported into v1.84.0 stable (and documented in the v1.84.0 release notes) are excluded here to avoid double-counting.

Summary of PR counts

New Models / Updated Models: 43

LLM API Endpoints: 24

Management Endpoints / UI: 54

AI Integrations (Logging / Guardrails / Prompt Mgmt / Secret Managers): 32

Spend Tracking, Budgets and Rate Limiting: 23

MCP Gateway: 12

Performance / Loadbalancing / Reliability improvements: 41

Documentation Updates: 3

Total: 232 PRs
Original source
May 14, 2026
Date parsed from source:
May 14, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.84.0 - Reliability hardening + multi-pod budget accuracy

liteLLM ships v1.84.0 with a PEP 440 versioning change and a broad hardening release. It adds routing groups, MCP OAuth and credential improvements, better budget enforcement, lower Docker memory use, Prisma reconnect fixes, and several new models and providers.
Version naming change

Starting with v1.84.0, LiteLLM versions follow PEP 440. Stable releases drop the -stable suffix — the Docker tag for this release is litellm:1.84.0, not litellm:1.84.0-stable. Every Docker tag is published in both bare and v-prefixed form (litellm:1.84.0 and litellm:v1.84.0 resolve to the same image), so existing pins that include the v prefix keep working. PyPI versions remain the bare PEP 440 form: pip install litellm==1.84.0. If you pin LiteLLM in deployment tooling (Helm values, requirements.txt, Renovate rules, etc.), update those pins to the PEP 440 form.

Mapping from the legacy suffix scheme to the new PEP 440 scheme:

Channel | Legacy (≤ v1.83.x) | New (≥ v1.84.0)
Stable | vX.Y.Z-stable | vX.Y.Z
Stable patch | vX.Y.Z-stable.patch.N | vX.Y.Z.postN
Release candidate | vX.Y.Z.rc.N / vX.Y.Z-rc.N | vX.Y.ZrcN
Dev / nightly | vX.Y.Z-nightly / vX.Y.Z.dev.N | vX.Y.Z.devN

This is a naming change only — release cadence, stability guarantees, and image contents are unchanged. The v1.84.0-rc.1 tag (cut before the switch) keeps the legacy form for historical continuity; every tag from v1.84.0 onward uses the PEP 440 form.

Heads up — large bundle of behavioral changes.

This release consolidates a lot of reliability and hardening work that shipped in tight sequence. The Important Behavior Changes section below covers everything that changes a default, removes a configuration shortcut, or alters a request/response shape, with the opt-out you need to keep prior behavior. Read that section before upgrading a production deployment. If you already validated against v1.84.0-rc.1, see the Changes since v1.84.0-rc.1 section for the post-rc delta.

Key Highlights

Pass-through endpoints are authenticated by default. The auth field on entries under general_settings.pass_through_endpoints now defaults to true. The previous "OSS gets unauthenticated forwarders by default; auth: true is enterprise-only" combination is gone — auth: true works on OSS, and operators who want an unauthenticated forwarder must set auth: false explicitly.

Multi-pod budget enforcement is materially more accurate. RedisCache.async_increment gains a refresh_ttl opt-in, spend counters opt into it, and stale in-memory counters are skipped on a clean Redis miss. ResetBudgetJob invalidates Redis counters alongside DB resets so refreshed counters get reset too.

Prisma DB reconnects no longer freeze the event loop. The reconnect path replaced await self.db.disconnect() (which called subprocess.Popen.wait() synchronously) with a SIGTERM→SIGKILL → fresh Prisma() + connect() sequence. Liveness probes stop failing during database flaps. Companion fix restores reconnect-and-retry on PrismaClient.get_generic_data.

Memory footprint down ~700 MB on a two-worker Docker deployment via lazy-loaded feature routers and lazy-loaded front page. First request to a lazy route incurs the import cost; subsequent requests are unchanged.

MCP OAuth + Azure Entra discovery support, opt-in short-ID tool prefix to keep MCP tool names under the 60-char limit, and OAuth root-endpoint visibility now matches explicit server-name lookup.

Durable agent workflow run tracking via a new /v1/workflows/runs REST surface backed by LiteLLM_WorkflowRun / LiteLLM_WorkflowEvent / LiteLLM_WorkflowMessage tables. Spend logs session_id joins for free cost attribution.

Per-model routing strategies via Routing Groups. New router_settings.routing_groups schema binds a list of model_names to its own routing strategy (e.g. latency-based-routing for gpt-4o, simple-shuffle for cheaper models) within a single router. Configurable in proxy_config.yaml or from the LiteLLM dashboard under General Settings → Routing Groups; UI-managed groups persist and override the YAML values.

Changes since v1.84.0-rc.1

Everything below landed on top of v1.84.0-rc.1 and is included in v1.84.0. If you already validated against the rc, this is the only delta to re-test.

Hardening

/key/update authorization checks — PR #27878

/key/regenerate ownership-rebind + premium-gate guards — PR #27793

Reject bare strings at file-input sinks to prevent local-file reads via crafted request bodies — PR #27762

Refuse remote-URL instance-fn loads outside the config-file path — PR #27801

Cover extra_body + azure_ad_token in banned-params check — PR #27898

MCP BYOK / OAuth: block SSRF fields in RAG ingest vector_store config; block client-side pricing injection via request body — PR #27892

Budget reservation

Bound budget reservation per request instead of pinning to the entire remaining team/key/user headroom on requests without max_tokens — PR #27509

Image generation: reserve per-image cost rather than max-tokens cost; gate strictly on model mode

Health probes

Re-expose db status on the unauthenticated /health/readiness payload so external probes can distinguish DB-unreachable workers without auth — PR #27866

UI fetches litellm_version + is_detailed_debug from /health/readiness/details (auth-gated) since those fields were moved off the public payload — PR #27896

UI: disable retries on /health/readiness/details + cover token forwarding

MCP

Forward configured extra_headers from the MCP client to upstream OpenAPI HTTP calls (closes #26794) — PR #27383

On the same forwarding path, static_headers now win over caller-forwarded extra_headers on name conflict (case-insensitive). See Important Behavior Changes → MCP below.

Routing under SERVER_ROOT_PATH

Lazy-feature loading under a non-empty SERVER_ROOT_PATH no longer 404s on routes such as /api/v1/policies/attachments/list; strip the prefix before lazy-feature match and cache the normalized path at middleware init — PR #27812

Tagging & metrics

⚠️ Reverted the v1.83.10 caller-tag strip / allow_client_tags opt-in — caller-supplied tags merge into request metadata again; the strip is no longer enforced. See the new entry under Important Behavior Changes → Tags below for the full impact. — PR #27789

Point the /metrics 401 hint at the actual opt-out flag — PR #27505

Packaging

Relax core runtime pins to ranges so downstream packages can resolve a single shared openai /etc. version — PR #27241

Raise jinja2 floor in [project.dependencies] to >=3.1.6 to match the lockfile — PR #27552

⚠️ Important Behavior Changes

This release tightens a number of defaults across auth, ingress, callbacks, MCP, and the UI. Each item below names the change and, where applicable, the exact configuration you need to restore prior behavior.

Auth & request ingress
Pass-through endpoints default to auth: true
What changed:
PassThroughGenericEndpoint.auth now defaults to True. The runtime dispatch in user_api_key_auth.py reads endpoints as raw dicts, so endpoint.get("auth", True) applies even when the dict has no explicit key. The premium_user gate on auth: true was also removed — OSS deployments can now use auth: true.
Who is affected:
Any pass-through entry in general_settings.pass_through_endpoints that omitted auth:. Prior to this rc that meant unauthenticated; it now means LiteLLM-key-authenticated.
Restore prior behavior:
Set auth: false explicitly on every pass-through entry that is meant to be public (e.g. webhook receivers).
Clientside api_base / base_url are gated and credential-stripped
What changed:
(i) Clientside api_base / base_url are validated against validate_url when litellm.user_url_validation is enabled.
(ii) When a request redirects api_base / base_url, admin-configured provider credentials and per-deployment metadata (OCI signing keys, AWS / Azure / Vertex tokens, observability vars, every field on CredentialLiteLLMParams) are dropped before the call is forwarded.
(iii) The provider-inference matcher in get_llm_provider_logic.py no longer does an unanchored substring match — it now compares parsed URL hostname + segment-bounded path prefix.
(iv) The blocklist for clientside-overridable params adds aws_bedrock_runtime_endpoint, langsmith_base_url, langfuse_host, posthog_host, braintrust_host, slack_webhook_url, s3_endpoint_url, sagemaker_base_url, deployment_url. The old "blocklist is a no-op when api_key is non-empty" clause is removed.
Who is affected:
Anyone passing api_base (or any of the newly-blocked fields) at request time and relying on the implicit-api_key bypass to thread it through.
Restore prior behavior:
Use the documented BYOK paths instead of the bypass: Proxy-wide: general_settings.allow_client_side_credentials: true; Per deployment: litellm_params.configurable_clientside_auth_params: ["api_base", ...]. The 400 returned by the proxy on a blocked request names the offending field and points at the same two settings.
Master-key requests now propagate an alias instead of the master-key hash
What changed:
When a request authenticates with the master key, the UserAPIKeyAuth.api_key / token value handed to downstream code is now the constant LITELLM_PROXY_MASTER_KEY_ALIAS = "litellm_proxy_master_key". The cache lookup is unchanged (still keyed on hash_token(master_key)). _is_master_key no longer accepts the SHA-256 hash form — only the raw master key.
Who is affected:
Anything joining or filtering on the prior master-key hash value, including custom dashboards over spend logs and Prometheus /metrics queries pinned to the hash literal.
Restore prior behavior:
None — operators querying spend logs or metrics for master-key activity should switch their filter to the alias "litellm_proxy_master_key".
Invite-link onboarding no longer mints a key from GET
What changed:
GET /onboarding/get_token returns a 15-minute signed onboarding JWT bound to invite + user id; it does not mint a sk-... virtual key. POST /onboarding/claim_token requires that JWT and atomically reserves the invite via update_many(... is_accepted=False, ... → True).
Who is affected:
Any tooling that consumed GET /onboarding/get_token for an embedded sk-... and treated it as a usable session key before completing the password claim.
Restore prior behavior:
None — clients must call POST /onboarding/claim_token to obtain the live key.
CLI SSO login flow uses a server-side session
What changed:
litellm-proxy login now starts a CLI SSO flow that returns a login id + polling secret + terminal verification code. The browser callback must confirm the terminal code before the polling endpoint returns the JWT.
Who is affected:
Anyone running an older litellm-proxy CLI against an upgraded proxy — the old caller-supplied-handle handoff is gone.
Restore prior behavior:
None — upgrade the CLI alongside the proxy.
Team self-join (_is_available_team) only allows self-add as role=user
What changed:
/team/member_add: when the caller is not an admin and the team is "available," the request must add only the caller themselves with role="user". Bulk shapes are checked the same way; lists mixing a valid self-entry with a role="admin" entry are rejected. Email-only members on the self-join path are rejected.
/team/permissions_update: the _is_available_team clause is removed entirely — only proxy/team/org admins can update team_member_permissions.
Who is affected:
Any flow that relied on the blanket bypass to either add an admin to an available team without admin privileges, or to mutate team_member_permissions from a non-admin context.
Restore prior behavior:
None — perform admin-scoped operations with an admin key.
Guardrail modification permission gates on key presence
What changed:
The guardrail-modification authz check in auth_checks.py now gates on intent (whether the key is present in the request) rather than payload truthiness. Some previously-accepted shapes will now 403.
Restore prior behavior:
None — flow updates required for non-admin callers that previously slipped past on falsy payloads.
Untrusted root control fields are stripped from client requests
What changed:
_UNTRUSTED_ROOT_CONTROL_FIELDS in litellm_pre_call_utils.py includes mock_response, mock_tool_calls, redaction-bypass controls, and a few others. They are stripped from client requests unless the calling key/team carries allow_client_mock_response: true (for mock_response / mock_tool_calls) or the corresponding admin-opt-in metadata for the redaction bypass. Pillar guardrail caching headers and Bedrock dynamic evaluation overrides are also filtered when not explicitly allowed.
Who is affected:
Tests and tooling that pass mock_response / mock_tool_calls in extra_body to short-circuit completions.
Restore prior behavior:
Set allow_client_mock_response: true in the admin metadata of the test key (or the team owning it).
Error responses no longer leak re-raised local parameters
What changed:
Broad except handlers in the response-utils path used to render the captured request parameters into the re-raised error message. Those parameters can carry credentials, so they're now dropped from the rendered message.
Who is affected:
Any client that parsed credential-shaped fields out of a 5xx error body. The error response shape is otherwise unchanged.
Restore prior behavior:
None.

Vector stores
Credentials redacted; /vector_store/update is per-store gated
What changed:
/vector_store/list, /vector_store/info, /vector_store/update redact credential-bearing values inside the persisted litellm_params (handles dicts, JSON-string-serialized params, and nested-dict shapes like litellm_embedding_config).
/vector_store/update is now gated by _fetch_and_authorize_vector_store — same per-store access check /vector_store/info already had.
SensitiveDataMasker adds plural "credentials" to its default sensitive-pattern set, so segment-exact matching catches vertex_credentials, aws_credentials, etc. (Latent fix that affects every default-instantiated masker, not just vector stores.)
get_vector_store_info and update_vector_store re-raise HTTPException instead of letting the catch-all downgrade 403 / 404 to 500.
Who is affected:
Anything reading litellm_params off these responses to recover provider keys, or any non-store-admin caller mutating arbitrary vector stores via /vector_store/update.
Restore prior behavior:
None.

Logging callbacks & key/team metadata
os.environ/* callback refs in key/team metadata are no longer resolved
What changed:
convert_key_logging_metadata_to_callback() no longer resolves os.environ/* values from key/team metadata via get_secret(). Existing rows with such values are silently ignored at request setup instead of crashing the request. Trusted config.yaml team-callback env resolution in add_team_based_callbacks_from_config() is unchanged. New AddTeamCallback constructions from key/team logging metadata also reject os.environ/* callback vars.
Who is affected:
Any key/team that stored os.environ/DATABASE_URL (or similar) in its callback metadata to pick up a server env var at request time.
Restore prior behavior:
Configure those callback secrets through trusted proxy config.yaml (team_callbacks / model_list[].litellm_params) instead of putting os.environ/ references in DB-backed key or team metadata. The literal credential value can still be stored in metadata if absolutely necessary.
Team-callback admin mutations now emit audit logs
What changed:
POST /team/{id}/callback (add_team_callbacks) and POST /team/{id}/disable_logging (disable_team_logging) emit LiteLLM_AuditLogs rows when litellm.store_audit_logs=True. Additive when audit logging is enabled.
Restore prior behavior:
litellm.store_audit_logs: false (the default) suppresses the new rows.

MCP
Encrypted user-scoped MCP credentials at rest
What changed:
Writes to LiteLLM_MCPUserCredentials.credential_b64 go through encrypt_value_helper (nacl SecretBox) instead of plain urlsafe_b64encode. The read path tries nacl decryption first and falls back to plain urlsafe_b64decode for legacy rows; existing rows stay readable.
Who is affected:
Operators reading the table directly; the column contents change shape on first re-write.
Restore prior behavior:
None — backward-compat read path keeps legacy rows working until they are next written.
OAuth metadata discovery follows SSRF guard
What changed:
The two URLs MCP discovery follows (resource_metadata from WWW-Authenticate, and authorization_servers[0] from protected-resource-metadata) are now subject to async_safe_get. Same-authority metadata fetches stay direct (with follow_redirects=False); cross-origin fetches are validated via the existing user URL validation policy. Public federated providers (Azure Entra, Google, Okta, GitHub) remain supported.
Who is affected:
Cross-origin internal/loopback/cloud-metadata OAuth metadata URLs.
Restore prior behavior:
Toggle litellm.user_url_validation and the existing URL validation controls per the proxy URL-validation docs to permit your specific internal targets.
MCP public-route detection no longer matches query strings; OAuth2 fallback no longer fail-opens
What changed:
MCPRequestHandler.process_mcp_request checks request.url.path.startswith("/.well-known/") instead of ".well-known" in str(request.url). Query-string smuggling like ?.well-known is rejected.
When an Authorization header fails LiteLLM-key validation, the handler no longer treats the failure as "OAuth2 passthrough" and returns an empty UserAPIKeyAuth().
Restore prior behavior:
None.
MCP OAuth root endpoint resolves with request visibility rules
What changed:
Root-endpoint fallback resolves the single OAuth2 server using the same visibility rules as explicit server-name lookup; non-visible servers are no longer selected via the fallback path. The callback redirect path validates the full client redirect URI carried in state and appends parameters without dropping an existing query string.
Restore prior behavior:
None — adjust server visibility rather than relying on the fallback.
OpenAPI MCP: static_headers now win over caller-forwarded extra_headers
What changed:
v1.84.0 introduced header forwarding for OpenAPI-backed MCP servers (spec_path: configs) via PR #27383, letting you allowlist caller request headers into upstream OpenAPI HTTP calls. When the same header name appears in both your YAML static_headers and the request-time extra_headers allowlist, the static_headers value now wins, with case-insensitive name comparison so X-Tenant-Id and x-tenant-id are treated as the same header. This matches how the managed MCP path has always behaved. Authorization is still overridden last by a BYOK x-mcp-auth token, if present.
Example:
With mcp_servers:
data_api:
spec_path: http://upstream-api.local/openapi.json
static_headers:
X-Tenant-Id: "acme-corp"
extra_headers:
- X-Tenant-Id

a caller sending X-Tenant-Id: evil-corp will now have X-Tenant-Id: acme-corp sent upstream. Any header in extra_headers that does not collide with static_headers is still forwarded unchanged.
Who is affected:
Operators who set the same header name in both static_headers and extra_headers on an OpenAPI MCP server, and who were relying on the caller's value taking effect. (Note: this only ever shipped in the v1.84.0 release-candidate cycle — no prior stable release forwarded extra_headers for OpenAPI MCPs at all.)
Restore prior behavior:
None — if you actually want the caller to control a header, remove it from static_headers and keep it only in extra_headers, or use distinct names for the operator-pinned value and the caller-supplied value.

UI / static assets
/get_image, /get_favicon, /get_logo_url
What changed:
Remote HTTP(S) UI_LOGO_PATH / LITELLM_FAVICON_URL are now browser-loaded via redirect — the proxy no longer fetches them server-side from these unauthenticated endpoints.
Local file paths still work in place, but the resolved file must have a supported image signature (jpeg, png, gif, webp, ico); non-image paths fall back to the bundled default.
/get_logo_url only returns HTTP(S) values; local filesystem paths are not disclosed.
Stale cached_logo.jpg files are no longer served by /get_image.
Who is affected:
Custom branding setups that pointed UI_LOGO_PATH / LITELLM_FAVICON_URL at non-image local files, or relied on /get_logo_url to surface a local path.
Restore prior behavior:
No new env vars required. Existing remote URLs continue to work; local image paths continue to work as long as the file is a recognized image type.
/ui/chat removed
What changed:
Static chat.html / chat.txt / chat/ are gone; the route 404s. The chat UI was already removed from the nav; the dangling static build is now also gone.
Restore prior behavior:
None.
"Store Prompts in Spend Logs" toggle moved to Admin Settings
What changed:
Both "Store Prompts in Spend Logs" and "Maximum Spend Logs Retention Period" moved from a gear-icon modal on the Logs page to Admin Settings → Logging Settings. The gear was visible to non-admins and surfaced 403s on save.
Restore prior behavior:
None — controls are admin-only as /config/update and /config/list already required.

Tags
⚠️ Reverted: v1.83.10 caller-tag strip / allow_client_tags opt-in
What changed:
This release reverts the v1.83.10 breaking change that stripped caller-supplied tags unless the key/team metadata had allow_client_tags: true. Caller-supplied tags from x-litellm-tags, body-level tags, and metadata.tags now flow into metadata.tags again and union with admin-configured static tags from key/team/project metadata — the proxy's behavior is back to what it was before v1.83.10. The pre-call strip block in litellm_pre_call_utils.py is removed, and the flag has no schema or endpoint footprint, so leftover allow_client_tags: true values on existing keys/teams are inert.
Who is affected:
Operators who set metadata.allow_client_tags: true on keys/teams to opt into client tags: the flag is now a no-op and can be cleaned up at leisure.
Operators who relied on the v1.83.10 strip to block client-supplied tags reaching tag-based routing or tag-based spend attribution: the strip is no longer enforced. Re-evaluate your tag-based routing and cost-attribution exposure before upgrading.
Restore prior behavior:
None — the strip path is gone from the proxy. If caller-supplied tags must be blocked, filter them upstream (gateway / ingress) or in a custom pre-call hook.

New Models / Updated Models

New Model Support (16 new models) including OpenAI gpt-image-2, Azure OpenAI azure/gpt-image-2, AWS Bedrock zai.glm-5, Crusoe models, Vertex AI grok models, and others with various features like vision, pdf input, function calling, reasoning, tool choice.

New Providers (2 new providers)

AIHubMix: OpenAI-compatible chat completions
Crusoe: chat completions across reasoning / instruct catalogs

Pricing updates

OpenAI gpt-5.5-pro corrected pricing: was 2× OpenAI's published rate. Cost-tracking output for gpt-5.5-pro will drop to half what it reported under previous releases — operators reconciling spend reports across the upgrade boundary should expect the discontinuity. - PR #26651

AWS Bedrock Anthropic Claude 4.5 / 4.6 / 4.7 (Global + US) — added cache_creation_input_token_cost_above_1hr (and the _above_200k_tokens LC variant for Sonnet 4.5). 1-hour-TTL prompt-cache writes on Bedrock now bill at the published 1.6× rate instead of falling back to the 5-minute rate (was undercounting by ~60%). - PR #26800

Features

Bedrock: Preserve cache_control TTL on tools for Claude 4.5+ on the Converse path; sanitize tools blocks on the Invoke path - PR #25855

Translate OpenAI file content on the tool-result path (Bedrock Converse + direct Anthropic) - PR #26710

retrievalConfiguration passthrough for vector-store search via extra_body - PR #26685

Vertex AI: Propagate metadata labels to embeddings (labels), Imagen (labels), and Discovery Engine rerank (userLabels); shared helper across paths - PR #25499

Reuse Anthropic-messages config instances via @lru_cache so VertexBase credential cache survives across calls - PR #26099

Google Native: Emit LiteLLM proxy success headers (x-litellm-*) on :generateContent and :streamGenerateContent - PR #25500

Run pre_call_hook on :generateContent / :streamGenerateContent so guardrails fire - PR #26914

Anthropic: JSON response_format + user tools on non-streaming: filtered tool calls + structured JSON merged into content; internal json_tool_call no longer surfaces - PR #26222

Ollama: Forward tool_calls on assistant messages and tool_call_id on role: tool messages — fixes the infinite tool-call loop on multi-turn agents - PR #26122

Predibase: Migrate transform_request / transform_response into transformation.py (refactor, no behavior change) - PR #25249

AIHubMix (new): First-class OpenAI-compatible provider entry - PR #24294

Bug Fixes

Vertex AI: Preserve items on the array branch of anyOf schemas with null (Vertex was rejecting INVALID_ARGUMENT) - PR #26675

Bedrock: GET /v1/batches/{batch_id} forwards model from the encoded id (was returning LiteLLM doesn't support bedrock for 'create_batch') - PR #26814

Pass-through stream interruption now flushes spend tracking — GeneratorExit from client disconnect was dropping per-chunk usage values - PR #26719

Replace deprecated Claude 3.7 Sonnet test references with claude-sonnet-4-5-20250929-v1:0 across 16 test files - PR #26721

Router custom pricing: Propagate custom cost_per_token from DB model_info through the fallback path - PR #25888

Responses API: DELETE /openai/responses/{id} no longer sends json={} — Azure now rejects the empty {} body with unexpected_body - PR #26949

Pass-through endpoints: Invoke post-call guardrails on non-streaming pass-through responses (/vertex_ai/, /openai/, /bedrock/*); opt-in only when guardrails are configured for the route - PR #26262

Inherit caller identity from litellm_params metadata when fabricating UserAPIKeyAuth for managed-files passthrough batch creation (Anthropic + Vertex AI) - PR #26831

Embedding cache: Preserve prompt_tokens_details (incl. image_count) through the cache round-trip; aggregate per-item details on retrieval; merge in combine_usage() for partial cache hits - PR #26653

Streaming logging: Backfill streaming hidden response cost into the success log path - PR #26606

Cost calculation: Unify success_handler typed + dict branches so spend rows stop logging 0 and the budget-overrun reports it caused - PR #26629

Management Endpoints / UI

Teams: Team-level search-tool credentials: new search_tools array on LiteLLM_ObjectPermissionTable; per-key permissions validated as a subset of the owning team's; UI selector under team management - PR #26691

Routing Groups: New General Settings → Routing Groups page: create, edit, and delete per-model routing strategies from the dashboard without editing proxy_config.yaml. UI-managed groups are persisted and override values defined in YAML; per-group state is rebuilt on save - PR #27131

Model Health: Pagination controls on the model health status page - PR #26826

CLI / Workers: --timeout_worker_healthcheck CLI flag (env TIMEOUT_WORKER_HEALTHCHECK) — forwards to uvicorn 0.37.0+ Config kwarg; older uvicorn = warning + no-op; gunicorn / hypercorn paths untouched - PR #26622

Memory / lazy loading: Lazy-load optional feature routers on first request (~700 MB lower memory on a two-worker Docker deployment) - PR #26534

Lazy-loaded openapi.json front page; spec generation moved to CI with a runtime stub fallback - PR #26802

Background jobs: Cleanup job for expired LiteLLM dashboard session keys - PR #26460

MCP OAuth: Azure Entra discovery endpoint support - PR #26584

MCP UI: Tool Configuration panel on the MCP server edit page switched from POST /mcp-rest/test/tools/list (temp-session preview, requires inline creds) to GET /mcp-rest/tools/list?server_id=... (stored credentials). Saved servers with auth_type of api_key / bearer_token / basic / authorization now load tools without "Unable to load tools — Failed to connect to MCP server." - PR #26002

Teams: Per-member rows with max_budget=NULL now fall through to team-level enforcement instead of silently disabling it - PR #26809

Spend logs: Strip request data from spend-log error messages - PR #26662

Vertex retrieve mocked tests: is_redirect=False set on mocked retrieve responses - PR #26844

AI Integrations

Logging
General

Opt-in retry settings for the Generic API logger batch send — transient litellm.Timeout / httpx.ConnectTimeout failures retry instead of dropping the batch - PR #26645

Cache GCP IAM token used for Redis (was being regenerated per-connection; synchronous google-auth + google-cloud-iam calls were freezing the asyncio event loop, causing ~25 s INCRBYFLOAT Redis spans in production) - PR #26441

Backfill streaming hidden response cost - PR #26606

Guardrails

CyCraft XecGuard (new): First-class partner guardrail. Multi-policy prompt/response scanning (prompt injection, harmful content, PII, system-prompt enforcement, bias, skills protection) plus RAG context-grounding via /grounding - PR #26011

Noma v2: _build_scan_payload no longer crashes during post_call / during_call / during_mcp_call on deepcopy(request_data) failures with unserializable objects (e.g. uvloop.Loop) - PR #26605

Pass-through: Post-call guardrails on non-streaming pass-through responses (see LLM API Endpoints) - PR #26262

Spend Tracking, Budgets and Rate Limiting

Multi-pod budget enforcement
RedisCache.async_increment gains refresh_ttl opt-in (used by spend counters); get_current_spend and SpendCounterReseed.coalesced skip stale per-pod in-memory on a clean Redis miss; ResetBudgetJob invalidates the Redis counter alongside every DB row reset (keys, users, teams, team members, budgets-linked keys) - PR #26829

Cost calc unification
success_handler typed + dict branches now compute cost the same way - PR #26629

Per-member null budget
Per-member rows with max_budget=NULL fall through to team enforcement - PR #26809

Bedrock 1-hour cache write pricing
Claude 4.5 / 4.6 / 4.7 Global + US entries gain cache_creation_input_token_cost_above_1hr (was undercounting ~60%) - PR #26800

gpt-5.5-pro corrected pricing
Was double-priced - PR #26651

Bedrock pass-through stream interruption
Spend tracking now flushes when client disconnects mid-stream - PR #26719

MCP Gateway
Tool prefix

Opt-in LITELLM_USE_SHORT_MCP_TOOL_PREFIX env var: switches per-tool prefix from the human-readable server name (github_onprem-get_repo) to a deterministic 3-char base62 id derived from server_id (Xy7-get_repo). Lets long server names stay under the 60-char tool-name limit some model APIs enforce - PR #26733

OAuth

Azure Entra discovery endpoint support - PR #26584

See Important Behavior Changes for public-route detection, OAuth root endpoint visibility, OAuth metadata SSRF guard, and user-scoped credential encryption.

Performance / Loadbalancing / Reliability improvements

Routing Groups (per-model strategies)
New router_settings.routing_groups schema binds a list of model_name s to its own routing_strategy and optional routing_strategy_args; ungrouped models fall back to the top-level routing_strategy (the implicit default group, name reserved). Each model_name may belong to at most one group — overlap raises ValueError at init. Updatable at runtime via Router.update_settings(routing_groups=[...]) or /config/update; per-group state is rebuilt on update - PR #27022

Database reconnect
Prisma reconnect no longer blocks the asyncio event loop. Replaces await self.db.disconnect() (which calls subprocess.Popen.wait() synchronously and freezes the loop for 30–120 s+ in production, failing K8s liveness probes) with SIGTERM → 0.5 s sleep → SIGKILL → fresh Prisma() + connect(). Direct-reconnect path delegates to recreate_prisma_client - PR #26225

call_with_db_reconnect_retry helper centralizes the reconnect-and-retry-once pattern. Restores the self-heal that 1.83.x lost on PrismaClient.get_generic_data (issue #25143) and harden the reconnect state machine - PR #26756

Redis IAM token caching
GCP IAM token is no longer regenerated on every Redis connection; a single Redis INCRBYFLOAT was taking 25.6 s on a 28.4 s trace in production - PR #26441

Config caching
DualCache config parameter reads are cached and batched. End-to-end on Docker, read load drops from 2.8 q/s to 0.7 q/s; improvement scales with pod count. Note: config edits will take longer to propagate (until the cache is invalidated) - PR #26469

Memory footprint
Lazy-loaded feature routers - PR #26534

Lazy-loaded front page + openapi.json move-to-CI - PR #26802

Connection layer
Optional TCP SO_KEEPALIVE support on aiohttp's TCPConnector - PR #26730

CLI
--timeout_worker_healthcheck flag for uvicorn worker triage (see Management Endpoints) - PR #26622

Test stability
Scope test_model_alias_map ERROR-log assertion to LiteLLM logger so asyncio records (e.g. Unclosed client session) stop flunking the assertion intermittently - PR #26741

Replace lazy-load subprocess startup-import diff with static source scan (~13 s instead of timing out past two minutes) - PR #26934

Opt model-access E2E tests into allow_client_mock_response: true after the request-control hardening - PR #26941

General Proxy Improvements

CI / Tooling

Support CircleCI "Rerun failed tests" for local_testing_part1 / local_testing_part2 / litellm_router_testing jobs (was collecting 0 items + exit 123) - PR #26461

Correct min-release-age value in .npmrc files: drop the d suffix to keep npm install from crashing on npm 11.x with RangeError: Invalid time value - PR #26850

Pull request template

Add Linear ticket field for internal contributors - PR #26655

New Contributors

@xinrui-z made their first contribution in #24294

@Jerry-SDE made their first contribution in #25249

@Zerohertz made their first contribution in #25888

@clyang made their first contribution in #26011

@mverrilli made their first contribution in #26122

@tuhinspatra made their first contribution in #26262

@omriShukrun08 made their first contribution in #26605

@lmcdonald-godaddy made their first contribution in #26651

@minznerjosh made their first contribution in #26710

@yassinkortam made their first contribution in #26730

@sruthi-sixt-26 made their first contribution in #26814

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.14-stable...v1.84.0
Original source
Apr 27, 2026
Date parsed from source:
Apr 27, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.83.14 - GPT-5.5, Prompt Compression & Memory API

liteLLM adds GPT-5.5 and GPT-5.5 Pro support, server-side prompt compression, new memory CRUD endpoints, LLM-as-a-Judge guardrails, MCP OAuth hardening, per-member team budgets, and adaptive routing, alongside broad model updates, bug fixes, and proxy reliability improvements.
Key Highlights

Day-0 GPT-5.5 and GPT-5.5 Pro support — OpenAI and Azure variants ship with full pricing maps, dated snapshots, and Responses-mode routing for the Pro tier.

Server-side Prompt Compression — first-class proxy callback that transparently compresses long-context inputs (Claude Code, RAG, document workloads) before they hit the upstream model, with no client opt-in required.

/v1/memory CRUD endpoints — proxy now exposes a memory store API with Prisma-backed metadata, consumed by the new agent loop.

LLM-as-a-Judge guardrail — model-graded post-call guardrail with configurable rubrics, joining the Bedrock / Lakera / Presidio / Noma family.

MCP OAuth hardening — discoverable + BYOK authorize/token endpoints are tightened, temporary OAuth sessions are now shared across proxy instances via Redis, and per-server access policy is uniformly enforced across the proxy and broker.

Per-member team budgets land in production — individual member budgets, per-member cycle surfacing in the Teams UI, and atomic counter alignment for user/org spend checks.

Adaptive routing — opt-in router policy that weights deployments by recent latency/error history on top of the existing wildcard fallback.

New Models / Updated Models (22 new models) include OpenAI GPT-5.5 and GPT-5.5 Pro variants with 1,050,000 token context windows and updated pricing, Azure OpenAI variants, AWS Bedrock models, Moonshot, OpenRouter, Gemini embeddings, DashScope image generation, and others.

Features include Bedrock additions (GLM-5, Minimax M2.5, Claude Mythos Preview), OpenAI versioned GPT-5.4 mini/nano snapshots and GPT-5.5 support, Azure OpenAI dated variants, Gemini Embedding 2 GA, Vertex AI multi-region hosts, DashScope image generation support, Moonshot model registry additions, Anthropic model migrations, and general improvements such as migrating 38 models from legacy max_tokens to max_input_tokens/max_output_tokens.

Bug Fixes cover Anthropic input args preservation, Gemini thought suffix stripping, file content block handling, Azure streaming role preservation, Bedrock content block sorting and pricing fixes, Gemini embedding request filtering, Vertex AI dimension forwarding, Zhipu/GLM finish_reason mapping, OVHcloud tool calling fix, Scaleway audio support, Responses API normalization, Anthropic Messages API logging preservation, Image API multipart enforcement and URL fetch alignment, Vector Stores BYOK key injection restoration and permission respect, Memory API metadata JSONification, and general URL construction hardening.

Management Endpoints/UI improvements include virtual keys/auth enhancements, UI tab additions and toggles, sortable columns, per-member budget cycle surfacing, project management refactor, and bug fixes tightening authorization and metadata handling.

AI Integrations improvements include logging additions (litellm_call_id, Vertex AI passthrough logs) and guardrails enhancements (Bedrock OUTPUT source usage, post-call log deduplication, hook mode redaction, LLM-as-a-Judge guardrail shipping, team/global policy guardrails, guardrail param handling, streaming post-call logging, and deferred success log suppression).

Spend Tracking, Budgets and Rate Limiting updates include per-member budgets, rate limiting reseed enforcement, and budget window reset fixes.

MCP Gateway improvements include OAuth hardening, session sharing via Redis, access control alignment, permission resolution, route splitting, and tool filtering.

Performance, Loadbalancing, and Reliability improvements include adaptive routing, wildcard fallback enhancements, server-side prompt compression callback, health/readiness fix, and developer ergonomics with uvicorn hot reload flag.

General Proxy Improvements cover build/docker streamlining, migration opt-in resolver, CI/infra migrations and cleanups, test stability improvements, packaging/dependency bumps, UI fetch button fixes, and miscellaneous code improvements.

Documentation Updates include observability integration additions, proxy docs clarifications, Gemini 3 defaults and release notes, fenced code block padding alignment, prompt caching doc updates, and repository pointing.

New Contributors section lists multiple first-time contributors with links to their contributions.

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.10-stable...v1.83.14-stable
Original source
Apr 27, 2026
Date parsed from source:
Apr 27, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.83.10 - Claude Opus 4.7, Prompt Compression & Multi-Window Budgets

liteLLM adds day-0 Claude Opus 4.7 support, launches BM25-based prompt compression, and expands budgeting with multi-threshold alerts and concurrent budget windows. It also introduces PromptGuard guardrails, per-team guardrail opt-out, and a switch to uv packaging for faster builds.
Key Highlights

Claude Opus 4.7 day-0 support — Opus 4.7 across Anthropic, Bedrock, Vertex AI, Azure AI, and Perplexity, with reasoning, vision, prompt caching, computer use, and 1M-token context.

litellm.compress() — BM25-based prompt compression with a retrieval tool for trimming long context before it hits the model.

Multi-Threshold Budget Alerts — virtual keys can fire alerts at multiple configurable spend thresholds (e.g. 50% / 80% / 95%) instead of a single soft-budget level.

Concurrent Budget Windows — keys and teams can run multiple budget periods (daily + monthly) simultaneously, each with its own reset cadence.

Per-Team Guardrail Opt-Out — teams can opt out of specific global guardrails from team settings without touching config files.

PromptGuard Guardrail Integration — first-class pre/post-call guardrail for prompt-injection detection.

uv Packaging Migration — Poetry replaced by uv across packaging, CI, and Docker for faster, reproducible builds.

Breaking Changes

Caller-supplied tags are stripped unless the key/team opts in

What changed: Tags supplied by the caller — metadata.tags, litellm_metadata.tags, root-level tags, and the x-litellm-tags header — are stripped from the request before tag-based routing and tag-based spend attribution run, unless the calling key or its parent team carries metadata.allow_client_tags: true. Tags configured on the model deployment, key metadata, or team metadata are unaffected. The proxy logs a WARNING line on each strip:

Stripped caller-supplied tags from metadata, tags (root): this key/team does not have allow_client_tags: true in its metadata. Set it to opt into client-supplied routing/budget tags.

— PR #25905

Who is affected: Any deployment that relied on clients passing tags in the request body or x-litellm-tags header for tag-based cost tracking, tag budgets, or tag-based routing. After upgrade, those tags will silently fall through to the default bucket / default deployment, and per-tag spend reports will appear empty.

Restore prior behavior: Set allow_client_tags: true in the metadata of the affected key (or the team owning it). Either flag is sufficient — if the key or its parent team carries the flag, caller-supplied tags pass through.

# Per key curl -L -X POST 'http://0.0.0.0:4000/key/generate' \ -H 'Authorization: Bearer sk-1234' \ -H 'Content-Type: application/json' \ -d '{"metadata": {"allow_client_tags": true}}' # Per team curl -L -X POST 'http://0.0.0.0:4000/team/new' \ -H 'Authorization: Bearer sk-1234' \ -H 'Content-Type: application/json' \ -d '{"metadata": {"allow_client_tags": true}}'

Existing keys/teams can be patched with /key/update or /team/update carrying the same metadata payload.

os.environ/… values in the UI or API

What changed: Values such as os.environ/OPENAI_API_KEY (and other os.environ/… patterns) are no longer expanded when they come from request-supplied fields—including the Admin UI and the same proxy APIs the UI calls. — PR #25592

Who is affected: Anyone who entered literal os.environ/SECRET_NAME strings in the UI or API and expected the proxy to substitute the host environment at runtime.

What to use instead: Provider API keys and similar secrets should be stored with Reusable Credentials and attached to models (for example via litellm_credential_name). For observability callbacks (Langfuse, LangSmith, etc.), set keys and endpoints in proxy config.yaml or in environment variables the process reads at startup—not as os.environ/… strings inside per-request metadata.

New Models / Updated Models

New Model Support (10 new models)
Provider Model Context Window Input ($/1M tokens) Output ($/1M tokens) Features Anthropic claude-opus-4-7, claude-opus-4-7-20260416 1M $5.00 $25.00 Chat, reasoning, vision, computer use, prompt caching, PDF input, xhigh reasoning effort AWS Bedrock anthropic.claude-opus-4-7, us.anthropic.claude-opus-4-7, eu.anthropic.claude-opus-4-7, au.anthropic.claude-opus-4-7, global.anthropic.claude-opus-4-7 1M $5.50 $27.50 Chat, reasoning, vision, computer use, prompt caching, PDF input, native structured output Vertex AI vertex_ai/claude-opus-4-7, vertex_ai/claude-opus-4-7@default 1M $5.00 $25.00 Chat, reasoning, vision, computer use, prompt caching, PDF input Azure AI azure_ai/claude-opus-4-7 200K $5.00 $25.00 Chat, reasoning, vision, computer use, prompt caching, PDF input Perplexity perplexity/anthropic/claude-opus-4-7 - - - Web search, function calling (Responses mode) Google Gemini gemini/veo-3.1-lite-generate-preview 1024 - $0.05 / sec Video generation preview OpenRouter openrouter/google/gemini-3.1-flash-lite-preview 1.05M $0.25 $1.50 Chat, code execution, file search, function calling, prompt caching, reasoning, web search, vision, video/audio/PDF input xAI xai/grok-4.20-0309-reasoning 2M $2.00 $6.00 Function calling, reasoning, tool choice, vision, web search W&B Inference wandb/MiniMaxAI/MiniMax-M2.5 197K $0.30 $1.20 Function calling, reasoning, response schema W&B Inference wandb/moonshotai/Kimi-K2.5 262K $0.60 $3.00 Function calling, reasoning, response schema, vision
Features

Anthropic

Day-0 support for Claude Opus 4.7 across Anthropic native, Bedrock, Vertex AI, Azure AI, and Perplexity - PR #25867

Hotfix follow-ups for Opus 4.7 routing/version-string handling - PR #25875, PR #25876

Retry /v1/messages after invalid thinking signature errors - PR #25674

AWS Bedrock

Normalize custom tool JSON schema for both Invoke and Converse APIs - PR #25396

Bedrock API response null-type handling - PR #25810, PR #24147

Prevent negative streaming costs for start-only cache usage - PR #25846

Accurate cache token cost breakdown in UI and SpendLogs - PR #25735

Remove unresolved merge conflict markers in Bedrock test file - PR #25995

Replace flaky Bedrock gpt-oss tool-call live test with request-body mock - PR #25739

Mock Bedrock Moonshot tests + fix TogetherAIConfig recursion - PR #25920

Remove dead Bedrock clear_thinking interleaved-thinking-beta assertion - PR #25913

Google Vertex AI

Normalize Gemini finish_reason enum through map_finish_reason - PR #25337

Add us-south1 region for vertex_ai/qwen3-235b-a22b-instruct-2507-maas - PR #25382

Add vertex_ai/claude-opus-4-7 and vertex_ai/claude-opus-4-7@default cost map entries - cost map

Google Gemini

Veo 3.1 Lite pricing, video resolution usage, and tiered cost tracking - PR #25348

Azure AI

Add azure_ai/claude-opus-4-7 cost map entry - cost map

Populate standard_logging_object for Azure passthrough via logging hook - PR #25679

OpenAI

Omit null encoding_format for OpenAI embedding requests - PR #25395 (later reverted in PR #25698)

xAI

Add xai/grok-4.20-0309-reasoning cost map entry - PR #25930

Together AI

Expose reasoning effort fields in get_model_info and add together_ai/gpt-oss-120b - PR #25263

Replace deprecated Mixtral with serverless Qwen3.5-9B in tests - PR #25728

DashScope

Preserve cache_control for explicit prompt caching - PR #25331

GitHub Copilot

Allow overriding the default GitHub Copilot authentication endpoint - PR #25915

W&B Inference

Add Kimi-K2.5 and MiniMax-M2.5 cost map entries - PR #25409

Bug Fixes

Anthropic

Return actual upstream status code from /v1/messages/count_tokens instead of always 200 - PR #21352

Vertex AI

Gemini finish_reason enum normalization (see Features above) - PR #25337

Embeddings API

Revert null-encoding_format omission after downstream regression - PR #25698

General

Fix version shown in docs banner - PR #25875

LLM API Endpoints

Add Responses API params to cache key allow-list - PR #25673

OCR API: Mistral-style pages param via Azure DI analyze query string - PR #25929

Add missing Mistral OCR params to allowlist - PR #25858

OpenAI encoding_format handling for null values (initial fix later reverted) - PR #25395, PR #25698

Anthropic Messages: Retry on invalid thinking signature - PR #25674

Return actual status code on count_tokens upstream errors - PR #21352

Pass-Through Endpoints: Populate standard_logging_object for Azure passthrough - PR #25679

Restrict x-pass- header forwarding for credential and protocol headers - PR #25916

Management Endpoints / UI

Virtual Keys

Configurable multi-threshold budget alerts (e.g. 50% / 80% / 95%) - PR #25989

Multiple concurrent budget windows per API key and team (#24883) - PR #25109

Per-member model scope + team default_team_member_models - PR #24950

Migrate regenerate key modal to AntD - PR #25406

Strip empty premium fields from key update payload - PR #26023

Default invite-user modal global role to least privilege - PR #25721

Teams

Allow editing router settings after team creation - PR #25398

Per-team opt-out for specific global guardrails - PR #25575

Enterprise notice banner on deleted Keys/Teams - PR #25814

Invalidate org queries after team mutations - PR #25812

E2E test for editing team model TPM/RPM limits - PR #25658

Models + Endpoints

Claude Code BYOK support in UI Settings - PR #25998

E2E tests for Add Model flow - PR #25590

Pre-select backend default for boolean guardrail provider fields - PR #25700

Render guardrail optional_params bool defaults in Select - PR #25806

Use AntD Select for MCP ToolTestPanel boolean inputs - PR #25809

Persist extra_headers on MCP server edit - PR #26003

Migrate Guardrail Test Playground from @tremor/react to AntD - PR #25749

Migrate router_settings page from Tremor to AntD - PR #25879

Reduce Tremor usage in Guardrails Monitor layout - PR #25803

Remove Chat UI link from Swagger docs message - PR #25727

Delete policy attachments via controlled modal - PR #25324

Auth / SSO

Resolve login redirect loop when reverse proxy adds HttpOnly to cookies - PR #23532

Gate post-custom-auth DB lookups behind opt-in flag - PR #25634

Logs / Activity

Isolate logs team-filter dropdown from root teams state bleed - PR #25716

Align /spend/logs filter handling with user scoping - PR #25594

Helm

Add tpl support to extraContainers and extraInitContainers - PR #25494

Bugs

Strip empty premium fields from key update payload - PR #26023

Tighten api_key value check in credential validation - PR #25917

extra_headers not persisting on MCP server edit - PR #26003

Logs team-filter dropdown leakage - PR #25716

Add getCookie to cookieUtils mock in user_dashboard test - PR #25719

Remove deprecated tests/ui_e2e_tests/ suite - PR #25657

Restrict x-pass- header forwarding - PR #25916

Blog dark-mode text invisible on dark background - PR #25620

Default invite-user role least-privilege - PR #25721

AI Integrations

Logging

Prometheus: Add 7m and 10m latency histogram buckets - PR #25071

Performance improvements for Prometheus exporter - PR #25934

Resolve prometheus_helpers file/package shadow breaking /global/spend/logs - PR #26026

Azure Pass-Through

Populate standard_logging_object via logging hook - PR #25679

General

Preserve provider response headers in StandardLoggingPayload - PR #25807

Guardrails

PromptGuard

New PromptGuard guardrail integration for prompt-injection detection - PR #24268

Custom Code Guardrails

Replace custom_code sandbox with RestrictedPython - PR #25818

Presidio

Use correct text positions in anonymize_text - PR #24998

General

Per-team opt-out for specific global guardrails - PR #25575

UI: pre-select backend default for boolean guardrail provider fields - PR #25700

UI: render guardrail optional_params boolean defaults in Select - PR #25806

Read guardrail config from admin metadata and fix tag-routing consistency - PR #25905

Caching

Add Responses API params to cache key allow-list - PR #25673

Prevent multiple values TypeError in get_cache_key - PR #20261

S3v2: use prepared URL for SigV4-signed S3 requests - PR #25074

Prompt Management / Compression

New litellm.compress() BM25-based prompt compression API with retrieval tool - PR #25637

Secret Managers

No new secret manager provider additions in this release.

Spend Tracking, Budgets and Rate Limiting

Configurable multi-threshold budget alerts for virtual keys (e.g. 50% / 80% / 95%) - PR #25989

Multiple concurrent budget windows per API key and team (#24883) - PR #25109

Bedrock/Anthropic accurate cache token cost breakdown in UI and SpendLogs - PR #25735

Bedrock: prevent negative streaming costs for start-only cache usage - PR #25846

Fix virtual-key projected-spend soft budget alerts - PR #25838

Enforce project-level model-specific rate limits in parallel-request limiter - PR #25994

Persist default router end-budget across restarts - PR #25991

Align reset times for legacy entities (Team Members, End Users) with the standardized calendar - PR #25440

Batch-limit stale managed-object cleanup to prevent 300K-row UPDATE - PR #25227

Cache invalidation: stop double-hashing token in bulk update and key rotation - PR #25552

model_max_budget silently broken for routed models - PR #25549

Expose reasoning-effort fields in get_model_info (and add together_ai/gpt-oss-120b to cost map) - PR #25263

Veo 3.1 Lite resolution-aware tiered cost tracking - PR #25348

Add us-south1 region for Vertex qwen3-235b-a22b-instruct-2507-maas cost map - PR #25382

MCP Gateway

Validate is_tool_name_prefixed against the set of known MCP server prefixes - PR #25085

Restore PKCE-triggering 401 when no stored per-user token exists - PR #26032

Expose per-server InitializeResult.instructions from the MCP gateway - PR #25694

Extract shared PKCE helpers into utils/pkce.ts - PR #25878

UI: AntD Select for MCP ToolTestPanel boolean inputs - PR #25809

UI: persist extra_headers on MCP server edit - PR #26003

Performance / Loadbalancing / Reliability improvements

Prometheus exporter performance improvements - PR #25934

Optimize DB query to prevent OOM during health checks - PR #25732

PodLockManager.release_lock atomic compare-and-delete (re-land of #21226) - PR #24466

Health-check reasoning-token max-token precedence - PR #25936

New BACKGROUND_HEALTH_CHECK_MAX_TOKENS environment variable - PR #25344

Return None for routing_strategy_args when strategy is not latency-based - PR #25882

Bump proxy dependencies; raise minimum Python to 3.10 - PR #26022

Bump 22 of 25 vulnerable dependabot-reported dependencies - PR #25442

Migrate packaging, CI, and Docker from Poetry to uv - PR #25007

[Infra] Bump llm_translation_testing resource class to xlarge and tolerate worker restarts - PR #25887, PR #25898

[Infra] Expand CI branch filters for non-main PR targets - PR #25819

[Infra] Guard main to only accept PRs from staging and hotfix branches - PR #25733

[Infra] Remove unused publish_proxy_extras and prisma_schema_sync jobs from CircleCI config - PR #25821

fix(ci): increase test-server-root-path timeout to 30m - PR #25741

Remove non-existent litellm_mcps_tests_coverage from coverage combine - PR #25737

Helm: add tpl support to extraContainers/extraInitContainers - PR #25494

Advisor tool orchestration loop for non-Anthropic providers - PR #25579

Documentation Updates

Cost discrepancy debugging guide - PR #25622

Week 2 onboarding checklist - PR #25452

Add "Copy Page as Markdown" + llms.txt to docs site - PR #25975

Docs announcement bar for Trivy compromise resolution - PR #25870

Restyle docs.litellm.ai/blog to engineering blog aesthetic - PR #25580

Ramp-style engineering blog restyle + Redis circuit breaker post - PR #25583

Add back arrow to blog post pages - PR #25587

Fallbacks image - PR #25731

General docs update - PR #25736

Backfill release notes for v1.83.3-stable and v1.83.7.rc.1 - PR #25723, PR #25726

Fix version shown in docs - PR #25875

New Contributors

@hunterchris made their first contribution in https://github.com/BerriAI/litellm/pull/20261

@Dmitry-Kucher made their first contribution in https://github.com/BerriAI/litellm/pull/24998

@kulia26 made their first contribution in https://github.com/BerriAI/litellm/pull/25071

@jaxhend made their first contribution in https://github.com/BerriAI/litellm/pull/23532

@abhyudayareddy made their first contribution in https://github.com/BerriAI/litellm/pull/25337

@avarga1 made their first contribution in https://github.com/BerriAI/litellm/pull/25263

@acebot712 made their first contribution in https://github.com/BerriAI/litellm/pull/24268

@meutsabdahal made their first contribution in https://github.com/BerriAI/litellm/pull/25395

@shreyescodes made their first contribution in https://github.com/BerriAI/litellm/pull/25559

@Lucas-Song-Dev made their first contribution in https://github.com/BerriAI/litellm/pull/25324

@steromano87 made their first contribution in https://github.com/BerriAI/litellm/pull/25915

@jlav made their first contribution in https://github.com/BerriAI/litellm/pull/25494

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.7-stable...v1.83.10-stable
Original source
Mar 16, 2026
Date parsed from source:
Mar 16, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.82.3 - Nebius AI, gpt-5.4, Gemini 3.x, FLUX Kontext, and 116 New Models

liteLLM releases major model and provider expansion with Nebius AI, SageMaker Nova, and Black Forest Labs support, plus day-0 OpenAI gpt-5.4 routing, Gemini 3.x updates, WebSocket streaming for Responses API, stronger RBAC, Vault integration, and secret redaction.
Key Highlights

Nebius AI — new provider — 30 models across DeepSeek, Qwen, Llama, Mistral, NVIDIA, and BAAI available via Nebius AI cloud - PR #22614

OpenAI gpt-5.4 / gpt-5.4-pro — day 0 — Full pricing and routing support for gpt-5.4 (1M context, $2.50/$15.00) and gpt-5.4-pro ($30.00/$180.00) on OpenAI and Azure

Gemini 3.x models — gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-image-preview, and gemini-embedding-2-preview added to cost map for Google AI and Vertex AI

FLUX Kontext image editing — flux-kontext-pro and flux-kontext-max added to Black Forest Labs, alongside flux-pro-1.0-fill and flux-pro-1.0-expand for inpainting and outpainting

116 new models, 132 deprecated models cleaned up — Major model map refresh including Mistral Magistral, Dashscope Qwen3 VL, xAI Grok via Azure AI, ZAI GLM-5, Serper Search; removal of OpenAI GPT-3.5/GPT-4 legacy variants, Gemini 1.5, and Vertex AI PaLM2

SageMaker Nova provider — New sagemaker_nova provider for Amazon Nova models on SageMaker - PR #21542

Hashicorp Vault secret manager — Config override backend powered by Hashicorp Vault, with full UI for managing vault-sourced credentials - PR #22939, PR #23036

Responses API WebSocket streaming — Real-time WebSocket streaming for the Responses API, including support across all providers - PR #22559, PR #22771

Org Admin RBAC expansion — Org Admins can now access team management endpoints, view and invite internal users, and manage team membership without requiring a global admin role - PR #23085, PR #23080

Guardrail mode defaults and tag-based modes — Set a default guardrail mode list globally, and specify a list of modes in tag-based guardrail configs - PR #22676, PR #23020

Secret redaction in logs — API keys, tokens, and credentials automatically scrubbed from all proxy log output. Enabled by default; opt out with LITELLM_DISABLE_REDACT_SECRETS=true - PR #23668

Streaming stability fix — Critical fix for RuntimeError: Cannot send a request, as the client has been closed. crashes after ~1 hour in production - PR #22926

New Providers and Endpoints

New Providers (7 new providers)

Provider Supported LiteLLM Endpoints Description

Nebius AI (nebius/) /chat/completions, /embeddings EU-based AI cloud with 30+ open models — DeepSeek, Qwen3, Llama 3.1/3.3, NVIDIA Nemotron, BAAI embeddings
ZAI (zai/) /chat/completions ZhipuAI GLM-5 models via ZAI cloud
Black Forest Labs (black_forest_labs/) /images/generations, /images/edits FLUX image generation and editing — Kontext Pro/Max, Pro 1.0 Fill/Expand
Serper (serper/) /search Web search via Serper API
SageMaker Nova (sagemaker_nova/) /chat/completions Amazon Nova models via SageMaker endpoint
Google Search API (google_search/) /search Google Search API integration - PR #22752
Bedrock Mantle (bedrock_mantle/) /chat/completions Amazon Bedrock via Mantle — alternative auth and routing path for Bedrock models - PR #22866

New Models / Updated Models

New Model Support (116 new models)

Includes OpenAI gpt-5.4 and gpt-5.4-pro, Google Gemini 3.x variants, Mistral Magistral models, Dashscope Qwen3 VL models, Black Forest Labs FLUX Kontext image editing models, Azure AI xAI Grok models, and many more.

Updated Models

AWS Bedrock: Added prompt caching cost estimation for Anthropic models; renamed regional identifiers.
Azure OpenAI: Added supports_none_reasoning_effort to gpt-5.1-chat, gpt-5.1-codex, and gpt-5.4 variants; removed deprecated models azure/gpt-35-turbo-0301 and azure/gpt-35-turbo-0613.

Features

OpenAI: Day 0 support for gpt-5.4 and gpt-5.4-pro on OpenAI and Azure.
Google Gemini: Added Gemini 3.x model cost map entries and re-added Gemini 2.0 Flash and Flash Lite with updated pricing.
Google Vertex AI: Added Gemini 3.x models to cost map.
Mistral: Added Magistral reasoning models and other variants.
Dashscope / Qwen: Added Qwen3 VL multimodal models and other Qwen3 variants.
Black Forest Labs: Added FLUX Kontext image editing models and FLUX Pro 1.0 Fill/Expand.
Azure AI: Added xAI Grok models and Mistral Document AI OCR mode.
AWS Bedrock: Added new models via Bedrock Converse.
SageMaker: Added sagemaker_nova provider for Amazon Nova models on SageMaker.

Bugs

Fixed various issues including streaming finish_reason for tool calls, JSON schema preservation for Gemini 2.0+, handling of reasoning_effort param, content truncation in streaming, and more.

AI Integrations

Added Gemini and Vertex AI support to HeliconeLogger, fixed provider URLs, Langfuse failure path fixes, Vantage integration for FOCUS 1.2 CSV export, and general fixes.

Guardrails

Configured default guardrail mode lists globally and tag-based guardrail mode lists; fixed presidio PII token leak and OTEL orphaned guardrail traces.

Secret Managers

Full Hashicorp Vault integration as a config override backend with UI support for managing vault-sourced credentials.

MCP Gateway

Added token authentication for MCP servers, team-scoped MCP server filtering, and per-server health recheck in UI.

Spend Tracking, Budgets and Rate Limiting

Fixed budget-linked keys never having spend reset, added flex pricing support, fixed spend log cleanup and deduplication, and fixed TypeError when request has no API key.

Performance / Loadbalancing / Reliability improvements

Fixed streaming crashes after ~1 hour, OOM / Prisma connection loss on large installs, centralized logging kwarg updates, fixed tiktoken cache for non-root offline containers, and other reliability improvements.

Security

Added secret redaction in proxy logs, bumped PyJWT to ^2.12.0, and updated tar and tornado to address CVEs.

Database / Proxy Operations

Fixed Prisma migrate deploy on pre-existing instances and made DB migration failure exit opt-in.

Documentation Updates

Added Anthropic /v1/messages → /responses parameter mapping reference, updated Okta SSO docs, added environment variables reference, and added Gemini Vertex AI PayGo/priority cost tracking docs.

New Contributors

Several new contributors made their first contributions in this release.

Diff Summary

New Providers: 7
New Models / Updated Models: 116 new, 132 removed
LLM API Endpoints: 37
Management Endpoints / UI: 31
AI Integrations: 8
MCP Gateway: 5
Spend Tracking, Budgets and Rate Limiting: 5
Performance / Loadbalancing / Reliability improvements: 9
Security: 3
Database / Proxy Operations: 2
Documentation Updates: 5
Original source
Feb 28, 2026
Date parsed from source:
Feb 28, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.82.0 - Realtime Guardrails, Projects Management, and 10+ Performance Optimizations

liteLLM adds realtime API guardrails, a new Projects management UI, expanded guardrail policies, and day 0 support for OpenAI gpt-5.3-codex. It also routes /v1/messages to the Responses API by default, improves performance, and broadens model coverage across providers.
Key Highlights

Realtime API guardrails — Full guardrails support for /v1/realtime WebSocket sessions with pre/post-call enforcement, voice transcription hooks, session termination policies, and Vertex AI Gemini Live support - PR #22152, PR #22153, PR #22161, PR #22165

Projects Management — New Projects UI with full CRUD, project-scoped virtual keys, and admin opt-in toggle — organize teams and keys by project - PR #22315, PR #22360, PR #22373, PR #22412

Guardrail ecosystem expansion — Noma v2, Lakera v2 post-call, Singapore regulatory policies (PDPA + MAS), employment discrimination blockers, code execution blocker, guardrail policy versioning, and production monitoring - PR #21400, PR #21783, PR #21948

OpenAI Codex 5.3 — day 0 — Full support for gpt-5.3-codex on OpenAI and Azure, plus gpt-audio-1.5 and gpt-realtime-1.5 model coverage - PR #22035

10+ performance optimizations — Streaming hot-path fixes, Redis pipeline batching, database task batching, ModelResponse init skip, and router cache improvements — lower latency and CPU on every request

/v1/messages→/responses routing — /v1/messages requests are now routed to the Responses API by default for OpenAI/Azure models

This version starts routing /v1/messages requests to the /responses API by default. To opt out and continue using chat/completions, set LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true or litellm_settings.use_chat_completions_url_for_anthropic_messages: true in your config.

New Models / Updated Models

New Model Support (20 new models)

Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features
OpenAI | gpt-5.3-codex | 272K | $1.75 | $14.00 | Reasoning, coding
Azure OpenAI | azure/gpt-5.3-codex | 272K | $1.75 | $14.00 | Azure deployment
OpenAI | gpt-audio-1.5 | 128K | $2.50 | $10.00 | Audio model
Azure OpenAI | azure/gpt-audio-1.5-2026-02-23 | 128K | $2.50 | $10.00 | Audio model
OpenAI | gpt-realtime-1.5 | 32K | $4.00 | $16.00 | Realtime model
Azure OpenAI | azure/gpt-realtime-1.5-2026-02-23 | 32K | $4.00 | $16.00 | Realtime model
Groq | groq/openai/gpt-oss-safeguard-20b | 131K | $0.075 | $0.30 | Guardrail inference
Google Vertex AI | vertex_ai/gemini-3.1-flash-image-preview | - | - | - | Image generation
Perplexity | perplexity/perplexity/sonar | - | - | - | Sonar search
Perplexity | perplexity/openai/gpt-5.1 | - | - | - | Hosted routing
Perplexity | perplexity/openai/gpt-5-mini | - | - | - | Hosted routing
Perplexity | perplexity/google/gemini-2.5-flash | - | - | - | Hosted routing
Perplexity | perplexity/google/gemini-2.5-pro | - | - | - | Hosted routing
Perplexity | perplexity/google/gemini-3-flash-preview | - | - | - | Hosted routing
Perplexity | perplexity/google/gemini-3-pro-preview | - | - | - | Hosted routing
Perplexity | perplexity/anthropic/claude-haiku-4-5 | - | - | - | Hosted routing
Perplexity | perplexity/anthropic/claude-sonnet-4-5 | - | - | - | Hosted routing
Perplexity | perplexity/anthropic/claude-opus-4-5 | - | - | - | Hosted routing
Perplexity | perplexity/anthropic/claude-opus-4-6 | - | - | - | Hosted routing
Perplexity | perplexity/xai/grok-4-1-fast-non-reasoning | - | - | - | Hosted routing

Features

OpenAI

Day 0 support for gpt-5.3-codex on OpenAI and Azure - PR #22035

Add gpt-audio-1.5 model cost map - PR #22303

Add gpt-realtime-1.5 model cost map - PR #22304

Add audio as supported OpenAI param - PR #22092

Add prompt_cache_key and prompt_cache_retention support - PR #20397

Azure OpenAI

New Azure OpenAI models 2026-02-25 - PR #22114

Anthropic

Add v1 Anthropic Responses API transformation - PR #22087

Sanitize tool_use IDs in convert_to_anthropic_tool_invoke - PR #21964

Fix model wildcard access issue - PR #21917

AWS Bedrock

Encode model ARNs for OpenAI-compatible Bedrock imported models - PR #21701

Support optional regional STS endpoint in role assumption - PR #21640

Native structured outputs API support - PR #21222

Google Vertex AI

Add gemini-3.1-flash-image-preview to model cost map - PR #22223

Enable context-1m-2025-08-07 beta header for Vertex AI provider - PR #21867

OpenRouter

Add OpenRouter native models to model cost map - PR #20520

Add OpenRouter Opus 4.6 to model map - PR #20525

Mistral

Adjust mistral-small-2503 input/output cost per token - PR #22097

Groq

Add groq/openai/gpt-oss-safeguard-20b model pricing - PR #21951

AI/ML

Update AIML model pricing - PR #22139

Ollama

Thread api_base to get_model_info + graceful fallback - PR #21970

PublicAI

Fix function calling for PublicAI Apertus models - PR #21582

xAI

Add deprecation dates for grok-2-vision-1212 and grok-3-mini models - PR #20102

General

Forward auth headers of provider - PR #22070

Normalize camelCase thinking param keys to snake_case - PR #21762

Allow dimensions param passthrough for non-text-embedding-3 OpenAI models - PR #22144

Bug Fixes

AWS Bedrock

Fix converse handling for parallel_tool_calls - PR #22267

Restore parallel_tool_calls mapping in map_openai_params - PR #22333

Correct modelInput format for Converse API batch models - PR #21656

Prevent double UUID in create_file S3 key - PR #21650

Filter internal json_tool_call when mixed with real tools - PR #21107

Pass timeout param to Bedrock rerank HTTP client - PR #22021

Anthropic

Fix model cost map for anthropic fast and inference_geo - PR #21904

Image Generation

Propagate extra_headers to upstream image generation - PR #22026

Add ChatCompletionImageObject in OpenAIChatCompletionAssistantMessage - PR #22155

General

Preserve forwarding of server-side called tools - PR #22260

Fix free model handling from UI paths - PR #22258

Fix None TypeError in mapping - PR #22080

LLM API Endpoints

Features

Realtime API

Guardrails support for /v1/realtime WebSocket endpoint - PR #22152

Vertex AI Gemini Live via unified /realtime endpoint - PR #22153

Guardrails with pre_call/post_call mode on realtime WebSocket - PR #22161

end_session_after_n_fails + Endpoint Settings wizard step - PR #22165

Guardrail hook for voice transcription - PR #21976

Fix guardrails not firing for Gemini/Vertex AI and provider_config realtime sessions - PR #22168

Add logging, spend tracking support + tool tracing - PR #22105

Video Generation

Add variant parameter to video content download - PR #21955

Pass api_key from litellm_params to video remix handlers - PR #21965

Apply custom video pricing from deployment model_info - PR #21923

Fix passing of image and parameters in videos API - PR #22170

OCR

Enable local file support for OCR - PR #22133

Websearch / Tool Calling

Preserve thinking blocks in agentic loop follow-up messages - PR #21604

General

Add configurable upper bound for chunk processing time - PR #22209

Emit x-litellm-overhead-duration-ms header for streaming requests - PR #22027

Bugs

General

Fix mypy attr-defined errors on realtime websocket calls - PR #22202

Management Endpoints / UI

Features
Projects

Add Projects page with list and create flows - PR #22315

Add Project Details page with edit modal - PR #22360

Add project keys table and project dropdown on key create/edit - PR #22373

Add delete project action to Projects table - PR #22412

Add Projects Opt-In Toggle in Admin Settings - PR #22416

Include created_at and updated_at in /project/list response - PR #22323

Add tags in project - PR #22216

Virtual Keys + Access Groups

Add bidirectional team/key sync for Access Group CRUD flows - PR #22253

Add pagination and search to /key/aliases to prevent OOMs - PR #22137

Add paginated key alias selector in UI - PR #22157

Add project_id and access_group_id filters for key list endpoint - PR #22356

Add KeyInfoHeader component - PR #22047

Restrict Edit Settings to key owners - PR #21985

Fix virtual key grace period from env/UI - PR #20321

Agents

Assign virtual keys to agents - PR #22045

Assign tools to agents - PR #22064

Ensure internal users cannot create agents (RBAC enforcement) - PR #22329

Proxy Auth / SSO

OIDC discovery URLs, roles array handling, and dot-notation error hints - PR #22336

Add PROXY_ADMIN role to system user for key rotation - PR #21896

Usage / Spend Logs

Add user filtering to usage page - PR #22059

Allow using AI to understand usage patterns - PR #22042

Use backend request_duration_ms and make Duration sortable in Logs - PR #22122

Add request_duration_ms to SpendLogs - PR #22066

Enrich failure spend logs with key/team metadata - PR #22049

Show real tool names in logs for Anthropic-format tools - PR #22048

Models + Endpoints

Show proxy URL in ModelHub - PR #21660

Add /public/endpoints for provider endpoint support - PR #22248

UI Improvements

Add custom favicon support - PR #21653

Add Blog Dropdown in Navbar - PR #21859

Add UI banner warning for detailed debug mode - PR #21527

Make auth value optional for MCP Server create flow - PR #22119

Tool policies: auto-discover tools + policy enforcement guardrail - PR #22041

Health Checks

Add health check max tokens configuration - PR #22299

Limit concurrent health checks with health_check_concurrency - PR #20584

Fix health check model_id filtering - PR #21071

Bugs

Populate user_id and user_info for admin users in /user/info - PR #22239

Fix virtual keys pagination stale totals when filtering - PR #22222

Fix Spend Update Queue aggregation never triggers with default presets - PR #21963

Fix timezone config lookup and replace hardcoded timezone map with ZoneInfo - PR #21754

Fix custom auth budget issue - PR #22164

Fix missing OAuth session state - PR #21992

Fix Transport Type for OpenAPI Spec on UI - PR #22005

Fix Claude Code plugin schema - PR #22271

Add missing migration for LiteLLM_ClaudeCodePluginTable - PR #22335

Only tag selected deployment in access group creation - PR #21655

State management fixes for CheckBatchCost - PR #21921

Remove duplicate antd import in ToolPolicies - PR #22107

AI Integrations

Logging
DataDog

Add ability to trace metrics in DataDog - PR #22103

Correlate LiteLLM call IDs with DataDog APM spans - PR #22219

Fix TTS metric emission issues - PR #20632

Prometheus

Add opt-in stream label on litellm_proxy_total_requests_metric - PR #22023

Fix team +Inf budgets in Prometheus metrics - PR #22243

Langfuse

Fix Langfuse OTEL trace issues - PR #21309

Arize Phoenix

Fix nested traces coexistence with OTEL callback - PR #22169

Slack

Add optional digest mode for Slack alert types - PR #21683

General

Fix Gemini trace ID missing in logging - PR #22077

Populate cache_read_input_tokens from prompt_tokens_details for OpenAI/Azure - PR #22090

Guardrails

Noma

Noma guardrails v2 based on custom guardrails framework - PR #21400

LakeraAI

Add Lakera v2 post-call hook with fixed PII masking - PR #21783

Presidio

Fix Presidio streaming and false positives - PR #21949

Fix Presidio streaming v3 reliability improvements - PR #22283

Prevent Presidio crash on non-JSON responses - PR #22084

Built-in Guardrails

Block code execution guardrail to prevent agents from executing code - PR #22154

Employment discrimination topic blockers for 5 protected classes - PR #21962

Claims agent guardrails (5 categories + policy template) - PR #22113

New code execution evaluation dataset - PR #22065

Tool policies: auto-discover tools + policy enforcement - PR #22041

Policy Templates

Singapore guardrail policies (PDPA + MAS AI Risk Management) - PR #21948

Prefix SG guardrail policy IDs with country code - PR #21974

Guardrail policy versioning - PR #21862

Guardrail Monitoring

Guardrail Monitor — measure guardrail reliability in production - PR #21944

Security

Fix unauthenticated RCE and sandbox escape in custom code guardrail - PR #22095

Prompt Management

No major prompt management changes in this release.

Secret Managers

No major secret manager changes in this release.

Spend Tracking, Budgets and Rate Limiting

Priority PayGo cost tracking for Gemini/Vertex AI - PR #21909

Add request_duration_ms to SpendLogs for latency tracking per request - PR #22066

Add in_flight_requests metric to /health/backlog + Prometheus - PR #22319

Enrich failure spend logs with key/team metadata - PR #22049

Add spend tracking lifecycle logging for debugging spend flows - PR #22029

Fix budget timezone config lookup and replace hardcoded timezone map with ZoneInfo - PR #21754

Fix Spend Update Queue aggregation never triggering with default presets - PR #21963

Avoid mutating caller-owned dicts in SpendUpdateQueue aggregation - PR #21742

Optimize old spendlog deletion cron job - PR #21930

Health check max tokens configuration - PR #22299

MCP Gateway

Pass MCP auth headers from request context to tool fetch for /v1/responses and /chat/completions - PR #22291

Default available_on_public_internet to true for MCP server behavior consistency - PR #22331

Clear error messages for IP filtering / no available tools - PR #22142

Strip stale mcp-session-id header to prevent 400 errors across proxy workers - PR #21417

Skip health check for MCP with passthrough token auth - PR #21982

Fix missing OAuth session state - PR #21992

Fix Transport Type for OpenAPI Spec on UI - PR #22005

Add e2e test for stateless StreamableHTTP behavior - PR #22033

Performance / Loadbalancing / Reliability improvements

Streaming & hot-path

Streaming latency improvements — 4 targeted hot-path fixes - PR #22346

Skip throwaway Usage() construction in ModelResponse.init - PR #21611

Optimize is_model_o_series_model with startswith - PR #21690

Use cached _safe_get_request_headers instead of per-request construction - PR #21430

Emit x-litellm-overhead-duration-ms header for streaming requests - PR #22027

Database & Redis

Batch 11 create_task() calls into 1 in update_database() - PR #22028

Redis pipeline spend updates for batched writes - PR #22044

Recover from prisma-query-engine zombie process - PR #21899

Optimize old spendlog deletion cron job - PR #21930

Router & caching

Add cache invalidation for _cached_get_model_group_info - PR #20376

Remove cache eviction close that kills in-use httpx clients - PR #22247

Store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings - PR #22143

Fix ensure_arrival_time set before calculating queue time - PR #21918

Connection management

Only set enable_cleanup_closed on aiohttp when required - PR #21897

Prometheus child_exit cleanup for gunicorn workers - PR #22324

Prometheus multiprocess cleanup - PR #22221

Limit concurrent health checks with health_check_concurrency - PR #20584

Isolate get_config failures from model sync loop - PR #22224

Other

Semantic cache: support configurable vector dimensions - PR #21649

Honor MAX_STRING_LENGTH_PROMPT_IN_DB from config env vars - PR #22106

Enhance MidStreamFallbackError to preserve original status code and attributes - PR #22225

Network mock utility for testing - PR #21942

Add missing return type annotations to iterator protocol methods in streaming_handler - PR #21750

Security

Fix critical/high CVEs in OS-level libs and NPM transitive dependencies - PR #22008

Fix unauthenticated RCE and sandbox escape in custom code guardrail - PR #22095

Remove hardcoded base64 string flagged by secret scanner - PR #22125

Documentation Updates

Add OpenAI Agents SDK tutorial with LiteLLM Proxy - PR #21221

Add OpenClaw integration tutorial - PR #21605

Add Google GenAI SDK tutorial (JS & Python) - PR #21885

Add Gollem Go agent framework cookbook example - PR #21747

Update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway - PR #21130

Add store_model_in_db release docs - PR #21863

Add Credential Usage Tracking docs - PR #22112

Add proxy request tags docs - PR #22129

Add trailing slash to /mcp endpoint URLs - PR #20509

Add pre-PR checklist to UI contributing guide - PR #21886

Replace Azure OpenAI key with mock key in docs - PR #21997

Add performance & reliability section to v1.81.14 release notes - PR #21950

Update v1.81.12-stable release notes to point to stable.1 - PR #22036

Add security vulnerability scan report to v1.81.14 release notes - PR #22385

New Contributors

@janfrederickk made their first contribution in PR #21660

@hztBUAA made their first contribution in PR #21656

@LeeJuOh made their first contribution in PR #21754

@WhoisMonesh made their first contribution in PR #21750

@trevorprater made their first contribution in PR #21747

@edwiniac made their first contribution in PR #21870

@stakeswky made their first contribution in PR #21867

@ta-stripe made their first contribution in PR #21701

@ron-zhong made their first contribution in PR #21948

@Arindam200 made their first contribution in PR #21221

@Canvinus made their first contribution in PR #21964

@nicolopignatelli made their first contribution in PR #21951

@MarshHawk made their first contribution in PR #20584

@gavksingh made their first contribution in PR #22106

@roni-frantchi made their first contribution in PR #22090

@noahnistler made their first contribution in PR #22133

@dylan-duan-aai made their first contribution in PR #21130

@rasmi made their first contribution in PR #22322

Diff Summary

02/28/2026

New Models / Updated Models: 26

LLM API Endpoints: 14

Management Endpoints / UI: 38

AI Integrations: 25

Spend Tracking, Budgets and Rate Limiting: 10

MCP Gateway: 8

Performance / Loadbalancing / Reliability improvements: 22

Security: 3

Documentation Updates: 14

Original source
May 2026
No date parsed from source.

First seen by Releasebot:
May 23, 2026
liteLLM

v1.81.14 - New Gateway Level Guardrails & Compliance Playground

liteLLM adds Guardrail Garden, Compliance Playground, and three new zero-cost built-in guardrails, while also shipping Admin UI model storage settings, day one Claude Sonnet 4.6 support, new API endpoints, and broad performance and reliability improvements.
Key Highlights

Guardrail Garden — Browse built-in and partner guardrails by use case — competitor blocking, topic filtering, GDPR, prompt injection, and more. Pick a template, customize it, attach it to a team or key.

Compliance Playground — Test any guardrail policy against your own traffic before it goes live. See precision, recall, and false positive rate — so you know how it'll behave in production.

3 new zero-cost built-in guardrails — Competitor name blocker, topic blocker, and insults filter — all gateway-level, <0.1ms latency, no external API, configurable per-team or key

Store Model in DB Settings via UI - Configure model storage directly in the Admin UI without editing config files or restarting the proxy—perfect for cloud deployments

Claude Sonnet 4.6 — day 0 — Full support across Anthropic and Vertex AI: reasoning, computer use, prompt caching, 200K context

20+ performance optimizations — Faster routing, lower logging overhead, reduced cost-calculator latency, and connection pool fixes — meaningfully less CPU and latency on every request

Guardrail Garden

AI Platform Admins can now browse built-in and partner guardrails from the Guardrail Garden. Guardrails are organized by use case — blocking financial advice, filtering insults, detecting competitor mentions, and more — so you can find the right one and deploy it in a few clicks.

3 New Built-in Guardrails

This release brings 3 new built-in guardrails that run directly on the gateway. This is great for AI Gateway Admins who need low latency, zero cost guardrails for their scenarios.

Denied Financial Advice — detects requests for personalized financial advice, investment recommendations, or financial planning

Denied Insults — detects insults, name-calling, and personal attacks directed at the chatbot, staff, or other people

Competitor Name Blocker — detects mentions of competitor brands in responses

These guardrails are built for production and on our benchmarks had a 100% Recall and Precision.

Store Model in DB Settings via UI

Previously, the store_model_in_db setting could only be configured in proxy_config.yaml under general_settings, requiring a proxy restart to take effect. Now you can enable or disable this setting directly from the Admin UI without any restarts. This is especially useful for cloud deployments where you don't have direct access to config files or want to avoid downtime. Enable store_model_in_db to move model definitions from your YAML into the database—reducing config complexity, improving scalability, and enabling dynamic model management across multiple proxy instances.

Eval results

We benchmarked our new built-in guardrails against labeled datasets before shipping. You can see the results for Denied Financial Advice (207 cases) and Denied Insults (299 cases):

100% precision means zero false positives — no legitimate messages were incorrectly blocked. 100% recall means zero false negatives — every message that should have been blocked was caught.

Compliance Playground

The Compliance Playground lets you test any guardrail against our pre-built eval datasets or your own custom datasets, so you can see precision, recall, and false positive rate before rolling it out to production.

Performance & Reliability — Up to 13% Lower Latency

This release cuts latency across all percentiles through 20+ micro-optimizations across logging, cost calculation, routing, and connection management. See benchmarking for more info about how to benchmark yourself.

Mean latency: 78.4 ms → 70.3 ms (−10.3%)

p50 latency: 64.8 ms → 57.3 ms (−11.7%)

p99 latency: 288.9 ms → 250.0 ms (−13.4%)

Streaming Connection Pool Fix

Fixed a 3-fold connection leak that caused TCP connection starvation under streaming workloads: the aiohttp transport wasn't closing connections, no finally blocks were calling close on disconnect, and a Uvicorn bug prevented disconnect signaling.

Redis Connection Pool Reliability

Fixed 4 separate connection pool bugs to make how we use Redis more reliable. The most important change was on pools being leaked on cache expiry and the other fixes are detailed here in PR #21717.

New Providers and Endpoints

New Providers (1 new provider):

IBM watsonx.ai — Rerank support for IBM watsonx.ai models

New LLM API Endpoints (1 new endpoint):

/v1/evals (POST/GET) — OpenAI-compatible Evals API for model evaluation

New Models / Updated Models

New Model Support (13 new models) including:

Anthropic claude-sonnet-4-6 with 200K context, reasoning, computer use, prompt caching, vision, PDF

Vertex AI vertex_ai/claude-opus-4-6@default with 1M context

Google Gemini gemini-3.1-pro-preview with audio, video, images, PDF

GitHub Copilot github_copilot/gpt-5.3-codex and github_copilot/claude-opus-4.6-fast

Mistral devstral-small-latest, devstral-latest, devstral-medium-latest

OpenRouter openrouter/minimax/minimax-m2.5

Fireworks AI models glm-4p7, minimax-m2p1, kimi-k2p5

Features

Includes day 0 support for Claude Sonnet 4.6, native structured outputs API support for AWS Bedrock, day 0 support for Google Gemini 3.1 pro preview, Databricks support, GitHub Copilot model additions, Mistral model aliases, IBM watsonx.ai rerank support, xAI usage fixes, Dashscope request formatting fixes, hosted_vllm multi-turn conversation improvements, OCI/Oracle Grok output pricing fix, AU Anthropic model ID fix, general routing and parameter improvements.

Bug Fixes

Fixes across AWS Bedrock, Bedrock Converse, Fireworks AI model pricing, Responses API reasoning parameter, metadata preservation for custom callbacks, spend logs cost calculation, logs pagination, UI logo caching, duplicate URL in tagsSpendLogsCall, key alias and team ID metadata preservation, response_model endpoint, internal user viewer access, warning suppression for litellm-dashboard team.

LLM API Endpoints

Features include Responses API improvements, Evals API OpenAI compatibility, Batch API file deletion criteria, Pass-Through Endpoints method-based routing, OAuth Authorization header forwarding, Websearch tool additions and fixes, general parameter and reasoning support.

Management Endpoints / UI

Features include Access Group Selector, Virtual Keys fixes, Key Last Active Tracking, Model Settings Modal, store_model_in_db database setting, input cost masking fixes, credentials resolution, team usage visibility, service account visibility, organization info UI improvements.

AI Integrations

Logging improvements with DataDog team tags, Prometheus metrics fixes and middleware, Langfuse test isolation, general logging cost fixes, streaming proxy throughput improvements.

Guardrails

Launch of Guardrail Garden marketplace, redesigned guardrail creation UI, guardrail jump links, guardrail tracing UI, AI Policy Templates with seven new ready-to-deploy policies including GDPR, EU AI Act, prompt injection detection, topic filters, airline off-topic restriction, SQL injection, AI-powered policy template suggestions.

Compliance Checker

Added compliance checker endpoints and UI panel, CSV dataset upload for batch testing.

Built-in Guardrails

Competitor name blocker, topic blocker, insults content filter, MCP Security guardrail.

Generic Guardrails

Configurable fallback for generic guardrail endpoint failures.

Presidio

Fixes to controls configuration.

LakeraAI

Avoid KeyError on missing LAKERA_API_KEY during initialization.

Auto Routing

Complexity-based auto routing scoring requests across 7 dimensions to route to appropriate model tier without embeddings or API calls.

Prompt Management

New API for prompt management integrations, prompt registry configuration fixes.

Spend Tracking, Budgets and Rate Limiting

Fixes for Bedrock service_tier cost propagation, cached response cost logging, aggregate daily activity endpoint performance, key alias and team ID metadata preservation, credential name tag injection.

MCP Gateway

OpenAPI-to-MCP conversion, MCP user permissions, MCP security guardrail, StreamableHTTPSessionManager fix, Bedrock AgentCore Accept header fix.

Performance / Loadbalancing / Reliability improvements

Logging and callback overhead optimizations, cost calculation optimizations, router and load balancing improvements, connection management and reliability fixes.

Database Changes

Added project_id column to LiteLLM_DeletedVerificationToken, new LiteLLM_ProjectTable for project management, last_active timestamp to LiteLLM_VerificationToken, vector store migration idempotency.

Security

Security scans with Grype and Trivy on Docker images, vulnerability report summary, critical and high severity vulnerabilities mostly in build-time dependencies, recommendations for best security posture.

Documentation Updates

Added OpenAI Agents SDK guide, Access Groups documentation, Anthropic beta headers docs, latency troubleshooting, rollback safety check, incident reports, stable mark for v1.81.12.

New Contributors

List of contributors making first contributions in this release.

Full Changelog

Link to full changelog from v1.81.12.rc.1 to v1.81.14.rc.1.
Original source
Jan 1, 2026
Date parsed from source:
Jan 1, 2026

First seen by Releasebot:
May 23, 2026
liteLLM

v1.81.12-stable.1 - Guardrail Policy Templates & Action Builder

liteLLM ships a broad release with new guardrail policy templates, a visual action builder, MCP OAuth2 and tracing, Responses API shell and context management support, Access Groups, and 50+ new Bedrock regional model entries, plus reliability fixes and UI improvements across the platform.
Key Highlights

Policy Templates - Pre-configured guardrail policy templates for common safety and compliance use-cases (including NSFW, toxic content, and child safety)

Guardrail Action Builder - Build and customize guardrail policy flows with the new action-builder UI and conditional execution support

MCP OAuth2 M2M + Tracing - Add machine-to-machine OAuth2 support for MCP servers and OpenTelemetry tracing for MCP calls through AI Gateway

Responses API shell Tool & context_management support - Server-side context management (compaction) and Shell tool support for the OpenAI Responses API

Access Groups - Create access groups to manage model, MCP server, and agent access across teams and keys

50+ New Bedrock Regional Model Entries - DeepSeek V3.2, MiniMax M2.1, Kimi K2.5, Qwen3 Coder Next, and NVIDIA Nemotron Nano across multiple regions

Add Semgrep & fix OOMs - Static analysis rules and out-of-memory fixes

PR #20912

Add Semgrep & fix OOMs

This release fixes out-of-memory (OOM) risks from unbounded asyncio.Queue() usage. Log queues (e.g. GCS bucket) and DB spend-update queues were previously unbounded and could grow without limit under load. They now use a configurable max size (LITELLM_ASYNCIO_QUEUE_MAXSIZE, default 1000); when full, queues flush immediately to make room instead of growing memory. A Semgrep rule (.semgrep/rules/python/unbounded-memory.yml) was added to flag similar unbounded-memory patterns in future code.

Guardrail Action Builder

This release adds a visual action builder for guardrail policies with conditional execution support. You can now chain guardrails into multi-step pipelines — if a simple guardrail fails, route to an advanced one instead of immediately blocking. Each step has configurable ON PASS and ON FAIL actions (Next Step, Block, or Allow), and you can test the full pipeline with a sample message before saving.

Access Groups

Access Groups simplify defining resource access across your organization. One group can grant access to models, MCP servers, and agents—simply attach it to a key or team. Create groups in the Admin UI, define which resources each group includes, then assign the group when creating keys or teams. Updates to a group apply automatically to all attached keys and teams.

New Providers and Endpoints

New Providers (2 new providers)

Provider | Supported LiteLLM Endpoints | Description
Scaleway | /chat/completions | Scaleway Generative APIs for chat completions
Sarvam AI | /chat/completions, /audio/transcriptions, /audio/speech | Sarvam AI STT and TTS support for Indian languages

New Models / Updated Models

New Model Support (19 highlighted models)

Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens)
AWS Bedrock | deepseek.v3.2 | 164K | $0.62 | $1.85
AWS Bedrock | minimax.minimax-m2.1 | 196K | $0.30 | $1.20
AWS Bedrock | moonshotai.kimi-k2.5 | 262K | $0.60 | $3.00
AWS Bedrock | moonshotai.kimi-k2-thinking | 262K | $0.73 | $3.03
AWS Bedrock | qwen.qwen3-coder-next | 262K | $0.50 | $1.20
AWS Bedrock | nvidia.nemotron-nano-3-30b | 262K | $0.06 | $0.24
Azure AI | azure_ai/kimi-k2.5 | 262K | $0.60 | $3.00
Vertex AI | vertex_ai/zai-org/glm-5-maas | 200K | $1.00 | $3.20
MiniMax | minimax/MiniMax-M2.5 | 1M | $0.30 | $1.20
MiniMax | minimax/MiniMax-M2.5-lightning | 1M | $0.30 | $2.40
Dashscope | dashscope/qwen3-max | 258K | Tiered pricing | Tiered pricing
Perplexity | perplexity/preset/pro-search | - | Per-request | Per-request
Perplexity | perplexity/openai/gpt-4o | - | Per-request | Per-request
Perplexity | perplexity/openai/gpt-5.2 | - | Per-request | Per-request
Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-opus-4.6 | 200K | $5.00 | $25.00
Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-sonnet-4 | 200K | $3.00 | $15.00
Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-haiku-4.5 | 200K | $1.00 | $5.00
Sarvam AI | sarvam/sarvam-m | 8K | Free tier | Free tier
Anthropic | fast/claude-opus-4-6 | 1M | $30.00 | $150.00

Note: AWS Bedrock models are available across multiple regions (us-east-1, us-east-2, us-west-2, eu-central-1, eu-north-1, ap-northeast-1, ap-south-1, ap-southeast-3, sa-east-1). 54 regional model entries were added in total.

Features

Anthropic

Enable non-tool structured outputs on Claude Opus 4.5 and 4.6 using output_format param - PR #20548

Add support for anthropic_messages call type in prompt caching - PR #19233

Managing Anthropic Beta Headers with remote URL fetching - PR #20935, PR #21110

Remove x-anthropic-billing block - PR #20951

Use Authorization Bearer for OAuth tokens instead of x-api-key - PR #21039

Filter unsupported JSON schema constraints for structured outputs - PR #20813

New Claude Opus 4.6 features for /v1/messages - PR #20733

Fix reasoning_effort=None and "none" should return None for Opus 4.6 - PR #20800

AWS Bedrock

Extend model support with 4 new beta models - PR #21035

Add Claude Opus 4.6 to _supports_tool_search_on_bedrock - PR #21017

Correct Bedrock Claude Opus 4.6 model IDs (remove :0 suffix) - PR #20564, PR #20671

Add output_config as supported param - PR #20748

Vertex AI

Add Vertex GLM-5 model support - PR #21053

Propagate extra_headers anthropic-beta to request body - PR #20666

Preserve usageMetadata in _hidden_params - PR #20559

Map IMAGE_PROHIBITED_CONTENT to content_filter - PR #20524

Add RAG ingest for Vertex AI - PR #21120

OCI / Cohere

OCI Cohere responseFormat/Pydantic support - PR #20663

Fix OCI Cohere system messages by populating preambleOverride - PR #20958

Perplexity

Perplexity Research API support with preset search - PR #20860

MiniMax

Add MiniMax-M2.5 and MiniMax-M2.5-lightning models - PR #21054

Kimi / Moonshot

Add Kimi model pricing by region - PR #20855

Add moonshotai.kimi-k2.5 - PR #20863

Dashscope

Add dashscope/qwen3-max model with tiered pricing - PR #20919

Vercel AI Gateway

Add new Vercel AI Anthropic models - PR #20745

Azure AI

Add azure_ai/kimi-k2.5 to Azure model DB - PR #20896

Support Azure AD token auth for non-Claude azure_ai models - PR #20981

Fix Azure batches issues - PR #21092

DeepSeek

Sync DeepSeek model metadata and add bare-name fallback - PR #20938

Gemini

Handle image in assistant message for Gemini - PR #20845

Add missing tpm/rpm for Gemini models - PR #21175

General

Add 30 missing models to pricing JSON - PR #20797

Cleanup 39 deprecated OpenRouter models - PR #20786

Standardize endpoint display_name naming convention - PR #20791

Fix and stabilize model cost map formatting - PR #20895

Export PermissionDeniedError from litellm.init - PR #20960

Bug Fixes

Anthropic

Fix get_supported_anthropic_messages_params - PR #20752

Fix base_model name for body and deployment name in URL - PR #20747

Azure

Preserve content_policy_violation error details from Azure OpenAI - PR #20883

Vertex AI

Fix Gemini multi-turn tool calling message formatting (added and reverted) - PR #20569, PR #21051

LLM API Endpoints

Features
Responses API

Add server-side context management (compaction) support - PR #21058

Add Shell tool support for OpenAI Responses API - PR #21063

Preserve tool call argument deltas when streaming id is omitted - PR #20712

Preserve interleaved thinking/redacted_thinking blocks during streaming - PR #20702

Chat Completions

Add Web Search support using LiteLLM /search (web search interception hook) - PR #20483

Preserved nullable object fields by carrying schema properties - PR #19132

Support prompt_cache_key for OpenAI and Azure chat completions - PR #20989

Pass-Through Endpoints

Add support for langchain_aws via LiteLLM passthrough - PR #20843

Add custom_body parameter to endpoint_func in create_pass_through_route - PR #20849

Vector Stores

Add target_model_names for vector store endpoints - PR #21089

General

Add output_config as supported param - PR #20748

Add managed error file support - PR #20838

Bugs

General

Stop leaking Python tracebacks in streaming SSE error responses - PR #20850

Fix video list pagination cursors not encoded with provider metadata - PR #20710

Handle metadata=None in SDK path retry/error logic - PR #20873

Fix Spend logs pickle error with Pydantic models and redaction - PR #20685

Remove duplicate PerplexityResponsesConfig from LLM_CONFIG_NAMES - PR #21105

Fix Spend Management Tests - PR #21088

Fix JWT email domain validation error message - PR #21212

Management Endpoints / UI

Features
Access Groups

New Access Groups feature for managing model, MCP server, and agent access - PR #21022

Access Groups table and details page UI - PR #21165

Refactor model_ids to model_names for backwards compatibility - PR #21166

Policies

Allow connecting Policies to Tags, simulating Policies, viewing key/team counts - PR #20904

Guardrail pipeline support for conditional sequential execution - PR #21177

Pipeline flow builder UI for guardrail policies - PR #21188

SSO / Auth

New Login With SSO Button - PR #20908

M2M OAuth2 UI Flow - PR #20794

Allow Organization and Team Admins to call /invitation/new - PR #20987

Invite User: Email Integration Alert - PR #20790

Populate identity fields in proxy admin JWT early-return path - PR #21169

Spend Logs

Show predefined error codes in filter with user definable fallback - PR #20773

Paginated searchable model select - PR #20892

Sorting columns support - PR #21143

Allow sorting on /spend/logs/ui - PR #20991

UI Improvements

Navbar: Option to hide Usage Popup - PR #20910

Model Page: Improve Credentials Messaging - PR #21076

Fallbacks: Default configurable to 10 models - PR #21144

Fallback display with arrows and card structure - PR #20922

Team Info: Migrate to AntD Tabs + Table - PR #20785

AntD refactoring and 0 cost models fix - PR #20687

Zscaler AI Guard UI - PR #21077

Include Config Defined Pass Through Endpoints - PR #20898

Rename "HTTP" to "Streamable HTTP (Recommended)" in MCP server page - PR #21000

MCP server discovery UI - PR #21079

Virtual Keys

Allow Management keys to access user/daily/activity and team - PR #20124

Skip premium check for empty metadata fields on team/key update - PR #20598

Bugs

Logs: Fix Input and Output Copying - PR #20657

Teams: Fix Available Teams - PR #20682

Spend Logs: Reset Filters Resets Custom Date Range - PR #21149

Usage: Request Chart stack variant fix - PR #20894

Add Auto Router: Description Text Input Focus - PR #21004

Guardrail Edit: LiteLLM Content Filter Categories - PR #21002

Add null guard for models in API keys table - PR #20655

Show error details instead of 'Data Not Available' for failed requests - PR #20656

Fix Spend Management Tests - PR #21088

Fix JWT email domain validation error message - PR #21212

AI Integrations

Logging
PostHog

Fix JSON serialization error for non-serializable objects - PR #20668

Prometheus

Sanitize label values to prevent metric scrape failures - PR #20600

Langfuse

Prevent empty proxy request spans from being sent to Langfuse - PR #19935

OpenTelemetry

Auto-infer otlp_http exporter when endpoint is configured - PR #20438

CloudZero

Update CBF field mappings per LIT-1907 - PR #20906

General

Allow MAX_CALLBACKS override via env var - PR #20781

Add standard_logging_payload_excluded_fields config option - PR #20831

Enable verbose_logger when LITELLM_LOG=DEBUG - PR #20496

Guard against None litellm_metadata in batch logging path - PR #20832

Propagate model-level tags from config to SpendLogs - PR #20769

Guardrails

Policy Templates

New Policy Templates: pre-configured guardrail combinations for specific use-cases - PR #21025

Add NSFW policy template, toxic keywords in multiple languages, child safety content filter, JSON content viewer - PR #21205

Add toxic/abusive content filter guardrails - PR #20934

Pipeline Execution

Add guardrail pipeline support for conditional sequential execution - PR #21177

Agent Guardrails on streaming output - PR #21206

Pipeline flow builder UI - PR #21188

Zscaler AI Guard

Zscaler AI Guard bug fixes and support during post-call - PR #20801

Zscaler AI Guard UI - PR #21077

ZGuard

Add team policy mapping for ZGuard - PR #20608

General

Add logging to all unified guardrails + link to custom code guardrail templates - PR #20900

Forward request headers + litellm_version to generic guardrails - PR #20729

Empty guardrails / policies arrays should not trigger enterprise license check - PR #20567

Fix OpenAI moderation guardrails - PR #20718

Fix /v2/guardrails/list returning sensitive values - PR #20796

Fix guardrail status error - PR #20972

Reuse get_instance_fn in initialize_custom_guardrail - PR #20917

Spend Tracking, Budgets and Rate Limiting

Prevent shared backend model key from being polluted by per-deployment custom pricing - PR #20679

Avoid in-place mutation in SpendUpdateQueue aggregation - PR #20876

MCP Gateway (12 updates)

MCP M2M OAuth2 Support - Add support for machine-to-machine OAuth2 for MCP servers - PR #20788

MCP Server Discovery UI - Browse and discover available MCP servers from the UI - PR #21079

MCP Tracing - Add OpenTelemetry tracing for MCP calls running through AI Gateway - PR #21018

MCP OAuth2 Debug Headers - Client-side debug headers for OAuth2 troubleshooting - PR #21151

Fix MCP "Session not found" errors - Resolve session persistence issues - PR #21040

Fix MCP OAuth2 root endpoints returning "MCP server not found" - PR #20784

Fix MCP OAuth2 query param merging when authorization_url already contains params - PR #20968

Fix MCP SCOPES on Atlassian issue - PR #21150

Fix MCP StreamableHTTP backend - Use anyio.fail_after instead of asyncio.wait_for - PR #20891

Inject NPM_CONFIG_CACHE into STDIO MCP subprocess env - PR #21069

Block spaces and hyphens in MCP server names and aliases - PR #21074

Performance / Loadbalancing / Reliability improvements (8 improvements)

Remove orphan entries from queue - Fix memory leak in scheduler queue - PR #20866

Remove repeated provider parsing in budget limiter hot path - PR #21043

Use current retry exception for retry backoff instead of stale exception - PR #20725

Add Semgrep & fix OOMs - Static analysis rules and out-of-memory fixes - PR #20912

Add Pyroscope for continuous profiling and observability - PR #21167

Respect ssl_verify with shared aiohttp sessions - PR #20349

Fix shared health check serialization - PR #21119

Change model mismatch logs from WARNING to DEBUG - PR #20994

Database Changes

Schema Updates

Table | Change Type | Description | PR | Migration
LiteLLM_VerificationToken | New Indexes | Added indexes on user_id+team_id, team_id, and budget_reset_at+expires | PR #20736 | Migration
LiteLLM_PolicyAttachmentTable | New Column | Added tags text array for policy-to-tag connections | PR #21061 | Migration
LiteLLM_AccessGroupTable | New Table | Access groups for managing model, MCP server, and agent access | PR #21022 | Migration
LiteLLM_AccessGroupTable | Column Change | Renamed access_model_ids to access_model_names | PR #21166 | Migration
LiteLLM_ManagedVectorStoreTable | New Table | Managed vector store tracking with model mappings | - | Migration
LiteLLM_TeamTable, LiteLLM_VerificationToken | New Column | Added access_group_ids text array | PR #21022 | Migration
LiteLLM_GuardrailsTable | New Column | Added team_id text column | - | Migration

Documentation Updates (14 updates)

LiteLLM Observatory section added to v1.81.9 release notes - PR #20675

Callback registration optimization added to release notes - PR #20681

Middleware performance blog post - PR #20677

UI Team Soft Budget documentation - PR #20669

UI Contributing and Troubleshooting guide - PR #20674

Reorganize Admin UI subsection - PR #20676

SDK proxy authentication (OAuth2/JWT auto-refresh) - PR #20680

Forward client headers to LLM API documentation fix - PR #20768

Add docs guide for using policies - PR #20914

Add native thinking param examples for Claude Opus 4.6 - PR #20799

Fix Claude Code MCP tutorial - PR #21145

Add API base URLs for Dashscope (International and China/Beijing) - PR #21083

Fix DEFAULT_NUM_WORKERS_LITELLM_PROXY default (1, not 4) - PR #21127

Correct ElevenLabs support status in README - PR #20643

New Contributors

@iver56 made their first contribution in PR #20643

@eliasaronson made their first contribution in PR #20666

@NirantK made their first contribution in PR #19656

@looksgood made their first contribution in PR #20919

@kelvin-tran made their first contribution in PR #20548

@bluet made their first contribution in PR #20873

@itayov made their first contribution in PR #20729

@CSteigstra made their first contribution in PR #20960

@rahulrd25 made their first contribution in PR #20569

@muraliavarma made their first contribution in PR #20598

@joaokopernico made their first contribution in PR #21039

@datzscaler made their first contribution in PR #21077

@atapia27 made their first contribution in PR #20922

@fpagny made their first contribution in PR #21121

@aidankovacic-8451 made their first contribution in PR #21119

@luisgallego-aily made their first contribution in PR #19935

Full Changelog

v1.81.9.rc.1...v1.81.12.rc.1
Original source