liteLLM Release Notes

Follow

21 release notes curated from 22 sources by the Releasebot Team. Last updated: Jun 6, 2026

Get this feed:
  • Jun 4, 2026
    • Date parsed from source:
      Jun 4, 2026
    • First seen by Releasebot:
      Jun 6, 2026
    liteLLM logo

    liteLLM

    v1.88.0rc3 - Claude Opus 4.8, MCP Access-Group Authorization & Typed OpenTelemetry

    liteLLM releases v1.88.0rc3 with Claude Opus 4.8 support across major providers, a reworked MCP access-group system, typed OpenTelemetry spans, cheaper streaming, and new A2A discovery and LangGraph Platform mode.

    v1.88.0rc3 is the current release candidate for 1.88.0.

    New Models / Updated Models

    • Claude Opus 4.8 is supported across Anthropic, Bedrock (including global/us/eu/au regional routes), Azure AI, and Vertex, at 1M-token context with adaptive thinking and output_config goal mode.
    • MCP access-group authorization was reworked end to end: key and team access groups now resolve to MCP servers, grants are additive with opt-in member assignment, and clients can route through stateful or stateless sessions by session id.
    • Typed OpenTelemetry instrumentation lands a semconv-aligned span model that carries team_metadata, http.route, and model names on inference spans.
    • Streaming is ~30% cheaper per chunk on the Anthropic and Bedrock hot path.
    • Agent-to-agent (A2A) gains well-known agent-card discovery and a LangGraph Platform mode.

    New Model Support (Claude Opus 4.8 across 9 provider routes) includes Anthropic claude-opus-4-8, Vertex AI vertex_ai/claude-opus-4-8, Azure AI azure_ai/claude-opus-4-8, Bedrock anthropic.claude-opus-4-8 (+ global./us./eu./au. routes) with context windows up to 1,000,000 tokens (200,000 for Azure AI), input $5.00/1M tokens, output $25.00/1M tokens, and features like vision, function calling, prompt caching, reasoning (adaptive + max/xhigh effort), PDF input, computer use, response schema, tool choice, output_config, and native structured output for Bedrock.

    Additional updates include reasoning-effort flag cleanup across existing Claude catalog entries, removal of unsupported supports_minimal_reasoning_effort, normalization of supports_max_reasoning_effort, and a new bedrock_output_config_effort_ceiling (high/xhigh/max) field on Bedrock entries (PR #29238).

    Features

    • Anthropic: Add Claude Opus 4.8 and prune stale reasoning-effort flags (PR #29238).
    • Bedrock: Claude Code goal mode via output_config for Bedrock Opus (PR #28898), support tool search results and chat annotations (PR #29120).

    Bug Fixes

    • Anthropic: Stop injecting unsupported output_config.effort=xhigh for Claude Code on Sonnet/Opus 4.6 (PR #29304).
    • Vertex AI: Strip output_config.effort for Vertex Claude models that reject it (Haiku 4.5) (PR #29585).
    • Bedrock: Align toolUse/toolSpec names and allow hyphens (PR #28874).
    • Azure: Preserve AD token refresh in the v1 OpenAI client path (PR #28627).
    • OpenAI: Fix the double provider-prefix bug on model names (PR #28661).
    • General: Hydrate wildcard model-discovery credentials (PR #28284).

    LLM API Endpoints Features

    • Realtime API: Tool calling for the Gemini and Vertex AI live API (PR #26590).
    • A2A: Well-known agent-card discovery and LangGraph Platform mode (PR #28860).
    • Context Management: compact_20260112 polyfill so non-Anthropic providers get context compaction (PR #28868).
    • Video: Vertex Veo video edit, using DB credentials in the video handlers (PR #29098).
    • Pass-through: Extend passthrough_managed_object_ids to Azure (PR #29160).

    Bugs

    • Realtime API: Send TEXT frames and a valid guardrail session.update (PR #28848).
    • Moderations: Wire streaming flags through to the unified dispatcher (PR #27324).
    • Batches: Strip LiteLLM policy tracking from OpenAI batch metadata (PR #28425), map the stripped batch body.model back to the proxy alias for auth (PR #29264).
    • Vector Stores: Restrict vector store index create/delete to proxy admins (PR #29202).
    • Video: Resolve managed video model ids for auth (PR #29545).
    • Pass-through: Bedrock Knowledge Base pass-through: preserve SigV4 headers and the signed request body (PR #27526), enforce allowed_passthrough_routes for auth=true pass-through (PR #29256), de-duplicate pass-through endpoint logs (PR #29598), match pass-through registry routes bare-to-bare when SERVER_ROOT_PATH is set, fixing pass-through 404s (PR #29658).

    Management Endpoints / UI Features

    • Virtual Keys & Teams: Expose keys_count on /v2/team/list and wire the UI Resources badge (PR #28502), allow team members to create keys on org-scoped teams (PR #29310), exempt UI and CLI session tokens from team-key budget ceilings, hardened so custom default_key_generate_params cannot re-impose them (PR #29612, PR #29639), record ownership for service-account keys, plus a Prisma JSON serialization fix (PR #28990).
    • Deployment: Helm: split per-component ServiceAccounts for gateway, backend, and UI (PR #28712), Enterprise: RESEND_FROM_EMAIL for self-hosted Resend sends (PR #28830).

    Management Endpoints / UI Bugs

    • Virtual Keys & Teams: Refresh the team cache on team_model_add/team_model_delete (PR #28683), keep the team_alias cache in sync on _cache_team_object writes (PR #28737), fix spend-logs v2 route permissions (PR #28705), normalize the Bearer prefix in the safe-hash helper (PR #29343).
    • UI: Allow clearing custom pricing on wildcard models (PR #28719), stop vertex_ai-anthropic_models from leaking into the Anthropic dropdown (PR #28723), route API Reference back to the query-param page (PR #28726), show 2-decimal precision for max_budget on the key overview (PR #28809), break the logout redirect loop across dev and proxy origins (PR #29360), internal refactors: extract auth state into AuthContext, remove dead App Router scaffolding (PR #28910, PR #28891).

    AI Integrations Logging

    • DataDog: Drain the cost-management queue and add an opt-in FinOps tag allowlist (PR #28487).
    • Galileo: Support the hosted v2 spans API and string output extraction (PR #28771).
    • OpenTelemetry: Typed, semconv-aligned instrumentation (PR #28909), add team_metadata, http.route, and model names to inference spans (PR #29319), export the SERVER span on management-endpoint success without an http_request (PR #28794), link pass-through success spans to the SERVER root span (PR #29315).
    • General: Exclude proxy_server_request from its own body snapshot (PR #28618), fix duplicate Claude Code traces (PR #29311).

    Guardrails

    • General: Return HTTP 400 for LiteLLM content-filter blocks (PR #28418), wire apply_guardrail into proxy logging callbacks (PR #28970), persist disable_global_guardrails on keys (PR #29233).

    Spend Tracking, Budgets and Rate Limiting

    • Cost Tracking — OpenAI regional-processing cost uplift for EU/US data residency (PR #28626).
    • Rate Limiting — Cap the no-max_tokens TPM floor at the smallest configured limit (v3 limiter) (PR #28805).
    • Budgets — Enforce tag budgets for key-level tags (PR #29108), enforce deployment budgets for dynamically added models (PR #29273), reset_budget writes only {spend, budget_reset_at} and stops pre-zeroing the counter (PR #29358).

    MCP Gateway

    • Session Routing — Stateless and stateful clients via session-id routing (PR #26857).
    • Access Groups — Additive key access-group grants with opt-in member assignment (PR #29313), resolve team access_group_ids to MCP servers (PR #28997), resolve key access_group_ids to MCP servers (ungated) (PR #29195), extend the key access-group union to MCP servers (PR #28890).
    • Discovery — Allow llm_api_routes virtual keys to list MCP servers (PR #28442).
    • Server CRUD — Preserve source_url on GET /v1/mcp/server list responses (PR #29249), preserve omitted fields on PUT /v1/mcp/server partial updates (PR #29253).
    • Virtual Keys — Ignore stale ids on key save (PR #29128).

    Performance / Loadbalancing / Reliability improvements

    • Streaming hot path — ~30% lower per-chunk overhead on the Anthropic and Bedrock streaming path (PR #28720).
    • Docker — Use system Node in the componentized builders and retry apk add (PR #28888).
    • Dependencies — Routine dependency bumps, including a Starlette bad-host fix (PR #29208, PR #29373).

    Documentation Updates

    • Hand-written CLAUDE.md; remove AGENTS.md and point GEMINI.md at it (PR #29252).
    • Agent guidance: require consent before writing new third-party names (PR #28908).
    • Cookbook: bump the Go directive to 1.26.3 in the gollem example (PR #29234).

    General Proxy Improvements

    Testing, CI & build hardening:
    • UI e2e coverage across roles and flows — Team-BYOK add-model, Router fallback, MCP add-server, AI Hub make-public, Team Admin, Internal User / Viewer, logout and navbar identity (PR #29068, #29069, #29070, #29071, #29072, #29074, #29075, #29076, #29077, #29080, #29083, #28652).
    • Pass-through SERVER_ROOT_PATH login-redirect trailing-slash e2e (PR #29369).
    • Behavior-pinning harnesses for proxy_server.py (PR #28827, #29309).
    • Deterministic Redis cassette replay and live Google OAuth token minting for VCR (PR #28826, #29229).
    • Reasoning-effort grid test covering Claude Opus 4.8 across provider routes (PR #29327).
    • Bedrock CI account moves and restore (PR #28728, #29326, #29245).
    • Keep litellm_internal_staging green (PR #29344).
    • Regenerate the admin-ui static export with trailingSlash: true (PR #28112).

    PR roll-up by ownership area (total: 97):
    • Other (CI / tests / build hardening): 23
    • UI / Auth & Management: 18
    • LLM API Endpoints: 15
    • MCP: 9
    • Models & Providers: 9
    • Logging: 8
    • Spend / Budgets / Rate Limits: 5
    • Performance: 4
    • Documentation: 3
    • Guardrails: 3

    Release candidate changelog (rc.1 → rc.2 → rc.3)

    Almost everything above shipped in rc.1. The later candidates are small, targeted patches cut by cherry-pick.

    rc.2 added six fixes:
    • Resolve managed video model ids for auth (PR #29545).
    • Allow team members to create keys on org-scoped teams (PR #29310).
    • Strip output_config.effort for Vertex Claude Haiku 4.5 (PR #29585).
    • De-duplicate pass-through endpoint logs (PR #29598).
    • Exempt UI/CLI session tokens from team-key budget ceilings (PR #29612).
    • Harden that exemption against custom default_key_generate_params (PR #29639).
    rc.3 added one fix:
    • Match pass-through registry routes bare-to-bare when SERVER_ROOT_PATH is set, fixing pass-through 404s (PR #29658).

    New Contributors

    No new contributors this release; all 11 authors are returning contributors.

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.87.0-rc.1...v1.88.0-rc.3

    06/04/2026 (v1.88.0rc3)

    • New Models / Updated Models: 9
    • LLM API Endpoints: 15
    • Management Endpoints / UI: 18
    • AI Integrations (Logging / Guardrails): 11
    • Spend Tracking, Budgets and Rate Limiting: 5
    • MCP Gateway: 9
    • Performance / Loadbalancing / Reliability improvements: 4
    • General Proxy Improvements (testing / CI / build): 23
    • Documentation Updates: 3
    Total: 97 PRs

    Original source
  • May 23, 2026
    • Date parsed from source:
      May 23, 2026
    • First seen by Releasebot:
      Jun 2, 2026
    liteLLM logo

    liteLLM

    v1.87.0 - OCI Generative AI Provider, Gemini 3.5 Flash Day-0, MCP UI for OAuth Servers

    liteLLM adds OCI Generative AI as a first-class provider and expands Gemini day-0 support, while improving MCP OAuth tooling, Codex CLI auth, and Anthropic streaming performance. The release also brings new models, better logging and guardrails, and broader proxy reliability.

    Key Highlights

    • OCI Generative AI as a first-class provider — production-ready chat, embeddings, streaming, reasoning and tool use across Cohere Command-A, Meta Llama 3.1/3.2/3.3/4, xAI Grok 3/4, Google Gemini 2.5, and OpenAI GPT-5 hosted on OCI; full model-pricing catalog included.
    • Gemini 3.5 Flash Day-0 support — gemini-3.5-flash and gemini-3.1-flash-lite ship on Vertex AI, Google AI Studio, and OpenRouter with full pricing, function calling, web search, code execution, and managed-agents support.
    • MCP UI for OAuth tool calls — the dashboard now resolves tool list and tool call against OAuth-protected MCP servers directly, plus native MCP OAuth support for Cursor and clearer OAuth error messages.
    • Codex CLI auth hardening — JWT-derived team aliases and SSO form-URL flow for the OpenAI Codex CLI, plus allowlisted OIDC-claim persistence across the CLI SSO poll.
    • Anthropic streaming hot-path perf — ~90% lower TTFT overhead and higher sustained throughput on the proxy's Anthropic /v1/messages SSE path, measured on a real 4-pod deployment against both Anthropic and Bedrock Invoke (wire output is parity-tested); plus lazy-loaded response streaming for Bedrock SageMaker.

    New Providers and Endpoints

    New Providers (1 new provider)

    Provider: OCI Generative AI
    Supported LiteLLM Endpoints: /v1/chat/completions, /v1/embeddings
    Description: Official Oracle Cloud Infrastructure Generative AI integration. Production-ready support for chat, streaming, reasoning, tool calling, and embeddings across Cohere Command-A (incl. Reasoning + Vision), Meta Llama 3.1 / 3.2 / 3.3 / 4, xAI Grok 3 / 4, Google Gemini 2.5, and OpenAI GPT-5. Includes full model-pricing catalog. - PR #28223

    New Models / Updated Models

    New Model Support (22 new models) including Gemini gemini-3.5-flash and gemini-3.1-flash-lite with extensive features such as audio input, function calling, PDF input, vision, web search, and more.

    Features

    • Gemini
      • Day-0 support for gemini-3.5-flash - PR #28268
      • Add gemini-3.1-flash-lite model cost map - PR #28320
      • Additional gemini-3.1-flash-lite pricing entry - PR #27933
      • Gemini managed-agents support - PR #28270
    • Azure
      • Add Azure Speech STT config support - PR #27482
    • OpenRouter
      • Add Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro model entries - PR #27700
      • Add openrouter/google/gemini-3.1-flash-lite pricing entry - PR #28280

    Bug Fixes

    • Vertex AI
      • Omit function_call.id on Vertex Gemini 3.5+ tool turns (the field is rejected by the new schema) - PR #28324
      • vertex_gemma: strip context_management from the request body - PR #28438
    • Bedrock
      • bedrock/cohere: send embedding_types as a JSON array, not a string - PR #28172
      • Sanitize batch metadata to prevent Pydantic ValidationError - PR #28202
      • Decouple STS region from Bedrock aws_region_name - PR #28245
    • SageMaker
      • Send the native Cohere embed payload to Cohere SageMaker endpoints - PR #28613
    • DeepSeek
      • Use the native /anthropic/v1/messages endpoint and sanitize tools - PR #28200
    • Azure
      • Decouple Azure OpenAI deployment ID from model name via base_model so GPT-5 model routing works on custom deployment names - PR #28490
      • Router: use the forwarded model_id for native Azure container IDs - PR #27921
    • vLLM
      • Fix Anthropic tool-call transformation on vLLM deployments - PR #28549

    LLM API Endpoints

    • Interactions API
      • Migrate to the Google Interactions API steps schema (May 2026 revision) - PR #28153
    • Google-native passthrough
      • Decode bytes and pass through SSE for Google-native streamGenerateContent (no more b'...' literals on the wire) - PR #28213
    • Responses API
      • Forward timeout on the completion-transformation path for Anthropic, Bedrock, and Vertex - PR #28133
      • Accept dict-shape reasoning_effort from the Anthropic Responses bridge - PR #28201
      • Wrap aresponses streaming iterator for mid-stream router fallbacks - PR #28215
      • Unblock staging — mypy + coverage for aresponses streaming fallback - PR #28318
      • Strip Anthropic cache_control from OpenAI Responses API requests - PR #28431
      • Use the OpenAI SSEDecoder for Responses API streaming - PR #28566
      • Replay openai/responses bridge cache hits as chat streams - PR #28158
    • Interactions API
      • Never drop streamed text deltas; always emit the terminal completion - PR #28394
    • Batch API
      • Normalize batch file IDs before the ManagedObjectTable write - PR #28339

    Management Endpoints / UI

    • Models + Endpoints
      • Add a pause/resume Switch on the models table - PR #28151
    • Spend Logs
      • Consolidate filter state and extract components in the UI - PR #25847
    • Playground
      • Interactions API endpoint in the Playground with SSE streaming - PR #28156
    • Passthrough Routes
      • Team passthrough routes — create parity + edit-load fix - PR #28098
      • Gate team.allowed_passthrough_routes writes to proxy admins - PR #28097
    • Auth / Codex CLI
      • Codex CLI JWT team alias propagation - PR #28621
      • Codex CLI SSO form-URL flow - PR #28271
      • Persist allowlisted OIDC claims in the CLI SSO poll - PR #28463
    • Virtual Keys
      • Encrypt callback_vars in key/team metadata at rest in the DB - PR #27141

    AI Integrations

    • Logging
      • Prometheus
        • Emit per-token-type detail metrics — five sparse counters that break out usage.prompt_tokens_details / usage.completion_tokens_details fields providers already report (LIT-3220) - PR #28372
        • Add user_email and user_alias labels to user budget metrics - PR #28155
      • OpenTelemetry
        • Propagate team_id and team_alias to all child OTEL spans - PR #28273
        • Emit a guardrail span on violations and surface status + categories - PR #28364
        • Serialize guardrail_response to JSON in OTEL traces - PR #28362
        • Stamp http.response.status_code on all error responses - PR #28405
    • Guardrails
      • Microsoft Purview DLP
        • New guardrail integration for Microsoft Purview DLP - PR #24966

    Spend Tracking, Budgets and Rate Limiting

    • Spend Counter — Seed the Redis counter via SET NX to prevent cross-pod double-seed on cold start - PR #27854
    • Cost Tracking — Recalculate cost after router retry failures so the logged cost reflects the actual attempt that succeeded - PR #28476
    • Cost Tracking — Treat litellm_provider=None as a wildcard in _check_provider_match so cost lookup works for catalog entries that omit the provider field - PR #28523

    MCP Gateway

    • OAuth in the UI — Add tool-call and tool-list support via the dashboard for OAuth-protected MCP servers - PR #28454
    • Cursor OAuth — Allow native MCP OAuth support for Cursor - PR #28327
    • Auth Resolution — JWT on tools/list and REST tools/call server resolution - PR #28227
    • Cold-Start Init — Forward upstream initialize instructions on cold gateway init - PR #28231
    • OAuth Errors — Add error_description and hint to OAuth flow error responses - PR #28471
    • Inspector — Trim whitespace from MCP inspector tool-call inputs - PR #28203

    Performance / Loadbalancing / Reliability improvements

    • Anthropic /v1/messages streaming hot path — cut per-request and per-chunk overhead on the proxy's Anthropic streaming path, with byte-identical wire output guaranteed by parity tests that diff the logged and billed payloads between the fast and legacy paths. Measured on a real 4-pod m7i.xlarge deployment (no HPA) streaming 256 text_delta chunks per request, against both Anthropic and Bedrock Invoke — TTFT overhead ~90% lower with higher sustained throughput (full numbers below) - PR #28289
      • Skip work that's a no-op in the default config: the per-chunk Datadog span when tracing is off, the per-chunk streaming hook when no callback / guardrail / cost-injection is active, and the agentic post-processing wrapper when no callback overrides its hook (it otherwise buffers every chunk and rebuilds the response from SSE just to call hooks that all return (False, {})).
      • Stop doing the same work twice per request: serialize the request body once and reuse it for the pre-call log and the wire, memoize the optional-params type-hint resolution (~80µs/request), and skip the redundant strip_empty_text_blocks scan when the async wrapper already sanitized.
      • Cheaper end-of-stream reconstruction: collapse the homogeneous run of content_block_delta text events into a single equivalent SSE event before stream_chunk_builder, removing O(output-token) ModelResponseStream constructions; tool-use / thinking / citations streams fall back to the unchanged legacy path.
      • Cheaper hot-path logging: gate debug f-string evaluation behind isEnabledFor(DEBUG), hoist cost_injection_active out of the per-chunk loop, and drop one async-generator layer per chunk in async_sse_data_generator.
    • Bedrock / SageMaker — Switch to lazy loading for response streaming - PR #28189
    • Granian ASGI — Add Granian as a supported ASGI server for better throughput stability - PR #26027
    • Prisma — Expose Prisma idle/connect timeout + extra DB URL params so production deployments can tune connection pools - PR #28395
    • Proxy auth — Strict media-type match for form bodies (defensive against ambiguous Content-Type) - PR #27939
    • Proxy auth — Carry the ASGI path into the WebSocket auth synthetic Request so auth resolves the right route - PR #27940
    • Docker — Restore npm to the non-root builder image so UI builds run there - PR #28519
    • Helm — Drop the main- prefix from the default image tag - PR #28710
    • License check — Read PEP 639 license-expression metadata in check_licenses - PR #28529

    Documentation Updates

    • Fix the incorrect /v1/agents request example - PR #28131
    • Fix misleading credential-passing examples in Gemini-agents GET/DELETE docstrings - PR #28293

    General Proxy Improvements

    Testing, CI & build hardening:

    • Behavior-pinning harness + Key Tier-1 matrix (and tier-2/3 + team management endpoints + phase-4 payload matrix) - PR #28321, PR #28441, PR #28620, PR #28681
    • Stabilize image-edit VCR cassettes to stop live gpt-image-1 spend - PR #28110
    • Migrate realtime + rerank tests off shut-down upstream models; replace gpt-4o-audio-preview with gpt-audio-1.5; expect session.created as xAI realtime initial event - PR #28191, PR #28281, PR #28424
    • Harden the flaky proxy callback-leak detector - PR #28195
    • E2E runner migrated to uv; add an "All Proxy Models" key test - PR #28313
    • UI-e2e: admin key creation with a specific proxy model; forward LITELLM_LICENSE to the UI e2e proxy - PR #28365, PR #28398
    • Vertex AI grounding test tolerates transient 500; streaming test tolerates Vertex 429 wrapped in MidStreamFallbackError - PR #28503, PR #28669
    • Bump black to 26.3.1 and reapply formatting; one-shot lint fix - PR #28525, PR #28639
    • Allow audio_transcription_config in the model-prices schema - PR #28708
    • Remove the dead old Playwright e2e suite - PR #28632
    • Routine dependency/CI bumps - PR #28287, PR #28524, PR #28528, PR #27665, PR #28296, PR #28303, PR #28707

    PR roll-up by ownership area

    PRs by ownership area (total: 93)

    • Other (CI / tests / build hardening): 25
    • Models & Providers (incl. new provider): 18
    • UI / Auth & Management: 12
    • LLM API Endpoints: 11
    • Performance: 9
    • Logging: 6
    • MCP: 6
    • Spend / Budgets / Rate Limits: 3
    • Docs: 2
    • Guardrails: 1

    New Contributors

    • @IshaMeera made their first contribution in #28131
    • @TorvaldUtne made their first contribution in #27700
    • @adityasingh2400 made their first contribution in #28523
    • @cwang-otto made their first contribution in #28133
    • @ro31337 made their first contribution in #28280
    • @withomasmicrosoft made their first contribution in #28490

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.86.0...v1.87.0

    Original source
  • All of your release notes in one feed

    Join Releasebot and get updates from liteLLM and hundreds of other software products.

    Create account
  • May 23, 2026
    • Date parsed from source:
      May 23, 2026
    • First seen by Releasebot:
      May 31, 2026
    liteLLM logo

    liteLLM

    v1.87.0rc1 - OCI Generative AI Provider, Gemini 3.5 Flash Day-0, MCP UI for OAuth Servers

    liteLLM adds OCI Generative AI as a first-class provider, ships day-0 Gemini 3.5 Flash support, and expands MCP OAuth, Codex CLI auth, and performance on Anthropic streaming. It also brings new guardrails, logging, budgets, and broader model and endpoint support.

    Key Highlights

    • OCI Generative AI as a first-class provider — production-ready chat, embeddings, streaming, reasoning and tool use across Cohere Command-A, Meta Llama 3.1/3.2/3.3/4, xAI Grok 3/4, Google Gemini 2.5, and OpenAI GPT-5 hosted on OCI; full model-pricing catalog included.
    • Gemini 3.5 Flash Day-0 support — gemini-3.5-flash and gemini-3.1-flash-lite ship on Vertex AI, Google AI Studio, and OpenRouter with full pricing, function calling, web search, code execution, and managed-agents support.
    • MCP UI for OAuth tool calls — the dashboard now resolves tool list and tool call against OAuth-protected MCP servers directly, plus native MCP OAuth support for Cursor and clearer OAuth error messages.
    • Codex CLI auth hardening — JWT-derived team aliases and SSO form-URL flow for the OpenAI Codex CLI, plus allowlisted OIDC-claim persistence across the CLI SSO poll.
    • Anthropic streaming hot-path perf — ~90% lower TTFT overhead and higher sustained throughput on the proxy's Anthropic /v1/messages SSE path, measured on a real 4-pod deployment against both Anthropic and Bedrock Invoke (wire output is parity-tested); plus lazy-loaded response streaming for Bedrock SageMaker.

    New Providers and Endpoints

    New Providers (1 new provider)

    Provider: OCI Generative AI
    Supported LiteLLM Endpoints: /v1/chat/completions, /v1/embeddings
    Description: Official Oracle Cloud Infrastructure Generative AI integration. Production-ready support for chat, streaming, reasoning, tool calling, and embeddings across Cohere Command-A (incl. Reasoning + Vision), Meta Llama 3.1 / 3.2 / 3.3 / 4, xAI Grok 3 / 4, Google Gemini 2.5, and OpenAI GPT-5. Includes full model-pricing catalog. - PR #28223

    Features

    • Gemini
      • Day-0 support for gemini-3.5-flash - PR #28268
      • Add gemini-3.1-flash-lite model cost map - PR #28320
      • Additional gemini-3.1-flash-lite pricing entry - PR #27933
      • Gemini managed-agents support - PR #28270
    • Azure
      • Add Azure Speech STT config support - PR #27482
    • OpenRouter
      • Add Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro model entries - PR #27700
      • Add openrouter/google/gemini-3.1-flash-lite pricing entry - PR #28280

    Bug Fixes

    • Vertex AI
      • Omit function_call.id on Vertex Gemini 3.5+ tool turns (the field is rejected by the new schema) - PR #28324
      • vertex_gemma: strip context_management from the request body - PR #28438
    • Bedrock
      • bedrock/cohere: send embedding_types as a JSON array, not a string - PR #28172
      • Sanitize batch metadata to prevent Pydantic ValidationError - PR #28202
      • Decouple STS region from Bedrock aws_region_name - PR #28245
    • SageMaker
      • Send the native Cohere embed payload to Cohere SageMaker endpoints - PR #28613
    • DeepSeek
      • Use the native /anthropic/v1/messages endpoint and sanitize tools - PR #28200
    • Azure
      • Decouple Azure OpenAI deployment ID from model name via base_model so GPT-5 model routing works on custom deployment names - PR #28490
      • Router: use the forwarded model_id for native Azure container IDs - PR #27921
    • vLLM
      • Fix Anthropic tool-call transformation on vLLM deployments - PR #28549

    LLM API Endpoints

    • Interactions API
      • Migrate to the Google Interactions API steps schema (May 2026 revision) - PR #28153
    • Google-native passthrough
      • Decode bytes and pass through SSE for Google-native streamGenerateContent (no more b'...' literals on the wire) - PR #28213

    Bugs

    • Responses API
      • Forward timeout on the completion-transformation path for Anthropic, Bedrock, and Vertex - PR #28133
      • Accept dict-shape reasoning_effort from the Anthropic Responses bridge - PR #28201
      • Wrap aresponses streaming iterator for mid-stream router fallbacks - PR #28215
      • Unblock staging — mypy + coverage for aresponses streaming fallback - PR #28318
      • Strip Anthropic cache_control from OpenAI Responses API requests - PR #28431
      • Use the OpenAI SSEDecoder for Responses API streaming - PR #28566
      • Replay openai/responses bridge cache hits as chat streams - PR #28158
    • Interactions API
      • Never drop streamed text deltas; always emit the terminal completion - PR #28394
    • Batch API
      • Normalize batch file IDs before the ManagedObjectTable write - PR #28339

    Management Endpoints / UI

    Features

    • Models + Endpoints
      • Add a pause/resume Switch on the models table - PR #28151
    • Spend Logs
      • Consolidate filter state and extract components in the UI - PR #25847
    • Playground
      • Interactions API endpoint in the Playground with SSE streaming - PR #28156
    • Passthrough Routes
      • Team passthrough routes — create parity + edit-load fix - PR #28098
      • Gate team.allowed_passthrough_routes writes to proxy admins - PR #28097
    • Auth / Codex CLI
      • Codex CLI JWT team alias propagation - PR #28621
      • Codex CLI SSO form-URL flow - PR #28271
      • Persist allowlisted OIDC claims in the CLI SSO poll - PR #28463
    • Virtual Keys
      • Encrypt callback_vars in key/team metadata at rest in the DB - PR #27141

    Bugs

    • Auth / Discovery
      • Hydrate wildcard discovery credentials so OIDC discovery works against wildcarded providers - PR #28284
    • Spend Logs
      • Restore the log-filter loading indicator - PR #28282
    • End-User Logs
      • Fix end-user logs surfacing - PR #27758

    AI Integrations

    Logging

    • Prometheus
      • Emit per-token-type detail metrics — five sparse counters that break out usage.prompt_tokens_details / usage.completion_tokens_details fields providers already report (LIT-3220) - PR #28372
      • Add user_email and user_alias labels to user budget metrics - PR #28155
    • OpenTelemetry
      • Propagate team_id and team_alias to all child OTEL spans - PR #28273
      • Emit a guardrail span on violations and surface status + categories - PR #28364
      • Serialize guardrail_response to JSON in OTEL traces - PR #28362
      • Stamp http.response.status_code on all error responses - PR #28405

    Guardrails

    • Microsoft Purview DLP
      • New guardrail integration for Microsoft Purview DLP - PR #24966

    Spend Tracking, Budgets and Rate Limiting

    • Spend Counter — Seed the Redis counter via SET NX to prevent cross-pod double-seed on cold start - PR #27854
    • Cost Tracking — Recalculate cost after router retry failures so the logged cost reflects the actual attempt that succeeded - PR #28476
    • Cost Tracking — Treat litellm_provider=None as a wildcard in _check_provider_match so cost lookup works for catalog entries that omit the provider field - PR #28523

    MCP Gateway

    • OAuth in the UI — Add tool-call and tool-list support via the dashboard for OAuth-protected MCP servers - PR #28454
    • Cursor OAuth — Allow native MCP OAuth support for Cursor - PR #28327
    • Auth Resolution — JWT on tools/list and REST tools/call server resolution - PR #28227
    • Cold-Start Init — Forward upstream initialize instructions on cold gateway init - PR #28231
    • OAuth Errors — Add error_description and hint to OAuth flow error responses - PR #28471
    • Inspector — Trim whitespace from MCP inspector tool-call inputs - PR #28203

    Performance / Loadbalancing / Reliability improvements

    • Anthropic /v1/messages streaming hot path — cut per-request and per-chunk overhead on the proxy's Anthropic streaming path, with byte-identical wire output guaranteed by parity tests that diff the logged and billed payloads between the fast and legacy paths. Measured on a real 4-pod m7i.xlarge deployment (no HPA) streaming 256 text_delta chunks per request, against both Anthropic and Bedrock Invoke — TTFT overhead ~90% lower with higher sustained throughput (full numbers below) - PR #28289
    • Bedrock / SageMaker — Switch to lazy loading for response streaming - PR #28189
    • Granian ASGI — Add Granian as a supported ASGI server for better throughput stability - PR #26027
    • Prisma — Expose Prisma idle/connect timeout + extra DB URL params so production deployments can tune connection pools - PR #28395
    • Proxy auth — Strict media-type match for form bodies (defensive against ambiguous Content-Type) - PR #27939
    • Proxy auth — Carry the ASGI path into the WebSocket auth synthetic Request so auth resolves the right route - PR #27940
    • Docker — Restore npm to the non-root builder image so UI builds run there - PR #28519
    • Helm — Drop the main- prefix from the default image tag - PR #28710
    • License check — Read PEP 639 license-expression metadata in check_licenses - PR #28529

    Documentation Updates

    • Fix the incorrect /v1/agents request example - PR #28131
    • Fix misleading credential-passing examples in Gemini-agents GET/DELETE docstrings - PR #28293

    General Proxy Improvements

    Testing, CI & build hardening:

    • Behavior-pinning harness + Key Tier-1 matrix (and tier-2/3 + team management endpoints + phase-4 payload matrix) - PR #28321, PR #28441, PR #28620, PR #28681
    • Stabilize image-edit VCR cassettes to stop live gpt-image-1 spend - PR #28110
    • Migrate realtime + rerank tests off shut-down upstream models; replace gpt-4o-audio-preview with gpt-audio-1.5; expect session.created as xAI realtime initial event - PR #28191, PR #28281, PR #28424
    • Harden the flaky proxy callback-leak detector - PR #28195
    • E2E runner migrated to uv; add an "All Proxy Models" key test - PR #28313
    • UI-e2e: admin key creation with a specific proxy model; forward LITELLM_LICENSE to the UI e2e proxy - PR #28365, PR #28398
    • Vertex AI grounding test tolerates transient 500; streaming test tolerates Vertex 429 wrapped in MidStreamFallbackError - PR #28503, PR #28669
    • Bump black to 26.3.1 and reapply formatting; one-shot lint fix - PR #28525, PR #28639
    • Allow audio_transcription_config in the model-prices schema - PR #28708
    • Remove the dead old Playwright e2e suite - PR #28632
    • Routine dependency/CI bumps - PR #28287, PR #28524, PR #28528, PR #27665, PR #28296, PR #28303, PR #28707

    PR roll-up by ownership area

    PRs by ownership area (total: 93)

    • Other (CI / tests / build hardening): 25
    • Models & Providers (incl. new provider): 18
    • UI / Auth & Management: 12
    • LLM API Endpoints: 11
    • Performance: 9
    • Logging: 6
    • MCP: 6
    • Spend / Budgets / Rate Limits: 3
    • Docs: 2
    • Guardrails: 1

    New Contributors

    • @IshaMeera made their first contribution in #28131
    • @TorvaldUtne made their first contribution in #27700
    • @adityasingh2400 made their first contribution in #28523
    • @cwang-otto made their first contribution in #28133
    • @ro31337 made their first contribution in #28280
    • @withomasmicrosoft made their first contribution in #28490

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.86.0-rc.1...v1.87.0-rc.1

    Original source
  • May 16, 2026
    • Date parsed from source:
      May 16, 2026
    • First seen by Releasebot:
      May 31, 2026
    liteLLM logo

    liteLLM

    v1.86.0 - Weighted-Routing Failover, Native Web-Search Citations & OTel-Standard Tracing

    liteLLM ships weighted-routing failover, native Anthropic web-search citations, richer OpenTelemetry server spans, and a componentized gateway-backend-ui deployment. It also adds new Azure AI GPT-5.4 models and fixes a critical rate-limit regression.

    Key Highlights

    • Weighted-Routing Failover — on a deployment failure, the router now retries the same model group on a different deployment (e.g. another Azure region) while the initial pick still respects configured weights, behind a router-level flag.
    • Native web-search citations for Anthropic clients — LiteLLM now emits native web_search_tool_result blocks so Claude Desktop / Cowork render web-search citations correctly.
    • OTel-standard server-span attributes — the proxy SERVER span now carries http.response.status_code, http.route, url.path, and litellm.preprocessing.duration_ms, plus an opt-in for the experimental OTEL GenAI semantic conventions.
    • Componentized deployment — additive scaffold + Helm chart to split the monolithic proxy into independently scalable gateway, backend, and ui services.
    • Critical rate-limit regression fixed — the v3 limiter was leaking internal reservation keys into the upstream provider body, breaking every virtual key with a tpm_limit / rpm_limit set.

    Claude Code compatibility coverage

    We expanded the set of Claude Code features that LiteLLM automatically tests against daily, and added a Known Issues section to the Claude Code compatibility doc so customers can see which combinations are red, and why, before hitting them in production.

    This is a direct response to customer feedback on stability and regressions. The matrix is backed by a rigorous end-to-end suite that hits real provider endpoints with no mocking. The suite re-runs every day and the doc renders the latest LiteLLM stable against the latest Claude Code version.

    Today's coverage sits at 76% across Anthropic, Bedrock Invoke, Bedrock Converse, Vertex AI, and Azure Foundry. Over the next week we plan to bring this to 90%. Coming soon, the same suite will gate PRs: any cell flipping green to red will fail the check and block merges into staging, making it much harder for code that breaks Claude Code to land in the next release.

    New Models / Updated Models

    New Model Support

    Provider: Bedrock
    Model: jp.anthropic.claude-sonnet-4-6
    Context Window: 1,000,000
    Input ($/1M tokens): $3.30
    Output ($/1M tokens): $16.50
    Features: Prompt caching, reasoning, vision, function calling, PDF input, computer use

    Provider: Azure AI
    Model: azure_ai/gpt-5.4
    Context Window: 1,050,000
    Input ($/1M tokens): $2.50
    Output ($/1M tokens): $15.00
    Features: Reasoning, vision, web search, function calling, prompt caching, service tier

    Provider: Azure AI
    Model: azure_ai/gpt-5.4-pro
    Context Window: 1,050,000
    Input ($/1M tokens): $30.00
    Output ($/1M tokens): $180.00
    Features: Responses-mode, reasoning, vision, web search, prompt caching

    Provider: Azure AI
    Model: azure_ai/gpt-5.4-mini
    Context Window: 400,000
    Input ($/1M tokens): $0.75
    Output ($/1M tokens): $4.50
    Features: Reasoning, vision, web search, function calling, prompt caching

    Provider: Azure AI
    Model: azure_ai/gpt-5.4-nano
    Context Window: 400,000
    Input ($/1M tokens): $0.20
    Output ($/1M tokens): $1.25
    Features: Reasoning, vision, web search, function calling, prompt caching

    Each Azure AI GPT-5.4 model also ships a dated snapshot alias (gpt-5.4-2026-03-05, gpt-5.4-pro-2026-03-05, gpt-5.4-mini-2026-03-17, gpt-5.4-nano-2026-03-17) — 9 catalog entries total. All GPT-5.4 entries include tiered (>272k) and priority pricing.

    Features

    • Azure AI
      • Add Azure AI Foundry GPT-5.4 model metadata (gpt-5.4 / pro / mini / nano + dated aliases) - PR #28030
    • Bedrock
      • Add jp.cross-region inference profile for claude-sonnet-4-6 - PR #27976

    Bug Fixes

    • Bedrock
      • bedrock-mantle: use /anthropic/v1/messages path for Mantle (Claude Mythos Preview) endpoint — /v1/messages was 404ing every Mantle request - PR #27976

    LLM API Endpoints

    Features

    • Anthropic Messages API (/v1/messages)
      • Emit native web_search_tool_result blocks for Anthropic clients (Claude Desktop / Cowork citations) - PR #27886
    • Vector Stores
      • Fix vector store retrieve/list/update/delete when no completion model is set; merge URL query params into request data on those routes - PR #27929

    Bugs

    • Batch API
      • Managed batches: convert raw provider output_file_id to managed ID in the CheckBatchCost poller so GET /files/{id}/content resolves routing - PR #27984

    Management Endpoints / UI

    Bugs

    • Auth / OAuth
      • Allow allowlisted redirect URIs in OAuth setup - PR #27761
    • Config
      • Make /config/update env-var encryption idempotent (fixes double-encryption on repeated updates) + endpoint-level regression test - PR #28022
    • Models + Endpoints
      • Sort BYOK models by their displayed name in /v2/model/info - PR #28079

    AI Integrations

    Logging

    • OpenTelemetry
      • OTel-standard attributes on the proxy SERVER span: http.response.status_code, http.route, url.path, litellm.preprocessing.duration_ms - PR #28040
      • Set http.response.status_code on the success SERVER span (not just error spans) - PR #28090
      • Opt-in support for the experimental OTEL GenAI semantic conventions (OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental); default behavior unchanged - PR #27418

    Guardrails

    • Lasso
      • Add tool-calling support to LassoGuardrail (expands tool_calls / role=tool into Lasso tool_use / tool_result blocks; maps tool definitions) - PR #27648
    • CrowdStrike AIDR
      • Improve CrowdStrike AIDR input handling - PR #26658

    Secret Managers

    • General
      • Import get_secret at runtime to avoid an import-time ordering bug - PR #28014

    Spend Tracking, Budgets and Rate Limiting

    • Rate Limiting — Stop the v3 limiter from leaking internal reservation keys (_litellm_rate_limit_descriptors, litellm_tpm_reserved*) into the upstream provider body; this regression broke every virtual key with a tpm_limit / rpm_limit - PR #27913
    • Budgets — Tighten budget field validation and add missing authorization checks on user self-update / key-generation paths - PR #27897
    • Cost Tracking — Fix zero cost/usage on completed Vertex AI batch jobs (file content is now OpenAI-shaped post-#25627; old code read stale usageMetadata.*) - PR #27912

    MCP Gateway

    • Delegate-auth PKCE bypass for internal (available_on_public_internet: false) oauth2 interactive MCP servers — same anonymous PKCE path as public servers; client_credentials exclusion unchanged - PR #27977
    • Expose delegate_auth_to_upstream in the GET /v1/mcp/server list API (_build_mcp_server_table was dropping it, so the dashboard always showed false) - PR #27936

    Performance / Loadbalancing / Reliability improvements

    • Weighted-Routing Failover — on failure, retry the same model group on a different deployment while the initial pick respects configured weights; behind a router-level flag - PR #27980
    • Chat-completions fast path — cache callback capabilities once instead of re-scanning litellm.callbacks per request; skip streaming-iterator wrapping when no callback needs it - PR #27858
    • Componentized deployment — additive gateway/, backend/, ui/ Dockerfiles + Helm chart (per-component Deployment/Service/HPA, no edits to existing modules) - PR #27557
    • Terraform stacks — AWS ECS + GCP Cloud Run stacks for deploying the componentized gateway - PR #27673

    General Proxy Improvements

    Testing, CI & build hardening:

    • VCR cache observability: classify cache verdicts, detect live calls, surface cost leaks, aggregate xdist worker stats; Bedrock hostname / RFC1918 fixes - PR #27795
    • Reasoning-effort grid e2e regression suite (status classified by exception status_code); Fireworks / Gemini tests mocked instead of live - PR #28036
    • Modernize model references in CI tests and configs - PR #27856
    • Codecov: flag uploads, enable carryforward, close coverage gaps; --cov=./litellm path resolution - PR #28028, PR #27960
    • mutmut: enable mutate_only_covered_lines to fit CI budget - PR #27910
    • Remove unused GitHub Actions workflows and orphan files - PR #27957
    • Preserve global Button/Tooltip mocks in per-file vi.mock (UI tests) - PR #27958
    • Isolate run_server CLI tests from the Prisma DB-setup path - PR #28029
    • Validate response fields against the Interaction schema - PR #28037
    • De-flake test_gemini_image_size_limit_exceeded - PR #28039
    • Pin openai==2.33.0 in uv.lock - PR #28088

    New Contributors

    • @vladpolevoi made their first contribution in #27648

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.85.0...v1.86.0

    Summary 05/16/2026 (v1.86.0):

    • New Models / Updated Models: 2
    • LLM API Endpoints: 3
    • Management Endpoints / UI: 3
    • AI Integrations (Logging / Guardrails / Secret Managers): 6
    • Spend Tracking, Budgets and Rate Limiting: 3
    • MCP Gateway: 3
    • Performance / Loadbalancing / Reliability improvements: 4
    • General Proxy Improvements (testing / CI / build): 12
    • Documentation Updates: 0

    Total: 36 PRs

    Original source
  • Jan 1, 2025
    • Date parsed from source:
      Jan 1, 2025
    • First seen by Releasebot:
      May 31, 2026
    liteLLM logo

    liteLLM

    v1.80.5-stable - Gemini 3.0 Support

    liteLLM adds Prompt Studio, Gemini 3 support, and a new Model Compare UI, while also expanding MCP Hub, batch routing and spend tracking, AWS Secret Manager IAM auth, and realtime performance with major latency gains.

    Key Highlights

    • Gemini 3 - Day-0 support for Gemini 3 models with thought signatures
    • Prompt Management - Full prompt versioning support with UI for editing, testing, and version history
    • MCP Hub - Publish and discover MCP servers within your organization
    • Model Compare UI - Side-by-side model comparison interface for testing
    • Batch API Spend Tracking - Granular spend tracking with custom metadata for batch and file creation requests
    • AWS IAM Secret Manager - IAM role authentication support for AWS Secret Manager
    • Logging Callback Controls - Admin-level controls to prevent callers from disabling logging callbacks in compliance environments
    • Proxy CLI JWT Authentication - Enable developers to authenticate to LiteLLM AI Gateway using the Proxy CLI
    • Batch API Routing - Route batch operations to different provider accounts using model-specific credentials from your config.yaml

    Prompt Management

    This release introduces LiteLLM Prompt Studio - a comprehensive prompt management solution built directly into the LiteLLM UI. Create, test, and version your prompts without leaving your browser.

    You can now do the following on LiteLLM Prompt Studio:

    • Create & Test Prompts: Build prompts with developer messages (system instructions) and test them in real-time with an interactive chat interface
    • Dynamic Variables: Use {{variable_name}} syntax to create reusable prompt templates with automatic variable detection
    • Version Control: Automatic versioning for every prompt update with complete version history tracking and rollback capabilities
    • Prompt Studio: Edit prompts in a dedicated studio environment with live testing and preview

    API Integration:
    Use your prompts in any application with simple API calls:

    response = client.chat.completions.create(
      model = "gpt-4",
      extra_body = {
        "prompt_id": "your-prompt-id",
        "prompt_version": 2,
        # Optional: specify version
        "prompt_variables": {
          "name": "value"
        }
        # Optional: pass variables
      }
    )
    

    Performance – /realtime 182× Lower p99 Latency

    This update reduces /realtime latency by removing redundant encodings on the hot path, reusing shared SSL contexts, and caching formatting strings that were being regenerated twice per request despite rarely changing.

    Results

    Metric | Before | After | Improvement
    Median latency | 2,200 ms | 59 ms | −97% (37× faster)
    p95 latency | 8,500 ms | 67 ms | −99% (127× faster)
    p99 latency | 18,000 ms | 99 ms | −99% (182× faster)
    Average latency | 3,214 ms | 63 ms | −98% (51× faster)
    RPS | 165 | 1,207 | +631% (~7.3× increase)

    Test Setup

    Category | Specification
    Load Testing | Locust: 1,000 concurrent users, 500 ramp-up
    System | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances
    Database | PostgreSQL (Redis unused)
    Configuration | config.yaml
    Load Script | no_cache_hits.py

    Model Compare UI

    New interactive playground UI enables side-by-side comparison of multiple LLM models, making it easy to evaluate and compare model responses.

    Features:

    • Compare responses from multiple models in real-time
    • Side-by-side view with synchronized scrolling
    • Support for all LiteLLM-supported models
    • Cost tracking per model
    • Response time comparison
    • Pre-configured prompts for quick and easy testing

    Details:

    • Parameterization: Configure API keys, endpoints, models, and model parameters, as well as interaction types (chat completions, embeddings, etc.)
    • Model Comparison: Compare up to 3 different models simultaneously with side-by-side response views
    • Comparison Metrics: View detailed comparison information including:
      • Time To First Token
      • Input / Output / Reasoning Tokens
      • Total Latency
      • Cost (if enabled in config)
    • Safety Filters: Configure and test guardrails (safety filters) directly in the playground interface

    New Providers and Endpoints

    New Providers

    • Docker Model Runner: /v1/chat/completions - Run LLM models in Docker containers

    New Models / Updated Models

    New Model Support

    Features

    Gemini (Google AI Studio + Vertex AI)

    • Add Day 0 gemini-3-pro-preview support
    • Add support for Gemini 3 Pro Image model
    • Add reasoning_content to streaming responses with tools enabled
    • Add includeThoughts=True for Gemini 3 reasoning_effort
    • Support thought signatures for Gemini 3 in responses API
    • Correct wrong system message handling for gemma
    • Gemini 3 Pro Image: capture image_tokens and support cost_per_output_image
    • Fix missing costs for gemini-2.5-flash-image
    • Gemini 3 thought signatures in tool call id

    Azure

    • Add azure gpt-5.1 models
    • Add Azure models 2025 11 to cost maps
    • Update Azure Pricing
    • Add SSML Support for Azure Text-to-Speech (AVA)

    OpenAI

    • Support GPT-5.1 reasoning.effort='none' in proxy
    • Add gpt-5.1-codex and gpt-5.1-codex-mini models to documentation
    • Inherit BaseVideoConfig to enable async content response for OpenAI video

    Anthropic

    • Add support for strict parameter in Anthropic tool schemas
    • Add image as url support to anthropic
    • Add thought signature support to v1/messages api
    • Anthropic - support Structured Outputs output_format for Claude 4.5 sonnet and Opus 4.1

    Bedrock

    • Haiku 4.5 correct Bedrock configs
    • Ensure consistent chunk IDs in Bedrock streaming responses
    • Add Claude 4.5 to US Gov Cloud
    • Fix images being dropped from tool results for bedrock

    Vertex AI

    • Add Vertex AI Image Edit Support
    • Update veo 3 pricing and add prod models
    • Fix Video download for veo3

    Snowflake

    • Snowflake provider support: added embeddings, PAT, account_id

    OCI

    • Add oci_endpoint_id Parameter for OCI Dedicated Endpoints

    XAI

    • Add support for Grok 4.1 Fast models

    Together AI

    • Add GLM 4.6 from together.ai

    Cerebras

    • Fix Cerebras GPT-OSS-120B model name

    Bug Fixes

    OpenAI

    • Fix for 16863 - openai conversion from responses to completions
    • Revert "Make all gpt-5 and reasoning models to responses by default"

    General

    • Get custom_llm_provider from query param
    • Fix optional param mapping
    • Add None check for litellm_params

    LLM API Endpoints

    Features

    Responses API

    • Add Responses API support for gpt-5.1-codex model
    • Add managed files support for responses API
    • Add extra_body support for response supported api params from chat completion

    Batch API

    • Support /delete for files + support /cancel for batches
    • Add config based routing support for batches and files
    • Populate spend_logs_metadata in batch and files endpoints

    Search APIs

    • Search APIs - error in firecrawl-search "Invalid request body"

    Vector Stores

    • Fix vector store create issue
    • Team vector-store permissions now respected for key access

    Audio Transcription

    • Fix audio transcription cost tracking
    • Add missing shared_sessions to audio/transcriptions

    Video Generation API

    • Fix videos tagging

    Bugs

    General

    • Responses API cost tracking with custom deployment names
    • Trim logged response strings in spend-logs

    SSO

    • Ensure role from SSO provider is used when a user is inserted onto LiteLLM
    • Docs - SSO - Manage User Roles via Azure App Roles

    Auth

    • Ensure Team Tags works when using JWT Auth
    • Fix key never expires

    Swagger UI

    • Fixes Swagger UI resolver errors for chat completion endpoints caused by Pydantic v2 $defs not being properly exposed in the OpenAPI schema

    AI Integrations

    Logging

    Arize Phoenix

    • Fix arize phoenix logging
    • Arize Phoenix - root span logging

    Langfuse

    • Filter secret fields form Langfuse

    General

    • Exclude litellm_credential_name from Sensitive Data Masker (Updated)
    • Allow admins to disable, dynamic callback controls

    Guardrails

    IBM Guardrails

    • Fix IBM Guardrails optional params, add extra_headers field

    Noma Guardrail

    • Use LiteLLM key alias as fallback Noma applicationId in NomaGuardrail
    • Allow custom violation message for tool-permission guardrail

    Grayswan Guardrail

    • Grayswan guardrail passthrough on flagged

    General Guardrails

    • Fix prompt injection not working

    Prompt Management

    • Allow specifying just prompt_id in a request to a model
    • Add support for versioning prompts
    • Allow storing prompt version in DB
    • Add UI for editing the prompts
    • Allow testing prompts with Chat UI
    • Allow viewing version history
    • Allow specifying prompt version in code
    • UI, allow seeing model, prompt id for Prompt
    • Show "get code" section for prompt management + minor polish of showing version history

    Secret Managers

    AWS Secrets Manager

    • Adds IAM role assumption support for AWS Secret Manager

    MCP Gateway

    • MCP Hub - Publish/discover MCP Servers within a company
    • MCP Resources - MCP resources support
    • MCP OAuth - Docs - mcp oauth flow details
    • MCP Lifecycle - Drop MCPClient.connect and use run_with_session lifecycle
    • MCP Server IDs - Add mcp server ids
    • MCP URL Format - Fix mcp url format

    Performance / Loadbalancing / Reliability improvements

    • Realtime Endpoint Performance - Fix bottlenecks degrading realtime endpoint performance
    • SSL Context Caching - Cache SSL contexts to prevent excessive memory allocation
    • Cache Optimization - Fix cache cooldown key generation
    • Router Cache - Fix routing for requests with same cacheable prefix but different user messages
    • Redis Event Loop - Fix redis event loop closed at first call
    • Dependency Management - Upgrade pydantic to version 2.11.0

    Documentation Updates

    Provider Documentation

    • Add missing details to benchmark comparison
    • Fix anthropic pass-through endpoint
    • Cleanup repo and improve AI docs

    API Documentation

    • Add docs related to openai metadata
    • Update docs with all supported endpoints and cost tracking

    General Documentation

    • Add mini-swe-agent to Projects built on LiteLLM

    Infrastructure / CI/CD

    UI Testing

    • Break e2e_ui_testing into build, unit, and e2e steps
    • Building UI for Testing
    • CI/CD Fixes

    Dependency Management

    • Bump js-yaml from 3.14.1 to 3.14.2 in /tests/proxy_admin_ui_tests/ui_unit_tests
    • Bump js-yaml from 3.14.1 to 3.14.2

    Migration

    • Migration job labels

    Config

    • This yaml actually works

    Release Notes

    • Add perf improvements on embeddings to release notes
    • Docs - v1.80.0

    Investigation

    • Investigate issue root cause

    New Contributors

    • @mattmorgis made their first contribution in PR #16371
    • @mmandic-coatue made their first contribution in PR #16732
    • @Bradley-Butcher made their first contribution in PR #16725
    • @BenjaminLevy made their first contribution in PR #16757
    • @CatBraaain made their first contribution in PR #16767
    • @tushar8408 made their first contribution in PR #16831
    • @nbsp1221 made their first contribution in PR #16845
    • @idola9 made their first contribution in PR #16832
    • @nkukard made their first contribution in PR #16864
    • @alhuang10 made their first contribution in PR #16852
    • @sebslight made their first contribution in PR #16838
    • @TsurumaruTsuyoshi made their first contribution in PR #16905
    • @cyberjunk made their first contribution in PR #16492
    • @colinlin-stripe made their first contribution in PR #16895
    • @sureshdsk made their first contribution in PR #16883
    • @eiliyaabedini made their first contribution in PR #16875
    • @justin-tahara made their first contribution in PR #16957
    • @wangsoft made their first contribution in PR #16913
    • @dsduenas made their first contribution in PR #16891

    Known Issues

    • /audit and /user/available_users routes return 404. Fixed in PR #17337

    Full Changelog

    View complete changelog on GitHub

    Original source
  • May 2026
    • No date parsed from source.
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.85.1 - Gemini 3.5 Flash & Reliability Fixes

    liteLLM ships a patch release with day-0 support for Gemini 3.5 Flash and reliability fixes for cross-pod spend accuracy and Vertex AI tool calling. The update broadens model support while improving budget tracking and request handling.

    v1.85.1 is a patch release on top of v1.85.0. It adds day-0 support for Gemini 3.5 Flash and ships two reliability fixes — cross-pod spend accuracy and Vertex AI tool calling.

    New Models / Updated Models

    New Model Support (1 new model)

    Provider Model Context Window Input ($/1M tokens) Output ($/1M tokens) Features Gemini / Vertex AI gemini/gemini-3.5-flash, vertex_ai/gemini-3.5-flash 1M $1.50 $9.00 Reasoning, vision, audio input, PDF input, prompt caching, web search, function calling, response schema

    Features

    • Gemini / Vertex AI
      • Day-0 support for Gemini 3.5 Flash on both Google AI Studio and Vertex AI - PR #28268

    Bug Fixes

    • Vertex AI
      • Omit the function_call / function_response id on Vertex Gemini 3.5+ tool turns, fixing HTTP 400 Unknown name "id" errors. Google AI Studio (gemini provider) still forwards the id on Gemini 3.5+ for strict tool-call matching - PR #28324

    Spend Tracking, Budgets and Rate Limiting

    • Seed the Redis spend counter via SET NX instead of INCRBYFLOAT to prevent cross-pod double-seeding. On multi-pod deployments this previously caused team spend to jump to ~Nx the pod count after a Redis cache miss / TTL expiry, triggering false "Budget Crossed" alerts - PR #27854

    Full Changelog

    https://github.com/BerriAI/litellm/compare/v1.85.0...v1.85.1

    Original source
  • May 2026
    • No date parsed from source.
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.84.1 - Gemini 3.5 Flash & Reliability Fixes

    liteLLM ships v1.84.1 with day-0 support for Gemini 3.5 Flash on Google AI Studio and Vertex AI, plus reliability fixes for cross-pod spend accuracy and Vertex AI tool calling.

    v1.84.1 is a patch release on top of v1.84.0. It adds day-0 support for Gemini 3.5 Flash and ships two reliability fixes — cross-pod spend accuracy and Vertex AI tool calling.

    New Model Support (1 new model)

    Provider: Gemini / Vertex AI
    Model: gemini/gemini-3.5-flash, vertex_ai/gemini-3.5-flash
    Context Window: 1M
    Input ($/1M tokens): $1.50
    Output ($/1M tokens): $9.00
    Features: Reasoning, vision, audio input, PDF input, prompt caching, web search, function calling, response schema

    Features

    • Gemini / Vertex AI
      • Day-0 support for Gemini 3.5 Flash on both Google AI Studio and Vertex AI - PR #28268

    Bug Fixes

    • Vertex AI
      • Omit the function_call/function_response id on Vertex Gemini 3.5+ tool turns, fixing HTTP 400 Unknown name "id" errors. Google AI Studio (gemini provider) still forwards the id on Gemini 3.5+ for strict tool-call matching - PR #28324

    Spend Tracking, Budgets and Rate Limiting

    • Seed the Redis spend counter via SET NX instead of INCRBYFLOAT to prevent cross-pod double-seeding. On multi-pod deployments this previously caused team spend to jump to ~Nx the pod count after a Redis cache miss / TTL expiry, triggering false "Budget Crossed" alerts - PR #27854

    Full Changelog

    https://github.com/BerriAI/litellm/compare/v1.84.0...v1.84.1

    Original source
  • May 16, 2026
    • Date parsed from source:
      May 16, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.85.0 - Realtime GA, MCP Gateway Expansion & Hardened Multi-Tenancy

    liteLLM ships a major release with OpenAI Realtime GA support, stronger multi-tenant isolation, expanded MCP Gateway permissions and auth, a broad observability overhaul, and new model support across OpenAI, xAI, OpenRouter, SambaNova, and Bedrock.

    Key Highlights

    • OpenAI Realtime GA — first-class support for the GA OpenAI Realtime API (plus beta compatibility), including gpt-realtime-2 pricing and /openai/v1/realtime logging.
    • Hardened multi-tenancy — a large sweep of per-tenant scoping fixes across keys, projects, batches, files, MCP servers, and analytics endpoints (project-hijack/key-org isolation, service-account resource isolation, per-entity team/agent activity scoping).
    • MCP Gateway expansion — org-level MCP server/toolset permissions, OBO (on-behalf-of) MCP auth, delegate_auth_to_upstream PKCE passthrough, and MCP access-group name namespacing.
    • Observability overhaul — broad Prometheus fixes (label-count correctness, end-user cardinality cap, PromQL escaping), OTEL handler isolation + GenAI message-content capture, and decoupled S3 audit-log config.
    • New models — xAI grok-4.3 / grok-4.3-latest, OpenAI gpt-realtime-2, OpenRouter qwen/qwen3.6-plus, SambaNova MiniMax-M2.7, and Bedrock Z.AI GLM-5.

    New Models / Updated Models

    New Model Support (5 new models)

    Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features
    OpenAI | gpt-realtime-2 | 32K | $4.00 (audio in $32.00) | $16.00 (audio out $64.00) | Realtime (/v1/realtime), audio in/out, function calling, parallel tool calls
    xAI | xai/grok-4.3 | 1M | $1.25 (>200K: $2.50) | $2.50 (>200K: $5.00) | Reasoning, vision, prompt caching, response schema, web search, tool calling
    xAI | xai/grok-4.3-latest | 1M | $1.25 (>200K: $2.50) | $2.50 (>200K: $5.00) | Reasoning, vision, prompt caching, response schema, web search, tool calling
    OpenRouter | openrouter/qwen/qwen3.6-plus | 1M | $0.325 | $1.95 | Reasoning, vision, function calling, tool choice
    SambaNova | sambanova/MiniMax-M2.7 | 204.8K | $0.30 | $1.20 | Reasoning, function calling, tool choice

    Pricing/metadata also updated for existing entries: Gemini multimodal-embedding pricing repointed to the Vertex pricing source with image/audio/video per-unit costs, audio-token cost reductions on realtime/Gemini entries, and a gemini-embedding-2-preview cost alignment.

    Features

    • Anthropic
      • Forward output_config.effort, reject garbage reasoning_effort with 400, and omit thinking/output_config when reasoning_effort="none" - PR #27074, PR #27039
      • Add Bedrock Claude Platform route - PR #27678
      • Inject dummy tool without modify_params - PR #27620
    • Bedrock
      • Add Z.AI GLM-5 model support - PR #24338
      • Handle document content blocks in Converse API message conversion - PR #24644
      • Refactor response stream shape handling - PR #27257
    • Vertex AI
      • Model Garden OpenAPI support for publisher model IDs - PR #26076
      • Omit system_instruction / tools / toolConfig when cachedContent set - PR #26077
    • Gemini
      • Follow provider defaults for Gemini 3 thinking - PR #25764
      • Handle Gemini Files API URIs without fetching - PR #24922
      • Normalize response_schema on native generateContent - PR #27775
    • xAI
      • Add parallel_tool_calls to supported params - PR #25106
    • Azure
      • Authenticate to Azure with a token - PR #27556
      • Azure Sentinel audit-log support - PR #27280
    • General
      • gpt-5.5 reasoning-effort capability flags + supports_low_reasoning_effort - PR #26456
      • Match litellm.completion supported params with proxy model info - PR #27720

    Bug Fixes

    • OpenRouter
      • Strip openrouter/ prefix from model names - PR #24282
    • Azure
      • Forward api_version to aembedding() for Azure AI Foundry v1 endpoints - PR #24911
      • Route Azure container file requests by decoded deployment - PR #26402
    • Anthropic / Vertex
      • Fix Vertex Anthropic streaming status-error hangs - PR #27310
      • Fix Anthropic streaming reasoning token usage - PR #27319
    • Fireworks AI
      • Strip thinking_blocks from chat messages before the Fireworks API call - PR #27881
    • hosted vLLM
      • Normalize custom tools for chat completions - PR #25763
    • General
      • Decode unified file_id when model_file_id_mapping is unavailable - PR #27406
      • Pass output_config through to backends that accept it - PR #26439
      • Resolve provider from deployment for multi-provider default config - PR #27517
      • Return 503 from /health when the targeted model is unhealthy or DB is disconnected - PR #27003
      • Guard URL-valued model destinations and align resource-model auth checks - PR #26915, PR #26963

    LLM API Endpoints

    Features

    • Realtime API
      • OpenAI Realtime GA support and beta compatibility - PR #27110
      • Add /openai/v1/realtime to routes for logging - PR #27323
    • Responses API
      • Persist and replay streamed Responses API requests from cache - PR #24580
      • Route gpt-5.4+ chat-without-tools to the Responses API - PR #27618
      • Preserve cache_control in Responses → Chat Completion transformation - PR #27727
      • Normalize chat tool_choice for the completions→responses bridge - PR #27634
    • Batches
      • Bedrock batch model-invocation job retrieval - PR #26834
      • Transform Vertex AI batch prediction outputs to OpenAI format - PR #25627
      • Set response=null on batch error entries per OpenAI spec - PR #27041
    • Embeddings
      • Default OpenAI-path encoding_format to float - PR #26976
      • Separate embeddings for multimodal inputs + combined multimodal embeddings via nested input - PR #24337, PR #24341
    • Audio Transcription
      • Add NVIDIA Riva STT provider - PR #27185
    • Vector Stores
      • Resolve embedding config at request time, never persist credentials - PR #27082
      • Tighten managed-store access - PR #26930

    Bugs

    • General
      • Preserve compact_20260112 context management on Bedrock /v1/messages - PR #27534
      • Fix managed file model_mappings when the router resolves a single deployment dict - PR #26950
      • Omit model from Azure deployment image-gen / image-edit bodies - PR #27103
      • Fix Bedrock passthrough call-ID headers - PR #27412
      • Pin Responses API affinity to the Azure resource on model-group switch - PR #27703
      • Align vertex_ai/gemini-embedding-2-preview cost with Vertex multimodal pricing - PR #27848
      • Consolidate batch + dynamic limiter check/increment - PR #26954
    • Authorization hardening
      • Block missing write routes for proxy admin viewers; restore admin-viewer read parity on Logs + Settings - PR #27007, PR #26846
      • Encode upstream URL path identifiers; require a trusted proxy for header-identity auth - PR #26860, PR #26825
      • Bind generic SSO state to a session cookie; allow non-admin compliance-path reads - PR #26944, PR #27234
    • Keys / Teams / SCIM
      • Honor key access_group_ids when a team restricts models; resolve access-group names in team filtering and same-name deployment routing - PR #26275, PR #25224, PR #26161
      • Revoke virtual keys when SCIM deprovisions a user; fix SCIM user-lookup filters - PR #26861, PR #27308
      • Key-rotation bug fix; honor team_member_permissions on /key/list - PR #27756, PR #27026
      • /config/update targeted per-section writes (drop store_model_in_db gate) - PR #26643
      • Scope CLI stored token to base_url; redact Gemini API key from URL query params in error traces - PR #26945, PR #24943
    • UI fixes
      • Remove the insecure ?token= URL handler from the login page; clear admin session cookies before establishing an invited user's session; URL-encode team_id in teamInfoCall - PR #26924, PR #27227, PR #27466
      • Project dropdown empty for internal users (3 bugs); remove blank leading entry from access-group model dropdown; omit allowed_routes from key edit save when unchanged - PR #26664, PR #27521, PR #27553
      • Member/team access-group fix; team model test-connection authorization - PR #27317, PR #27487

    AI Integrations

    Logging

    • Prometheus
      • Fix custom-metadata label counts, cap end-user metric cardinality, fix remaining-metric zero values, escape api_key for PromQL string literals, emit litellm_remaining_tokens_metric for Bedrock & Vertex - PR #27268, PR #27272, PR #27348, PR #27013, PR #27705
      • Fix /metrics hang when require_auth_for_metrics_endpoint is true and auth succeeds; point /metrics 401 at the opt-out flag; fix metric labels for litellm-side rejects - PR #25980, PR #27502, PR #26947
    • OpenTelemetry
      • Isolate dual OTEL handlers; honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT; fix proxy-integration tracing bugs - PR #27018, PR #27403, PR #27757
    • Arize / LangSmith
      • Arize _set_usage_outputs handles raw OpenAI Pydantic CompletionUsage; remove unwanted metadata info from LangSmith - PR #26506, PR #26894
    • General
      • Decouple S3 audit-log config via s3_audit_callback_params - PR #27222
      • Set verbose_logger level when LITELLM_LOG=INFO; require a team-management role on /team/{id}/callback; close callback-config and observability-credential side channels; guard dynamic integration hosts - PR #26401, PR #26819, PR #27081, PR #26921

    Guardrails

    • General
      • Add Qohash Nexus guardrail hook - PR #24927
      • Run model-level post_call guardrails on streaming requests; ensure post-call guardrail fires exactly once - PR #26922, PR #27012, PR #26109
      • Preserve Responses event streams in Presidio output masking - PR #26878
      • Cover multimodal + Responses-API content shapes; tighten tool-permission checks; optional skip of tool message in unified guardrail inputs - PR #26957, PR #26969, PR #27441
      • Handle legacy dict shape for metadata.guardrails in the Team UI - PR #27224

    Prompt Management

    • General
      • Block path-traversal in BitBucket / Arize Phoenix / AssemblyAI clients; sandbox jinja2 in the GitLab/Arize/BitBucket prompt managers - PR #26943, PR #27043

    Secret Managers

    • General
      • Audit-log /cache/settings and /config_overrides/hashicorp_vault mutations - PR #26953

    Spend Tracking, Budgets and Rate Limiting

    Rate Limiting

    • Atomic TPM rate limit; include model name + configured TPM/RPM in priority rate-limit 429 errors - PR #27001, PR #27216
    • Load team-member RPM/TPM from membership budget in the combined view - PR #24925

    Budgets

    • Skip the personal-budget hook when a reservation covers the counter - PR #27021
    • Treat 0 team_member_budget as no cap; enforce team-member budget without a user row; reset org/tag/proxy budgets correctly - PR #27133, PR #27273, PR #27326, PR #27488
    • Flush virtual-key model_max budget spend to Redis after success logging; tighten budget spend admission - PR #27334, PR #26845

    Tag Budgets & Routing

    • Enforce tag budgets on x-litellm-tags header requests; tag-budget reset drops stale management-cache entries; union x-litellm-tags with static team/key tags; fix internal tag-usage scoping; always merge caller-supplied tags into request metadata - PR #27573, PR #27568, PR #27247, PR #27315, PR #27784
    • Tag-routing test preventing header-regex bypass for strict plain-text tags - PR #26805

    Spend Logs / Cost

    • Pass service_tier through Azure and Azure AI cost calculation - PR #24926
    • Opt-in suppression of stack traces in spend-tracking error logs; keep spend-log cleanup running after batch failures; redact echoed prompts in error_information; prevent secret_fields from leaking into spend logs; drop client-supplied pricing fields from request bodies - PR #26899, PR #27303, PR #27689, PR #27143, PR #27071

    MCP Gateway

    Features

    • Org-level MCP server and toolset permissions - PR #26960
    • OBO (on-behalf-of) MCP auth - PR #27421
    • delegate_auth_to_upstream flag for PKCE passthrough - PR #27834
    • Support MCP access-group names in URL-based namespacing - PR #27726

    Bugs

    • Sanitize tool names to Anthropic's [a-zA-Z0-9_-]{1,128} pattern - PR #26788
    • Require a trusted-proxy gate before honoring X-Forwarded-* on OAuth discovery; preserve oauth2 m2m auth for tools routes; run pre_call_tool_check on the OpenAPI/local-registry path - PR #26841, PR #26871, PR #27016
    • Redact MCP server URL/headers for non-admin viewers; replace user-API-key auth with authorization-or-cookie for MCP server creation - PR #27027, PR #27190
    • Fix MCP DB reload partial failures; surface upstream 401 for token-forwarding MCP servers - PR #27314, PR #27847

    Performance / Loadbalancing / Reliability improvements

    Routing & Reliability

    • Trigger fallbacks on mid-stream httpx.TimeoutException - PR #26998
    • Register cooldowns on failure + fail fast on stale encrypted_content (Responses) - PR #27820
    • Register model info under the responses/-stripped variant - PR #27531
    • Fix Redis Sentinel client handling for authenticated Sentinel setups - PR #26302

    Proxy hot path

    • Token-verification query optimization - PR #26202
    • Run daily activity aggregation off the event loop - PR #27264
    • Shared IAM cache + static credentials in BaseAWSLLM - PR #27125
    • Isolate semantic cache entries; stable Redis key generation across working directories; remove a duplicate in-memory cache-size constant - PR #26990, PR #27025, PR #26385
    • Early proxy request-size enforcement; coerce non-str x-litellm-* header values to avoid an httpx TypeError - PR #27311, PR #27504
    • Separate DB read and write endpoints - PR #27493

    Health checks

    • Shared health-check polling; health_check_reasoning_effort for model health checks; skip disable_background_health_check models on GET /health; scope /health response to the caller's models; remove the separate health app - PR #26434, PR #27115, PR #27716, PR #26935, PR #27430

    Config / startup robustness

    • Hot-reload config YAML when --reload is set; break the managed-resources import cycle on Python 3.13; reject bare-str file-input sinks (local-file read hardening) - PR #27274, PR #27160, PR #27762

    Packaging / Docker / Helm / CI

    • Pin Wolfi & uv to multi-arch index digests; remove the hardcoded Prisma binary target for multi-arch builds; clear flagged OS-package advisories on the Docker image; refresh dependency locks - PR #27123, PR #27170, PR #27225, PR #27126
    • Helm: skip startup prisma db push when a migrations Job is enabled; increase default probe timeouts, disable debug logging by default - PR #27200, PR #27237
    • CI: Rerun Failed Tests for all pytest jobs, block PRs that drop coverage, Redis-backed VCR replay caches, reduce cassette bloat, mutation-testing workflow, dev-tag detection in the release workflow, Playwright apt-install skip - PR #27155, PR #27340, PR #26838, PR #27159, PR #27409, PR #27576, PR #26966, PR #27169
    • Remove legacy deployment artifacts and litellm-js packages; remove a redundant backup pricing file; misc test/import cleanup - PR #27541, PR #16590, PR #27699, PR #27633
    • Tighten router-settings-override and mock-testing trust; drop blank-text fallback for empty Bedrock Converse thinking blocks - PR #26968, PR #27850

    Documentation Updates

    • Update the Greptile README logo to a higher-quality image - PR #25385
    • Add a BudgetManager.reset_cost docstring - PR #27867
    • Add a _LoopWrapper class docstring - PR #27870

    New Contributors

    • @kimimgo made their first contribution in #24282
    • @shubham-arora-clear made their first contribution in #24644
    • @ohnoah made their first contribution in #24580
    • @ushiromiya-lion made their first contribution in #25106
    • @gowtham2809 made their first contribution in #25224
    • @he-yufeng made their first contribution in #26401
    • @MackDing made their first contribution in #26419
    • @dgu1-godaddy made their first contribution in #26834
    • @Vedanshu7 made their first contribution in #24943
    • @dennishenry made their first contribution in #27190
    • @SHARP155 made their first contribution in #27466
    • @mats852 made their first contribution in #24927

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.84.0...v1.85.0

    Counts cover PRs new in v1.85.0 relative to v1.84.0 stable. 14 PRs that were backported into v1.84.0 stable (and documented in the v1.84.0 release notes) are excluded here to avoid double-counting.

    Summary of PR counts

    • New Models / Updated Models: 43
    • LLM API Endpoints: 24
    • Management Endpoints / UI: 54
    • AI Integrations (Logging / Guardrails / Prompt Mgmt / Secret Managers): 32
    • Spend Tracking, Budgets and Rate Limiting: 23
    • MCP Gateway: 12
    • Performance / Loadbalancing / Reliability improvements: 41
    • Documentation Updates: 3

    Total: 232 PRs

    Original source
  • May 14, 2026
    • Date parsed from source:
      May 14, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.84.0 - Reliability hardening + multi-pod budget accuracy

    liteLLM ships v1.84.0 with a PEP 440 versioning change and a broad hardening release. It adds routing groups, MCP OAuth and credential improvements, better budget enforcement, lower Docker memory use, Prisma reconnect fixes, and several new models and providers.

    Version naming change

    Starting with v1.84.0, LiteLLM versions follow PEP 440. Stable releases drop the -stable suffix — the Docker tag for this release is litellm:1.84.0, not litellm:1.84.0-stable. Every Docker tag is published in both bare and v-prefixed form (litellm:1.84.0 and litellm:v1.84.0 resolve to the same image), so existing pins that include the v prefix keep working. PyPI versions remain the bare PEP 440 form: pip install litellm==1.84.0. If you pin LiteLLM in deployment tooling (Helm values, requirements.txt, Renovate rules, etc.), update those pins to the PEP 440 form.

    Mapping from the legacy suffix scheme to the new PEP 440 scheme:

    Channel | Legacy (≤ v1.83.x) | New (≥ v1.84.0)
    Stable | vX.Y.Z-stable | vX.Y.Z
    Stable patch | vX.Y.Z-stable.patch.N | vX.Y.Z.postN
    Release candidate | vX.Y.Z.rc.N / vX.Y.Z-rc.N | vX.Y.ZrcN
    Dev / nightly | vX.Y.Z-nightly / vX.Y.Z.dev.N | vX.Y.Z.devN

    This is a naming change only — release cadence, stability guarantees, and image contents are unchanged. The v1.84.0-rc.1 tag (cut before the switch) keeps the legacy form for historical continuity; every tag from v1.84.0 onward uses the PEP 440 form.

    Heads up — large bundle of behavioral changes.

    This release consolidates a lot of reliability and hardening work that shipped in tight sequence. The Important Behavior Changes section below covers everything that changes a default, removes a configuration shortcut, or alters a request/response shape, with the opt-out you need to keep prior behavior. Read that section before upgrading a production deployment. If you already validated against v1.84.0-rc.1, see the Changes since v1.84.0-rc.1 section for the post-rc delta.

    Key Highlights

    • Pass-through endpoints are authenticated by default. The auth field on entries under general_settings.pass_through_endpoints now defaults to true. The previous "OSS gets unauthenticated forwarders by default; auth: true is enterprise-only" combination is gone — auth: true works on OSS, and operators who want an unauthenticated forwarder must set auth: false explicitly.
    • Multi-pod budget enforcement is materially more accurate. RedisCache.async_increment gains a refresh_ttl opt-in, spend counters opt into it, and stale in-memory counters are skipped on a clean Redis miss. ResetBudgetJob invalidates Redis counters alongside DB resets so refreshed counters get reset too.
    • Prisma DB reconnects no longer freeze the event loop. The reconnect path replaced await self.db.disconnect() (which called subprocess.Popen.wait() synchronously) with a SIGTERM→SIGKILL → fresh Prisma() + connect() sequence. Liveness probes stop failing during database flaps. Companion fix restores reconnect-and-retry on PrismaClient.get_generic_data.
    • Memory footprint down ~700 MB on a two-worker Docker deployment via lazy-loaded feature routers and lazy-loaded front page. First request to a lazy route incurs the import cost; subsequent requests are unchanged.
    • MCP OAuth + Azure Entra discovery support, opt-in short-ID tool prefix to keep MCP tool names under the 60-char limit, and OAuth root-endpoint visibility now matches explicit server-name lookup.
    • Durable agent workflow run tracking via a new /v1/workflows/runs REST surface backed by LiteLLM_WorkflowRun / LiteLLM_WorkflowEvent / LiteLLM_WorkflowMessage tables. Spend logs session_id joins for free cost attribution.
    • Per-model routing strategies via Routing Groups. New router_settings.routing_groups schema binds a list of model_names to its own routing strategy (e.g. latency-based-routing for gpt-4o, simple-shuffle for cheaper models) within a single router. Configurable in proxy_config.yaml or from the LiteLLM dashboard under General Settings → Routing Groups; UI-managed groups persist and override the YAML values.

    Changes since v1.84.0-rc.1

    Everything below landed on top of v1.84.0-rc.1 and is included in v1.84.0. If you already validated against the rc, this is the only delta to re-test.

    Hardening

    • /key/update authorization checks — PR #27878
    • /key/regenerate ownership-rebind + premium-gate guards — PR #27793
    • Reject bare strings at file-input sinks to prevent local-file reads via crafted request bodies — PR #27762
    • Refuse remote-URL instance-fn loads outside the config-file path — PR #27801
    • Cover extra_body + azure_ad_token in banned-params check — PR #27898
    • MCP BYOK / OAuth: block SSRF fields in RAG ingest vector_store config; block client-side pricing injection via request body — PR #27892

    Budget reservation

    • Bound budget reservation per request instead of pinning to the entire remaining team/key/user headroom on requests without max_tokens — PR #27509
    • Image generation: reserve per-image cost rather than max-tokens cost; gate strictly on model mode

    Health probes

    • Re-expose db status on the unauthenticated /health/readiness payload so external probes can distinguish DB-unreachable workers without auth — PR #27866
    • UI fetches litellm_version + is_detailed_debug from /health/readiness/details (auth-gated) since those fields were moved off the public payload — PR #27896
    • UI: disable retries on /health/readiness/details + cover token forwarding

    MCP

    • Forward configured extra_headers from the MCP client to upstream OpenAPI HTTP calls (closes #26794) — PR #27383
    • On the same forwarding path, static_headers now win over caller-forwarded extra_headers on name conflict (case-insensitive). See Important Behavior Changes → MCP below.

    Routing under SERVER_ROOT_PATH

    • Lazy-feature loading under a non-empty SERVER_ROOT_PATH no longer 404s on routes such as /api/v1/policies/attachments/list; strip the prefix before lazy-feature match and cache the normalized path at middleware init — PR #27812

    Tagging & metrics

    • ⚠️ Reverted the v1.83.10 caller-tag strip / allow_client_tags opt-in — caller-supplied tags merge into request metadata again; the strip is no longer enforced. See the new entry under Important Behavior Changes → Tags below for the full impact. — PR #27789
    • Point the /metrics 401 hint at the actual opt-out flag — PR #27505

    Packaging

    • Relax core runtime pins to ranges so downstream packages can resolve a single shared openai /etc. version — PR #27241
    • Raise jinja2 floor in [project.dependencies] to >=3.1.6 to match the lockfile — PR #27552

    ⚠️ Important Behavior Changes

    This release tightens a number of defaults across auth, ingress, callbacks, MCP, and the UI. Each item below names the change and, where applicable, the exact configuration you need to restore prior behavior.

    Auth & request ingress

    Pass-through endpoints default to auth: true

    What changed:
    PassThroughGenericEndpoint.auth now defaults to True. The runtime dispatch in user_api_key_auth.py reads endpoints as raw dicts, so endpoint.get("auth", True) applies even when the dict has no explicit key. The premium_user gate on auth: true was also removed — OSS deployments can now use auth: true.
    Who is affected:
    Any pass-through entry in general_settings.pass_through_endpoints that omitted auth:. Prior to this rc that meant unauthenticated; it now means LiteLLM-key-authenticated.
    Restore prior behavior:
    Set auth: false explicitly on every pass-through entry that is meant to be public (e.g. webhook receivers).

    Clientside api_base / base_url are gated and credential-stripped

    What changed:
    (i) Clientside api_base / base_url are validated against validate_url when litellm.user_url_validation is enabled.
    (ii) When a request redirects api_base / base_url, admin-configured provider credentials and per-deployment metadata (OCI signing keys, AWS / Azure / Vertex tokens, observability vars, every field on CredentialLiteLLMParams) are dropped before the call is forwarded.
    (iii) The provider-inference matcher in get_llm_provider_logic.py no longer does an unanchored substring match — it now compares parsed URL hostname + segment-bounded path prefix.
    (iv) The blocklist for clientside-overridable params adds aws_bedrock_runtime_endpoint, langsmith_base_url, langfuse_host, posthog_host, braintrust_host, slack_webhook_url, s3_endpoint_url, sagemaker_base_url, deployment_url. The old "blocklist is a no-op when api_key is non-empty" clause is removed.
    Who is affected:
    Anyone passing api_base (or any of the newly-blocked fields) at request time and relying on the implicit-api_key bypass to thread it through.
    Restore prior behavior:
    Use the documented BYOK paths instead of the bypass: Proxy-wide: general_settings.allow_client_side_credentials: true; Per deployment: litellm_params.configurable_clientside_auth_params: ["api_base", ...]. The 400 returned by the proxy on a blocked request names the offending field and points at the same two settings.

    Master-key requests now propagate an alias instead of the master-key hash

    What changed:
    When a request authenticates with the master key, the UserAPIKeyAuth.api_key / token value handed to downstream code is now the constant LITELLM_PROXY_MASTER_KEY_ALIAS = "litellm_proxy_master_key". The cache lookup is unchanged (still keyed on hash_token(master_key)). _is_master_key no longer accepts the SHA-256 hash form — only the raw master key.
    Who is affected:
    Anything joining or filtering on the prior master-key hash value, including custom dashboards over spend logs and Prometheus /metrics queries pinned to the hash literal.
    Restore prior behavior:
    None — operators querying spend logs or metrics for master-key activity should switch their filter to the alias "litellm_proxy_master_key".

    Invite-link onboarding no longer mints a key from GET

    What changed:
    GET /onboarding/get_token returns a 15-minute signed onboarding JWT bound to invite + user id; it does not mint a sk-... virtual key. POST /onboarding/claim_token requires that JWT and atomically reserves the invite via update_many(... is_accepted=False, ... → True).
    Who is affected:
    Any tooling that consumed GET /onboarding/get_token for an embedded sk-... and treated it as a usable session key before completing the password claim.
    Restore prior behavior:
    None — clients must call POST /onboarding/claim_token to obtain the live key.

    CLI SSO login flow uses a server-side session

    What changed:
    litellm-proxy login now starts a CLI SSO flow that returns a login id + polling secret + terminal verification code. The browser callback must confirm the terminal code before the polling endpoint returns the JWT.
    Who is affected:
    Anyone running an older litellm-proxy CLI against an upgraded proxy — the old caller-supplied-handle handoff is gone.
    Restore prior behavior:
    None — upgrade the CLI alongside the proxy.

    Team self-join (_is_available_team) only allows self-add as role=user

    What changed:
    /team/member_add: when the caller is not an admin and the team is "available," the request must add only the caller themselves with role="user". Bulk shapes are checked the same way; lists mixing a valid self-entry with a role="admin" entry are rejected. Email-only members on the self-join path are rejected.
    /team/permissions_update: the _is_available_team clause is removed entirely — only proxy/team/org admins can update team_member_permissions.
    Who is affected:
    Any flow that relied on the blanket bypass to either add an admin to an available team without admin privileges, or to mutate team_member_permissions from a non-admin context.
    Restore prior behavior:
    None — perform admin-scoped operations with an admin key.

    Guardrail modification permission gates on key presence

    What changed:
    The guardrail-modification authz check in auth_checks.py now gates on intent (whether the key is present in the request) rather than payload truthiness. Some previously-accepted shapes will now 403.
    Restore prior behavior:
    None — flow updates required for non-admin callers that previously slipped past on falsy payloads.

    Untrusted root control fields are stripped from client requests

    What changed:
    _UNTRUSTED_ROOT_CONTROL_FIELDS in litellm_pre_call_utils.py includes mock_response, mock_tool_calls, redaction-bypass controls, and a few others. They are stripped from client requests unless the calling key/team carries allow_client_mock_response: true (for mock_response / mock_tool_calls) or the corresponding admin-opt-in metadata for the redaction bypass. Pillar guardrail caching headers and Bedrock dynamic evaluation overrides are also filtered when not explicitly allowed.
    Who is affected:
    Tests and tooling that pass mock_response / mock_tool_calls in extra_body to short-circuit completions.
    Restore prior behavior:
    Set allow_client_mock_response: true in the admin metadata of the test key (or the team owning it).

    Error responses no longer leak re-raised local parameters

    What changed:
    Broad except handlers in the response-utils path used to render the captured request parameters into the re-raised error message. Those parameters can carry credentials, so they're now dropped from the rendered message.
    Who is affected:
    Any client that parsed credential-shaped fields out of a 5xx error body. The error response shape is otherwise unchanged.
    Restore prior behavior:
    None.

    Vector stores

    Credentials redacted; /vector_store/update is per-store gated

    What changed:
    /vector_store/list, /vector_store/info, /vector_store/update redact credential-bearing values inside the persisted litellm_params (handles dicts, JSON-string-serialized params, and nested-dict shapes like litellm_embedding_config).
    /vector_store/update is now gated by _fetch_and_authorize_vector_store — same per-store access check /vector_store/info already had.
    SensitiveDataMasker adds plural "credentials" to its default sensitive-pattern set, so segment-exact matching catches vertex_credentials, aws_credentials, etc. (Latent fix that affects every default-instantiated masker, not just vector stores.)
    get_vector_store_info and update_vector_store re-raise HTTPException instead of letting the catch-all downgrade 403 / 404 to 500.
    Who is affected:
    Anything reading litellm_params off these responses to recover provider keys, or any non-store-admin caller mutating arbitrary vector stores via /vector_store/update.
    Restore prior behavior:
    None.

    Logging callbacks & key/team metadata

    os.environ/* callback refs in key/team metadata are no longer resolved

    What changed:
    convert_key_logging_metadata_to_callback() no longer resolves os.environ/* values from key/team metadata via get_secret(). Existing rows with such values are silently ignored at request setup instead of crashing the request. Trusted config.yaml team-callback env resolution in add_team_based_callbacks_from_config() is unchanged. New AddTeamCallback constructions from key/team logging metadata also reject os.environ/* callback vars.
    Who is affected:
    Any key/team that stored os.environ/DATABASE_URL (or similar) in its callback metadata to pick up a server env var at request time.
    Restore prior behavior:
    Configure those callback secrets through trusted proxy config.yaml (team_callbacks / model_list[].litellm_params) instead of putting os.environ/ references in DB-backed key or team metadata. The literal credential value can still be stored in metadata if absolutely necessary.

    Team-callback admin mutations now emit audit logs

    What changed:
    POST /team/{id}/callback (add_team_callbacks) and POST /team/{id}/disable_logging (disable_team_logging) emit LiteLLM_AuditLogs rows when litellm.store_audit_logs=True. Additive when audit logging is enabled.
    Restore prior behavior:
    litellm.store_audit_logs: false (the default) suppresses the new rows.

    MCP

    Encrypted user-scoped MCP credentials at rest

    What changed:
    Writes to LiteLLM_MCPUserCredentials.credential_b64 go through encrypt_value_helper (nacl SecretBox) instead of plain urlsafe_b64encode. The read path tries nacl decryption first and falls back to plain urlsafe_b64decode for legacy rows; existing rows stay readable.
    Who is affected:
    Operators reading the table directly; the column contents change shape on first re-write.
    Restore prior behavior:
    None — backward-compat read path keeps legacy rows working until they are next written.

    OAuth metadata discovery follows SSRF guard

    What changed:
    The two URLs MCP discovery follows (resource_metadata from WWW-Authenticate, and authorization_servers[0] from protected-resource-metadata) are now subject to async_safe_get. Same-authority metadata fetches stay direct (with follow_redirects=False); cross-origin fetches are validated via the existing user URL validation policy. Public federated providers (Azure Entra, Google, Okta, GitHub) remain supported.
    Who is affected:
    Cross-origin internal/loopback/cloud-metadata OAuth metadata URLs.
    Restore prior behavior:
    Toggle litellm.user_url_validation and the existing URL validation controls per the proxy URL-validation docs to permit your specific internal targets.

    MCP public-route detection no longer matches query strings; OAuth2 fallback no longer fail-opens

    What changed:
    MCPRequestHandler.process_mcp_request checks request.url.path.startswith("/.well-known/") instead of ".well-known" in str(request.url). Query-string smuggling like ?.well-known is rejected.
    When an Authorization header fails LiteLLM-key validation, the handler no longer treats the failure as "OAuth2 passthrough" and returns an empty UserAPIKeyAuth().
    Restore prior behavior:
    None.

    MCP OAuth root endpoint resolves with request visibility rules

    What changed:
    Root-endpoint fallback resolves the single OAuth2 server using the same visibility rules as explicit server-name lookup; non-visible servers are no longer selected via the fallback path. The callback redirect path validates the full client redirect URI carried in state and appends parameters without dropping an existing query string.
    Restore prior behavior:
    None — adjust server visibility rather than relying on the fallback.

    OpenAPI MCP: static_headers now win over caller-forwarded extra_headers

    What changed:
    v1.84.0 introduced header forwarding for OpenAPI-backed MCP servers (spec_path: configs) via PR #27383, letting you allowlist caller request headers into upstream OpenAPI HTTP calls. When the same header name appears in both your YAML static_headers and the request-time extra_headers allowlist, the static_headers value now wins, with case-insensitive name comparison so X-Tenant-Id and x-tenant-id are treated as the same header. This matches how the managed MCP path has always behaved. Authorization is still overridden last by a BYOK x-mcp-auth token, if present.
    Example:
    With mcp_servers:
    data_api:
    spec_path: http://upstream-api.local/openapi.json
    static_headers:
    X-Tenant-Id: "acme-corp"
    extra_headers:
    - X-Tenant-Id

    a caller sending X-Tenant-Id: evil-corp will now have X-Tenant-Id: acme-corp sent upstream. Any header in extra_headers that does not collide with static_headers is still forwarded unchanged.
    Who is affected:
    Operators who set the same header name in both static_headers and extra_headers on an OpenAPI MCP server, and who were relying on the caller's value taking effect. (Note: this only ever shipped in the v1.84.0 release-candidate cycle — no prior stable release forwarded extra_headers for OpenAPI MCPs at all.)
    Restore prior behavior:
    None — if you actually want the caller to control a header, remove it from static_headers and keep it only in extra_headers, or use distinct names for the operator-pinned value and the caller-supplied value.

    UI / static assets

    /get_image, /get_favicon, /get_logo_url

    What changed:
    Remote HTTP(S) UI_LOGO_PATH / LITELLM_FAVICON_URL are now browser-loaded via redirect — the proxy no longer fetches them server-side from these unauthenticated endpoints.
    Local file paths still work in place, but the resolved file must have a supported image signature (jpeg, png, gif, webp, ico); non-image paths fall back to the bundled default.
    /get_logo_url only returns HTTP(S) values; local filesystem paths are not disclosed.
    Stale cached_logo.jpg files are no longer served by /get_image.
    Who is affected:
    Custom branding setups that pointed UI_LOGO_PATH / LITELLM_FAVICON_URL at non-image local files, or relied on /get_logo_url to surface a local path.
    Restore prior behavior:
    No new env vars required. Existing remote URLs continue to work; local image paths continue to work as long as the file is a recognized image type.

    /ui/chat removed

    What changed:
    Static chat.html / chat.txt / chat/ are gone; the route 404s. The chat UI was already removed from the nav; the dangling static build is now also gone.
    Restore prior behavior:
    None.

    "Store Prompts in Spend Logs" toggle moved to Admin Settings

    What changed:
    Both "Store Prompts in Spend Logs" and "Maximum Spend Logs Retention Period" moved from a gear-icon modal on the Logs page to Admin Settings → Logging Settings. The gear was visible to non-admins and surfaced 403s on save.
    Restore prior behavior:
    None — controls are admin-only as /config/update and /config/list already required.

    Tags

    ⚠️ Reverted: v1.83.10 caller-tag strip / allow_client_tags opt-in

    What changed:
    This release reverts the v1.83.10 breaking change that stripped caller-supplied tags unless the key/team metadata had allow_client_tags: true. Caller-supplied tags from x-litellm-tags, body-level tags, and metadata.tags now flow into metadata.tags again and union with admin-configured static tags from key/team/project metadata — the proxy's behavior is back to what it was before v1.83.10. The pre-call strip block in litellm_pre_call_utils.py is removed, and the flag has no schema or endpoint footprint, so leftover allow_client_tags: true values on existing keys/teams are inert.
    Who is affected:
    Operators who set metadata.allow_client_tags: true on keys/teams to opt into client tags: the flag is now a no-op and can be cleaned up at leisure.
    Operators who relied on the v1.83.10 strip to block client-supplied tags reaching tag-based routing or tag-based spend attribution: the strip is no longer enforced. Re-evaluate your tag-based routing and cost-attribution exposure before upgrading.
    Restore prior behavior:
    None — the strip path is gone from the proxy. If caller-supplied tags must be blocked, filter them upstream (gateway / ingress) or in a custom pre-call hook.

    New Models / Updated Models

    New Model Support (16 new models) including OpenAI gpt-image-2, Azure OpenAI azure/gpt-image-2, AWS Bedrock zai.glm-5, Crusoe models, Vertex AI grok models, and others with various features like vision, pdf input, function calling, reasoning, tool choice.

    New Providers (2 new providers)

    AIHubMix: OpenAI-compatible chat completions
    Crusoe: chat completions across reasoning / instruct catalogs

    Pricing updates

    OpenAI gpt-5.5-pro corrected pricing: was 2× OpenAI's published rate. Cost-tracking output for gpt-5.5-pro will drop to half what it reported under previous releases — operators reconciling spend reports across the upgrade boundary should expect the discontinuity. - PR #26651

    AWS Bedrock Anthropic Claude 4.5 / 4.6 / 4.7 (Global + US) — added cache_creation_input_token_cost_above_1hr (and the _above_200k_tokens LC variant for Sonnet 4.5). 1-hour-TTL prompt-cache writes on Bedrock now bill at the published 1.6× rate instead of falling back to the 5-minute rate (was undercounting by ~60%). - PR #26800

    Features

    • Bedrock: Preserve cache_control TTL on tools for Claude 4.5+ on the Converse path; sanitize tools blocks on the Invoke path - PR #25855
    • Translate OpenAI file content on the tool-result path (Bedrock Converse + direct Anthropic) - PR #26710
    • retrievalConfiguration passthrough for vector-store search via extra_body - PR #26685
    • Vertex AI: Propagate metadata labels to embeddings (labels), Imagen (labels), and Discovery Engine rerank (userLabels); shared helper across paths - PR #25499
    • Reuse Anthropic-messages config instances via @lru_cache so VertexBase credential cache survives across calls - PR #26099
    • Google Native: Emit LiteLLM proxy success headers (x-litellm-*) on :generateContent and :streamGenerateContent - PR #25500
    • Run pre_call_hook on :generateContent / :streamGenerateContent so guardrails fire - PR #26914
    • Anthropic: JSON response_format + user tools on non-streaming: filtered tool calls + structured JSON merged into content; internal json_tool_call no longer surfaces - PR #26222
    • Ollama: Forward tool_calls on assistant messages and tool_call_id on role: tool messages — fixes the infinite tool-call loop on multi-turn agents - PR #26122
    • Predibase: Migrate transform_request / transform_response into transformation.py (refactor, no behavior change) - PR #25249
    • AIHubMix (new): First-class OpenAI-compatible provider entry - PR #24294

    Bug Fixes

    • Vertex AI: Preserve items on the array branch of anyOf schemas with null (Vertex was rejecting INVALID_ARGUMENT) - PR #26675
    • Bedrock: GET /v1/batches/{batch_id} forwards model from the encoded id (was returning LiteLLM doesn't support bedrock for 'create_batch') - PR #26814
    • Pass-through stream interruption now flushes spend tracking — GeneratorExit from client disconnect was dropping per-chunk usage values - PR #26719
    • Replace deprecated Claude 3.7 Sonnet test references with claude-sonnet-4-5-20250929-v1:0 across 16 test files - PR #26721
    • Router custom pricing: Propagate custom cost_per_token from DB model_info through the fallback path - PR #25888
    • Responses API: DELETE /openai/responses/{id} no longer sends json={} — Azure now rejects the empty {} body with unexpected_body - PR #26949
    • Pass-through endpoints: Invoke post-call guardrails on non-streaming pass-through responses (/vertex_ai/, /openai/, /bedrock/*); opt-in only when guardrails are configured for the route - PR #26262
    • Inherit caller identity from litellm_params metadata when fabricating UserAPIKeyAuth for managed-files passthrough batch creation (Anthropic + Vertex AI) - PR #26831
    • Embedding cache: Preserve prompt_tokens_details (incl. image_count) through the cache round-trip; aggregate per-item details on retrieval; merge in combine_usage() for partial cache hits - PR #26653
    • Streaming logging: Backfill streaming hidden response cost into the success log path - PR #26606
    • Cost calculation: Unify success_handler typed + dict branches so spend rows stop logging 0 and the budget-overrun reports it caused - PR #26629

    Management Endpoints / UI

    • Teams: Team-level search-tool credentials: new search_tools array on LiteLLM_ObjectPermissionTable; per-key permissions validated as a subset of the owning team's; UI selector under team management - PR #26691
    • Routing Groups: New General Settings → Routing Groups page: create, edit, and delete per-model routing strategies from the dashboard without editing proxy_config.yaml. UI-managed groups are persisted and override values defined in YAML; per-group state is rebuilt on save - PR #27131
    • Model Health: Pagination controls on the model health status page - PR #26826
    • CLI / Workers: --timeout_worker_healthcheck CLI flag (env TIMEOUT_WORKER_HEALTHCHECK) — forwards to uvicorn 0.37.0+ Config kwarg; older uvicorn = warning + no-op; gunicorn / hypercorn paths untouched - PR #26622
    • Memory / lazy loading: Lazy-load optional feature routers on first request (~700 MB lower memory on a two-worker Docker deployment) - PR #26534
    • Lazy-loaded openapi.json front page; spec generation moved to CI with a runtime stub fallback - PR #26802
    • Background jobs: Cleanup job for expired LiteLLM dashboard session keys - PR #26460
    • MCP OAuth: Azure Entra discovery endpoint support - PR #26584

    MCP UI: Tool Configuration panel on the MCP server edit page switched from POST /mcp-rest/test/tools/list (temp-session preview, requires inline creds) to GET /mcp-rest/tools/list?server_id=... (stored credentials). Saved servers with auth_type of api_key / bearer_token / basic / authorization now load tools without "Unable to load tools — Failed to connect to MCP server." - PR #26002

    • Teams: Per-member rows with max_budget=NULL now fall through to team-level enforcement instead of silently disabling it - PR #26809
    • Spend logs: Strip request data from spend-log error messages - PR #26662
    • Vertex retrieve mocked tests: is_redirect=False set on mocked retrieve responses - PR #26844

    AI Integrations

    Logging

    General
    • Opt-in retry settings for the Generic API logger batch send — transient litellm.Timeout / httpx.ConnectTimeout failures retry instead of dropping the batch - PR #26645
    • Cache GCP IAM token used for Redis (was being regenerated per-connection; synchronous google-auth + google-cloud-iam calls were freezing the asyncio event loop, causing ~25 s INCRBYFLOAT Redis spans in production) - PR #26441
    • Backfill streaming hidden response cost - PR #26606

    Guardrails

    • CyCraft XecGuard (new): First-class partner guardrail. Multi-policy prompt/response scanning (prompt injection, harmful content, PII, system-prompt enforcement, bias, skills protection) plus RAG context-grounding via /grounding - PR #26011
    • Noma v2: _build_scan_payload no longer crashes during post_call / during_call / during_mcp_call on deepcopy(request_data) failures with unserializable objects (e.g. uvloop.Loop) - PR #26605
    • Pass-through: Post-call guardrails on non-streaming pass-through responses (see LLM API Endpoints) - PR #26262

    Spend Tracking, Budgets and Rate Limiting

    • Multi-pod budget enforcement
      RedisCache.async_increment gains refresh_ttl opt-in (used by spend counters); get_current_spend and SpendCounterReseed.coalesced skip stale per-pod in-memory on a clean Redis miss; ResetBudgetJob invalidates the Redis counter alongside every DB row reset (keys, users, teams, team members, budgets-linked keys) - PR #26829
    • Cost calc unification
      success_handler typed + dict branches now compute cost the same way - PR #26629
    • Per-member null budget
      Per-member rows with max_budget=NULL fall through to team enforcement - PR #26809
    • Bedrock 1-hour cache write pricing
      Claude 4.5 / 4.6 / 4.7 Global + US entries gain cache_creation_input_token_cost_above_1hr (was undercounting ~60%) - PR #26800
    • gpt-5.5-pro corrected pricing
      Was double-priced - PR #26651
    • Bedrock pass-through stream interruption
      Spend tracking now flushes when client disconnects mid-stream - PR #26719

    MCP Gateway

    Tool prefix
    • Opt-in LITELLM_USE_SHORT_MCP_TOOL_PREFIX env var: switches per-tool prefix from the human-readable server name (github_onprem-get_repo) to a deterministic 3-char base62 id derived from server_id (Xy7-get_repo). Lets long server names stay under the 60-char tool-name limit some model APIs enforce - PR #26733
    OAuth
    • Azure Entra discovery endpoint support - PR #26584

    See Important Behavior Changes for public-route detection, OAuth root endpoint visibility, OAuth metadata SSRF guard, and user-scoped credential encryption.

    Performance / Loadbalancing / Reliability improvements

    • Routing Groups (per-model strategies)
      New router_settings.routing_groups schema binds a list of model_name s to its own routing_strategy and optional routing_strategy_args; ungrouped models fall back to the top-level routing_strategy (the implicit default group, name reserved). Each model_name may belong to at most one group — overlap raises ValueError at init. Updatable at runtime via Router.update_settings(routing_groups=[...]) or /config/update; per-group state is rebuilt on update - PR #27022
    • Database reconnect
      Prisma reconnect no longer blocks the asyncio event loop. Replaces await self.db.disconnect() (which calls subprocess.Popen.wait() synchronously and freezes the loop for 30–120 s+ in production, failing K8s liveness probes) with SIGTERM → 0.5 s sleep → SIGKILL → fresh Prisma() + connect(). Direct-reconnect path delegates to recreate_prisma_client - PR #26225
    • call_with_db_reconnect_retry helper centralizes the reconnect-and-retry-once pattern. Restores the self-heal that 1.83.x lost on PrismaClient.get_generic_data (issue #25143) and harden the reconnect state machine - PR #26756
    • Redis IAM token caching
      GCP IAM token is no longer regenerated on every Redis connection; a single Redis INCRBYFLOAT was taking 25.6 s on a 28.4 s trace in production - PR #26441
    • Config caching
      DualCache config parameter reads are cached and batched. End-to-end on Docker, read load drops from 2.8 q/s to 0.7 q/s; improvement scales with pod count. Note: config edits will take longer to propagate (until the cache is invalidated) - PR #26469
    • Memory footprint
      Lazy-loaded feature routers - PR #26534
    • Lazy-loaded front page + openapi.json move-to-CI - PR #26802
    • Connection layer
      Optional TCP SO_KEEPALIVE support on aiohttp's TCPConnector - PR #26730
    • CLI
      --timeout_worker_healthcheck flag for uvicorn worker triage (see Management Endpoints) - PR #26622
    • Test stability
      Scope test_model_alias_map ERROR-log assertion to LiteLLM logger so asyncio records (e.g. Unclosed client session) stop flunking the assertion intermittently - PR #26741
    • Replace lazy-load subprocess startup-import diff with static source scan (~13 s instead of timing out past two minutes) - PR #26934
    • Opt model-access E2E tests into allow_client_mock_response: true after the request-control hardening - PR #26941

    General Proxy Improvements

    CI / Tooling

    • Support CircleCI "Rerun failed tests" for local_testing_part1 / local_testing_part2 / litellm_router_testing jobs (was collecting 0 items + exit 123) - PR #26461
    • Correct min-release-age value in .npmrc files: drop the d suffix to keep npm install from crashing on npm 11.x with RangeError: Invalid time value - PR #26850

    Pull request template

    • Add Linear ticket field for internal contributors - PR #26655

    New Contributors

    • @xinrui-z made their first contribution in #24294
    • @Jerry-SDE made their first contribution in #25249
    • @Zerohertz made their first contribution in #25888
    • @clyang made their first contribution in #26011
    • @mverrilli made their first contribution in #26122
    • @tuhinspatra made their first contribution in #26262
    • @omriShukrun08 made their first contribution in #26605
    • @lmcdonald-godaddy made their first contribution in #26651
    • @minznerjosh made their first contribution in #26710
    • @yassinkortam made their first contribution in #26730
    • @sruthi-sixt-26 made their first contribution in #26814

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.14-stable...v1.84.0

    Original source
  • Apr 27, 2026
    • Date parsed from source:
      Apr 27, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.83.14 - GPT-5.5, Prompt Compression & Memory API

    liteLLM adds GPT-5.5 and GPT-5.5 Pro support, server-side prompt compression, new memory CRUD endpoints, LLM-as-a-Judge guardrails, MCP OAuth hardening, per-member team budgets, and adaptive routing, alongside broad model updates, bug fixes, and proxy reliability improvements.

    Key Highlights

    • Day-0 GPT-5.5 and GPT-5.5 Pro support — OpenAI and Azure variants ship with full pricing maps, dated snapshots, and Responses-mode routing for the Pro tier.
    • Server-side Prompt Compression — first-class proxy callback that transparently compresses long-context inputs (Claude Code, RAG, document workloads) before they hit the upstream model, with no client opt-in required.
    • /v1/memory CRUD endpoints — proxy now exposes a memory store API with Prisma-backed metadata, consumed by the new agent loop.
    • LLM-as-a-Judge guardrail — model-graded post-call guardrail with configurable rubrics, joining the Bedrock / Lakera / Presidio / Noma family.
    • MCP OAuth hardening — discoverable + BYOK authorize/token endpoints are tightened, temporary OAuth sessions are now shared across proxy instances via Redis, and per-server access policy is uniformly enforced across the proxy and broker.
    • Per-member team budgets land in production — individual member budgets, per-member cycle surfacing in the Teams UI, and atomic counter alignment for user/org spend checks.
    • Adaptive routing — opt-in router policy that weights deployments by recent latency/error history on top of the existing wildcard fallback.

    New Models / Updated Models (22 new models) include OpenAI GPT-5.5 and GPT-5.5 Pro variants with 1,050,000 token context windows and updated pricing, Azure OpenAI variants, AWS Bedrock models, Moonshot, OpenRouter, Gemini embeddings, DashScope image generation, and others.

    Features include Bedrock additions (GLM-5, Minimax M2.5, Claude Mythos Preview), OpenAI versioned GPT-5.4 mini/nano snapshots and GPT-5.5 support, Azure OpenAI dated variants, Gemini Embedding 2 GA, Vertex AI multi-region hosts, DashScope image generation support, Moonshot model registry additions, Anthropic model migrations, and general improvements such as migrating 38 models from legacy max_tokens to max_input_tokens/max_output_tokens.

    Bug Fixes cover Anthropic input args preservation, Gemini thought suffix stripping, file content block handling, Azure streaming role preservation, Bedrock content block sorting and pricing fixes, Gemini embedding request filtering, Vertex AI dimension forwarding, Zhipu/GLM finish_reason mapping, OVHcloud tool calling fix, Scaleway audio support, Responses API normalization, Anthropic Messages API logging preservation, Image API multipart enforcement and URL fetch alignment, Vector Stores BYOK key injection restoration and permission respect, Memory API metadata JSONification, and general URL construction hardening.

    Management Endpoints/UI improvements include virtual keys/auth enhancements, UI tab additions and toggles, sortable columns, per-member budget cycle surfacing, project management refactor, and bug fixes tightening authorization and metadata handling.

    AI Integrations improvements include logging additions (litellm_call_id, Vertex AI passthrough logs) and guardrails enhancements (Bedrock OUTPUT source usage, post-call log deduplication, hook mode redaction, LLM-as-a-Judge guardrail shipping, team/global policy guardrails, guardrail param handling, streaming post-call logging, and deferred success log suppression).

    Spend Tracking, Budgets and Rate Limiting updates include per-member budgets, rate limiting reseed enforcement, and budget window reset fixes.

    MCP Gateway improvements include OAuth hardening, session sharing via Redis, access control alignment, permission resolution, route splitting, and tool filtering.

    Performance, Loadbalancing, and Reliability improvements include adaptive routing, wildcard fallback enhancements, server-side prompt compression callback, health/readiness fix, and developer ergonomics with uvicorn hot reload flag.

    General Proxy Improvements cover build/docker streamlining, migration opt-in resolver, CI/infra migrations and cleanups, test stability improvements, packaging/dependency bumps, UI fetch button fixes, and miscellaneous code improvements.

    Documentation Updates include observability integration additions, proxy docs clarifications, Gemini 3 defaults and release notes, fenced code block padding alignment, prompt caching doc updates, and repository pointing.

    New Contributors section lists multiple first-time contributors with links to their contributions.

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.10-stable...v1.83.14-stable

    Original source
  • Apr 27, 2026
    • Date parsed from source:
      Apr 27, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.83.10 - Claude Opus 4.7, Prompt Compression & Multi-Window Budgets

    liteLLM adds day-0 Claude Opus 4.7 support, launches BM25-based prompt compression, and expands budgeting with multi-threshold alerts and concurrent budget windows. It also introduces PromptGuard guardrails, per-team guardrail opt-out, and a switch to uv packaging for faster builds.

    Key Highlights

    • Claude Opus 4.7 day-0 support — Opus 4.7 across Anthropic, Bedrock, Vertex AI, Azure AI, and Perplexity, with reasoning, vision, prompt caching, computer use, and 1M-token context.
    • litellm.compress() — BM25-based prompt compression with a retrieval tool for trimming long context before it hits the model.
    • Multi-Threshold Budget Alerts — virtual keys can fire alerts at multiple configurable spend thresholds (e.g. 50% / 80% / 95%) instead of a single soft-budget level.
    • Concurrent Budget Windows — keys and teams can run multiple budget periods (daily + monthly) simultaneously, each with its own reset cadence.
    • Per-Team Guardrail Opt-Out — teams can opt out of specific global guardrails from team settings without touching config files.
    • PromptGuard Guardrail Integration — first-class pre/post-call guardrail for prompt-injection detection.
    • uv Packaging Migration — Poetry replaced by uv across packaging, CI, and Docker for faster, reproducible builds.

    Breaking Changes

    Caller-supplied tags are stripped unless the key/team opts in

    • What changed: Tags supplied by the caller — metadata.tags, litellm_metadata.tags, root-level tags, and the x-litellm-tags header — are stripped from the request before tag-based routing and tag-based spend attribution run, unless the calling key or its parent team carries metadata.allow_client_tags: true. Tags configured on the model deployment, key metadata, or team metadata are unaffected. The proxy logs a WARNING line on each strip:

      Stripped caller-supplied tags from metadata, tags (root): this key/team does not have allow_client_tags: true in its metadata. Set it to opt into client-supplied routing/budget tags.

      — PR #25905

    • Who is affected: Any deployment that relied on clients passing tags in the request body or x-litellm-tags header for tag-based cost tracking, tag budgets, or tag-based routing. After upgrade, those tags will silently fall through to the default bucket / default deployment, and per-tag spend reports will appear empty.

    • Restore prior behavior: Set allow_client_tags: true in the metadata of the affected key (or the team owning it). Either flag is sufficient — if the key or its parent team carries the flag, caller-supplied tags pass through.

    # Per key
    curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
    -H 'Authorization: Bearer sk-1234' \
    -H 'Content-Type: application/json' \
    -d '{"metadata": {"allow_client_tags": true}}'
    
    # Per team
    curl -L -X POST 'http://0.0.0.0:4000/team/new' \
    -H 'Authorization: Bearer sk-1234' \
    -H 'Content-Type: application/json' \
    -d '{"metadata": {"allow_client_tags": true}}'
    

    Existing keys/teams can be patched with /key/update or /team/update carrying the same metadata payload.

    os.environ/… values in the UI or API

    • What changed: Values such as os.environ/OPENAI_API_KEY (and other os.environ/… patterns) are no longer expanded when they come from request-supplied fields—including the Admin UI and the same proxy APIs the UI calls. — PR #25592
    • Who is affected: Anyone who entered literal os.environ/SECRET_NAME strings in the UI or API and expected the proxy to substitute the host environment at runtime.
    • What to use instead: Provider API keys and similar secrets should be stored with Reusable Credentials and attached to models (for example via litellm_credential_name). For observability callbacks (Langfuse, LangSmith, etc.), set keys and endpoints in proxy config.yaml or in environment variables the process reads at startup—not as os.environ/… strings inside per-request metadata.

    New Models / Updated Models

    New Model Support (10 new models)

    Provider Model Context Window Input ($/1M tokens) Output ($/1M tokens) Features Anthropic claude-opus-4-7, claude-opus-4-7-20260416 1M $5.00 $25.00 Chat, reasoning, vision, computer use, prompt caching, PDF input, xhigh reasoning effort AWS Bedrock anthropic.claude-opus-4-7, us.anthropic.claude-opus-4-7, eu.anthropic.claude-opus-4-7, au.anthropic.claude-opus-4-7, global.anthropic.claude-opus-4-7 1M $5.50 $27.50 Chat, reasoning, vision, computer use, prompt caching, PDF input, native structured output Vertex AI vertex_ai/claude-opus-4-7, vertex_ai/claude-opus-4-7@default 1M $5.00 $25.00 Chat, reasoning, vision, computer use, prompt caching, PDF input Azure AI azure_ai/claude-opus-4-7 200K $5.00 $25.00 Chat, reasoning, vision, computer use, prompt caching, PDF input Perplexity perplexity/anthropic/claude-opus-4-7 - - - Web search, function calling (Responses mode) Google Gemini gemini/veo-3.1-lite-generate-preview 1024 - $0.05 / sec Video generation preview OpenRouter openrouter/google/gemini-3.1-flash-lite-preview 1.05M $0.25 $1.50 Chat, code execution, file search, function calling, prompt caching, reasoning, web search, vision, video/audio/PDF input xAI xai/grok-4.20-0309-reasoning 2M $2.00 $6.00 Function calling, reasoning, tool choice, vision, web search W&B Inference wandb/MiniMaxAI/MiniMax-M2.5 197K $0.30 $1.20 Function calling, reasoning, response schema W&B Inference wandb/moonshotai/Kimi-K2.5 262K $0.60 $3.00 Function calling, reasoning, response schema, vision

    Features

    Anthropic

    • Day-0 support for Claude Opus 4.7 across Anthropic native, Bedrock, Vertex AI, Azure AI, and Perplexity - PR #25867
    • Hotfix follow-ups for Opus 4.7 routing/version-string handling - PR #25875, PR #25876
    • Retry /v1/messages after invalid thinking signature errors - PR #25674

    AWS Bedrock

    • Normalize custom tool JSON schema for both Invoke and Converse APIs - PR #25396
    • Bedrock API response null-type handling - PR #25810, PR #24147
    • Prevent negative streaming costs for start-only cache usage - PR #25846
    • Accurate cache token cost breakdown in UI and SpendLogs - PR #25735
    • Remove unresolved merge conflict markers in Bedrock test file - PR #25995
    • Replace flaky Bedrock gpt-oss tool-call live test with request-body mock - PR #25739
    • Mock Bedrock Moonshot tests + fix TogetherAIConfig recursion - PR #25920
    • Remove dead Bedrock clear_thinking interleaved-thinking-beta assertion - PR #25913

    Google Vertex AI

    • Normalize Gemini finish_reason enum through map_finish_reason - PR #25337
    • Add us-south1 region for vertex_ai/qwen3-235b-a22b-instruct-2507-maas - PR #25382
    • Add vertex_ai/claude-opus-4-7 and vertex_ai/claude-opus-4-7@default cost map entries - cost map

    Google Gemini

    • Veo 3.1 Lite pricing, video resolution usage, and tiered cost tracking - PR #25348

    Azure AI

    • Add azure_ai/claude-opus-4-7 cost map entry - cost map
    • Populate standard_logging_object for Azure passthrough via logging hook - PR #25679

    OpenAI

    • Omit null encoding_format for OpenAI embedding requests - PR #25395 (later reverted in PR #25698)

    xAI

    • Add xai/grok-4.20-0309-reasoning cost map entry - PR #25930

    Together AI

    • Expose reasoning effort fields in get_model_info and add together_ai/gpt-oss-120b - PR #25263
    • Replace deprecated Mixtral with serverless Qwen3.5-9B in tests - PR #25728

    DashScope

    • Preserve cache_control for explicit prompt caching - PR #25331

    GitHub Copilot

    • Allow overriding the default GitHub Copilot authentication endpoint - PR #25915

    W&B Inference

    • Add Kimi-K2.5 and MiniMax-M2.5 cost map entries - PR #25409

    Bug Fixes

    Anthropic

    • Return actual upstream status code from /v1/messages/count_tokens instead of always 200 - PR #21352

    Vertex AI

    • Gemini finish_reason enum normalization (see Features above) - PR #25337

    Embeddings API

    • Revert null-encoding_format omission after downstream regression - PR #25698

    General

    • Fix version shown in docs banner - PR #25875

    LLM API Endpoints

    • Add Responses API params to cache key allow-list - PR #25673
    • OCR API: Mistral-style pages param via Azure DI analyze query string - PR #25929
    • Add missing Mistral OCR params to allowlist - PR #25858
    • OpenAI encoding_format handling for null values (initial fix later reverted) - PR #25395, PR #25698
    • Anthropic Messages: Retry on invalid thinking signature - PR #25674
    • Return actual status code on count_tokens upstream errors - PR #21352
    • Pass-Through Endpoints: Populate standard_logging_object for Azure passthrough - PR #25679
    • Restrict x-pass- header forwarding for credential and protocol headers - PR #25916

    Management Endpoints / UI

    Virtual Keys

    • Configurable multi-threshold budget alerts (e.g. 50% / 80% / 95%) - PR #25989
    • Multiple concurrent budget windows per API key and team (#24883) - PR #25109
    • Per-member model scope + team default_team_member_models - PR #24950
    • Migrate regenerate key modal to AntD - PR #25406
    • Strip empty premium fields from key update payload - PR #26023
    • Default invite-user modal global role to least privilege - PR #25721

    Teams

    • Allow editing router settings after team creation - PR #25398
    • Per-team opt-out for specific global guardrails - PR #25575
    • Enterprise notice banner on deleted Keys/Teams - PR #25814
    • Invalidate org queries after team mutations - PR #25812
    • E2E test for editing team model TPM/RPM limits - PR #25658

    Models + Endpoints

    • Claude Code BYOK support in UI Settings - PR #25998
    • E2E tests for Add Model flow - PR #25590
    • Pre-select backend default for boolean guardrail provider fields - PR #25700
    • Render guardrail optional_params bool defaults in Select - PR #25806
    • Use AntD Select for MCP ToolTestPanel boolean inputs - PR #25809
    • Persist extra_headers on MCP server edit - PR #26003
    • Migrate Guardrail Test Playground from @tremor/react to AntD - PR #25749
    • Migrate router_settings page from Tremor to AntD - PR #25879
    • Reduce Tremor usage in Guardrails Monitor layout - PR #25803
    • Remove Chat UI link from Swagger docs message - PR #25727
    • Delete policy attachments via controlled modal - PR #25324

    Auth / SSO

    • Resolve login redirect loop when reverse proxy adds HttpOnly to cookies - PR #23532
    • Gate post-custom-auth DB lookups behind opt-in flag - PR #25634

    Logs / Activity

    • Isolate logs team-filter dropdown from root teams state bleed - PR #25716
    • Align /spend/logs filter handling with user scoping - PR #25594

    Helm

    • Add tpl support to extraContainers and extraInitContainers - PR #25494

    Bugs

    • Strip empty premium fields from key update payload - PR #26023
    • Tighten api_key value check in credential validation - PR #25917
    • extra_headers not persisting on MCP server edit - PR #26003
    • Logs team-filter dropdown leakage - PR #25716
    • Add getCookie to cookieUtils mock in user_dashboard test - PR #25719
    • Remove deprecated tests/ui_e2e_tests/ suite - PR #25657
    • Restrict x-pass- header forwarding - PR #25916
    • Blog dark-mode text invisible on dark background - PR #25620
    • Default invite-user role least-privilege - PR #25721

    AI Integrations

    Logging

    • Prometheus: Add 7m and 10m latency histogram buckets - PR #25071
    • Performance improvements for Prometheus exporter - PR #25934
    • Resolve prometheus_helpers file/package shadow breaking /global/spend/logs - PR #26026

    Azure Pass-Through

    • Populate standard_logging_object via logging hook - PR #25679

    General

    • Preserve provider response headers in StandardLoggingPayload - PR #25807

    Guardrails

    PromptGuard

    • New PromptGuard guardrail integration for prompt-injection detection - PR #24268

    Custom Code Guardrails

    • Replace custom_code sandbox with RestrictedPython - PR #25818

    Presidio

    • Use correct text positions in anonymize_text - PR #24998

    General

    • Per-team opt-out for specific global guardrails - PR #25575
    • UI: pre-select backend default for boolean guardrail provider fields - PR #25700
    • UI: render guardrail optional_params boolean defaults in Select - PR #25806
    • Read guardrail config from admin metadata and fix tag-routing consistency - PR #25905

    Caching

    • Add Responses API params to cache key allow-list - PR #25673
    • Prevent multiple values TypeError in get_cache_key - PR #20261
    • S3v2: use prepared URL for SigV4-signed S3 requests - PR #25074

    Prompt Management / Compression

    • New litellm.compress() BM25-based prompt compression API with retrieval tool - PR #25637

    Secret Managers

    • No new secret manager provider additions in this release.

    Spend Tracking, Budgets and Rate Limiting

    • Configurable multi-threshold budget alerts for virtual keys (e.g. 50% / 80% / 95%) - PR #25989
    • Multiple concurrent budget windows per API key and team (#24883) - PR #25109
    • Bedrock/Anthropic accurate cache token cost breakdown in UI and SpendLogs - PR #25735
    • Bedrock: prevent negative streaming costs for start-only cache usage - PR #25846
    • Fix virtual-key projected-spend soft budget alerts - PR #25838
    • Enforce project-level model-specific rate limits in parallel-request limiter - PR #25994
    • Persist default router end-budget across restarts - PR #25991
    • Align reset times for legacy entities (Team Members, End Users) with the standardized calendar - PR #25440
    • Batch-limit stale managed-object cleanup to prevent 300K-row UPDATE - PR #25227
    • Cache invalidation: stop double-hashing token in bulk update and key rotation - PR #25552
    • model_max_budget silently broken for routed models - PR #25549
    • Expose reasoning-effort fields in get_model_info (and add together_ai/gpt-oss-120b to cost map) - PR #25263
    • Veo 3.1 Lite resolution-aware tiered cost tracking - PR #25348
    • Add us-south1 region for Vertex qwen3-235b-a22b-instruct-2507-maas cost map - PR #25382

    MCP Gateway

    • Validate is_tool_name_prefixed against the set of known MCP server prefixes - PR #25085
    • Restore PKCE-triggering 401 when no stored per-user token exists - PR #26032
    • Expose per-server InitializeResult.instructions from the MCP gateway - PR #25694
    • Extract shared PKCE helpers into utils/pkce.ts - PR #25878
    • UI: AntD Select for MCP ToolTestPanel boolean inputs - PR #25809
    • UI: persist extra_headers on MCP server edit - PR #26003

    Performance / Loadbalancing / Reliability improvements

    • Prometheus exporter performance improvements - PR #25934
    • Optimize DB query to prevent OOM during health checks - PR #25732
    • PodLockManager.release_lock atomic compare-and-delete (re-land of #21226) - PR #24466
    • Health-check reasoning-token max-token precedence - PR #25936
    • New BACKGROUND_HEALTH_CHECK_MAX_TOKENS environment variable - PR #25344
    • Return None for routing_strategy_args when strategy is not latency-based - PR #25882
    • Bump proxy dependencies; raise minimum Python to 3.10 - PR #26022
    • Bump 22 of 25 vulnerable dependabot-reported dependencies - PR #25442
    • Migrate packaging, CI, and Docker from Poetry to uv - PR #25007
    • [Infra] Bump llm_translation_testing resource class to xlarge and tolerate worker restarts - PR #25887, PR #25898
    • [Infra] Expand CI branch filters for non-main PR targets - PR #25819
    • [Infra] Guard main to only accept PRs from staging and hotfix branches - PR #25733
    • [Infra] Remove unused publish_proxy_extras and prisma_schema_sync jobs from CircleCI config - PR #25821
    • fix(ci): increase test-server-root-path timeout to 30m - PR #25741
    • Remove non-existent litellm_mcps_tests_coverage from coverage combine - PR #25737
    • Helm: add tpl support to extraContainers/extraInitContainers - PR #25494
    • Advisor tool orchestration loop for non-Anthropic providers - PR #25579

    Documentation Updates

    • Cost discrepancy debugging guide - PR #25622
    • Week 2 onboarding checklist - PR #25452
    • Add "Copy Page as Markdown" + llms.txt to docs site - PR #25975
    • Docs announcement bar for Trivy compromise resolution - PR #25870
    • Restyle docs.litellm.ai/blog to engineering blog aesthetic - PR #25580
    • Ramp-style engineering blog restyle + Redis circuit breaker post - PR #25583
    • Add back arrow to blog post pages - PR #25587
    • Fallbacks image - PR #25731
    • General docs update - PR #25736
    • Backfill release notes for v1.83.3-stable and v1.83.7.rc.1 - PR #25723, PR #25726
    • Fix version shown in docs - PR #25875

    New Contributors

    • @hunterchris made their first contribution in https://github.com/BerriAI/litellm/pull/20261
    • @Dmitry-Kucher made their first contribution in https://github.com/BerriAI/litellm/pull/24998
    • @kulia26 made their first contribution in https://github.com/BerriAI/litellm/pull/25071
    • @jaxhend made their first contribution in https://github.com/BerriAI/litellm/pull/23532
    • @abhyudayareddy made their first contribution in https://github.com/BerriAI/litellm/pull/25337
    • @avarga1 made their first contribution in https://github.com/BerriAI/litellm/pull/25263
    • @acebot712 made their first contribution in https://github.com/BerriAI/litellm/pull/24268
    • @meutsabdahal made their first contribution in https://github.com/BerriAI/litellm/pull/25395
    • @shreyescodes made their first contribution in https://github.com/BerriAI/litellm/pull/25559
    • @Lucas-Song-Dev made their first contribution in https://github.com/BerriAI/litellm/pull/25324
    • @steromano87 made their first contribution in https://github.com/BerriAI/litellm/pull/25915
    • @jlav made their first contribution in https://github.com/BerriAI/litellm/pull/25494

    Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.7-stable...v1.83.10-stable

    Original source
  • Mar 16, 2026
    • Date parsed from source:
      Mar 16, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.82.3 - Nebius AI, gpt-5.4, Gemini 3.x, FLUX Kontext, and 116 New Models

    liteLLM releases major model and provider expansion with Nebius AI, SageMaker Nova, and Black Forest Labs support, plus day-0 OpenAI gpt-5.4 routing, Gemini 3.x updates, WebSocket streaming for Responses API, stronger RBAC, Vault integration, and secret redaction.

    Key Highlights

    • Nebius AI — new provider — 30 models across DeepSeek, Qwen, Llama, Mistral, NVIDIA, and BAAI available via Nebius AI cloud - PR #22614
    • OpenAI gpt-5.4 / gpt-5.4-pro — day 0 — Full pricing and routing support for gpt-5.4 (1M context, $2.50/$15.00) and gpt-5.4-pro ($30.00/$180.00) on OpenAI and Azure
    • Gemini 3.x models — gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-image-preview, and gemini-embedding-2-preview added to cost map for Google AI and Vertex AI
    • FLUX Kontext image editing — flux-kontext-pro and flux-kontext-max added to Black Forest Labs, alongside flux-pro-1.0-fill and flux-pro-1.0-expand for inpainting and outpainting
    • 116 new models, 132 deprecated models cleaned up — Major model map refresh including Mistral Magistral, Dashscope Qwen3 VL, xAI Grok via Azure AI, ZAI GLM-5, Serper Search; removal of OpenAI GPT-3.5/GPT-4 legacy variants, Gemini 1.5, and Vertex AI PaLM2
    • SageMaker Nova provider — New sagemaker_nova provider for Amazon Nova models on SageMaker - PR #21542
    • Hashicorp Vault secret manager — Config override backend powered by Hashicorp Vault, with full UI for managing vault-sourced credentials - PR #22939, PR #23036
    • Responses API WebSocket streaming — Real-time WebSocket streaming for the Responses API, including support across all providers - PR #22559, PR #22771
    • Org Admin RBAC expansion — Org Admins can now access team management endpoints, view and invite internal users, and manage team membership without requiring a global admin role - PR #23085, PR #23080
    • Guardrail mode defaults and tag-based modes — Set a default guardrail mode list globally, and specify a list of modes in tag-based guardrail configs - PR #22676, PR #23020
    • Secret redaction in logs — API keys, tokens, and credentials automatically scrubbed from all proxy log output. Enabled by default; opt out with LITELLM_DISABLE_REDACT_SECRETS=true - PR #23668
    • Streaming stability fix — Critical fix for RuntimeError: Cannot send a request, as the client has been closed. crashes after ~1 hour in production - PR #22926

    New Providers and Endpoints

    New Providers (7 new providers)

    Provider Supported LiteLLM Endpoints Description

    Nebius AI (nebius/) /chat/completions, /embeddings EU-based AI cloud with 30+ open models — DeepSeek, Qwen3, Llama 3.1/3.3, NVIDIA Nemotron, BAAI embeddings
    ZAI (zai/) /chat/completions ZhipuAI GLM-5 models via ZAI cloud
    Black Forest Labs (black_forest_labs/) /images/generations, /images/edits FLUX image generation and editing — Kontext Pro/Max, Pro 1.0 Fill/Expand
    Serper (serper/) /search Web search via Serper API
    SageMaker Nova (sagemaker_nova/) /chat/completions Amazon Nova models via SageMaker endpoint
    Google Search API (google_search/) /search Google Search API integration - PR #22752
    Bedrock Mantle (bedrock_mantle/) /chat/completions Amazon Bedrock via Mantle — alternative auth and routing path for Bedrock models - PR #22866

    New Models / Updated Models

    New Model Support (116 new models)

    Includes OpenAI gpt-5.4 and gpt-5.4-pro, Google Gemini 3.x variants, Mistral Magistral models, Dashscope Qwen3 VL models, Black Forest Labs FLUX Kontext image editing models, Azure AI xAI Grok models, and many more.

    Updated Models

    AWS Bedrock: Added prompt caching cost estimation for Anthropic models; renamed regional identifiers.
    Azure OpenAI: Added supports_none_reasoning_effort to gpt-5.1-chat, gpt-5.1-codex, and gpt-5.4 variants; removed deprecated models azure/gpt-35-turbo-0301 and azure/gpt-35-turbo-0613.

    Features

    OpenAI: Day 0 support for gpt-5.4 and gpt-5.4-pro on OpenAI and Azure.
    Google Gemini: Added Gemini 3.x model cost map entries and re-added Gemini 2.0 Flash and Flash Lite with updated pricing.
    Google Vertex AI: Added Gemini 3.x models to cost map.
    Mistral: Added Magistral reasoning models and other variants.
    Dashscope / Qwen: Added Qwen3 VL multimodal models and other Qwen3 variants.
    Black Forest Labs: Added FLUX Kontext image editing models and FLUX Pro 1.0 Fill/Expand.
    Azure AI: Added xAI Grok models and Mistral Document AI OCR mode.
    AWS Bedrock: Added new models via Bedrock Converse.
    SageMaker: Added sagemaker_nova provider for Amazon Nova models on SageMaker.

    Bugs

    Fixed various issues including streaming finish_reason for tool calls, JSON schema preservation for Gemini 2.0+, handling of reasoning_effort param, content truncation in streaming, and more.

    AI Integrations

    Added Gemini and Vertex AI support to HeliconeLogger, fixed provider URLs, Langfuse failure path fixes, Vantage integration for FOCUS 1.2 CSV export, and general fixes.

    Guardrails

    Configured default guardrail mode lists globally and tag-based guardrail mode lists; fixed presidio PII token leak and OTEL orphaned guardrail traces.

    Secret Managers

    Full Hashicorp Vault integration as a config override backend with UI support for managing vault-sourced credentials.

    MCP Gateway

    Added token authentication for MCP servers, team-scoped MCP server filtering, and per-server health recheck in UI.

    Spend Tracking, Budgets and Rate Limiting

    Fixed budget-linked keys never having spend reset, added flex pricing support, fixed spend log cleanup and deduplication, and fixed TypeError when request has no API key.

    Performance / Loadbalancing / Reliability improvements

    Fixed streaming crashes after ~1 hour, OOM / Prisma connection loss on large installs, centralized logging kwarg updates, fixed tiktoken cache for non-root offline containers, and other reliability improvements.

    Security

    Added secret redaction in proxy logs, bumped PyJWT to ^2.12.0, and updated tar and tornado to address CVEs.

    Database / Proxy Operations

    Fixed Prisma migrate deploy on pre-existing instances and made DB migration failure exit opt-in.

    Documentation Updates

    Added Anthropic /v1/messages → /responses parameter mapping reference, updated Okta SSO docs, added environment variables reference, and added Gemini Vertex AI PayGo/priority cost tracking docs.

    New Contributors

    Several new contributors made their first contributions in this release.

    Diff Summary

    New Providers: 7
    New Models / Updated Models: 116 new, 132 removed
    LLM API Endpoints: 37
    Management Endpoints / UI: 31
    AI Integrations: 8
    MCP Gateway: 5
    Spend Tracking, Budgets and Rate Limiting: 5
    Performance / Loadbalancing / Reliability improvements: 9
    Security: 3
    Database / Proxy Operations: 2
    Documentation Updates: 5

    Original source
  • Feb 28, 2026
    • Date parsed from source:
      Feb 28, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.82.0 - Realtime Guardrails, Projects Management, and 10+ Performance Optimizations

    liteLLM adds realtime API guardrails, a new Projects management UI, expanded guardrail policies, and day 0 support for OpenAI gpt-5.3-codex. It also routes /v1/messages to the Responses API by default, improves performance, and broadens model coverage across providers.

    Key Highlights

    • Realtime API guardrails — Full guardrails support for /v1/realtime WebSocket sessions with pre/post-call enforcement, voice transcription hooks, session termination policies, and Vertex AI Gemini Live support - PR #22152, PR #22153, PR #22161, PR #22165
    • Projects Management — New Projects UI with full CRUD, project-scoped virtual keys, and admin opt-in toggle — organize teams and keys by project - PR #22315, PR #22360, PR #22373, PR #22412
    • Guardrail ecosystem expansion — Noma v2, Lakera v2 post-call, Singapore regulatory policies (PDPA + MAS), employment discrimination blockers, code execution blocker, guardrail policy versioning, and production monitoring - PR #21400, PR #21783, PR #21948
    • OpenAI Codex 5.3 — day 0 — Full support for gpt-5.3-codex on OpenAI and Azure, plus gpt-audio-1.5 and gpt-realtime-1.5 model coverage - PR #22035
    • 10+ performance optimizations — Streaming hot-path fixes, Redis pipeline batching, database task batching, ModelResponse init skip, and router cache improvements — lower latency and CPU on every request
    • /v1/messages→/responses routing — /v1/messages requests are now routed to the Responses API by default for OpenAI/Azure models

    This version starts routing /v1/messages requests to the /responses API by default. To opt out and continue using chat/completions, set LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true or litellm_settings.use_chat_completions_url_for_anthropic_messages: true in your config.

    New Models / Updated Models

    New Model Support (20 new models)

    Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features
    OpenAI | gpt-5.3-codex | 272K | $1.75 | $14.00 | Reasoning, coding
    Azure OpenAI | azure/gpt-5.3-codex | 272K | $1.75 | $14.00 | Azure deployment
    OpenAI | gpt-audio-1.5 | 128K | $2.50 | $10.00 | Audio model
    Azure OpenAI | azure/gpt-audio-1.5-2026-02-23 | 128K | $2.50 | $10.00 | Audio model
    OpenAI | gpt-realtime-1.5 | 32K | $4.00 | $16.00 | Realtime model
    Azure OpenAI | azure/gpt-realtime-1.5-2026-02-23 | 32K | $4.00 | $16.00 | Realtime model
    Groq | groq/openai/gpt-oss-safeguard-20b | 131K | $0.075 | $0.30 | Guardrail inference
    Google Vertex AI | vertex_ai/gemini-3.1-flash-image-preview | - | - | - | Image generation
    Perplexity | perplexity/perplexity/sonar | - | - | - | Sonar search
    Perplexity | perplexity/openai/gpt-5.1 | - | - | - | Hosted routing
    Perplexity | perplexity/openai/gpt-5-mini | - | - | - | Hosted routing
    Perplexity | perplexity/google/gemini-2.5-flash | - | - | - | Hosted routing
    Perplexity | perplexity/google/gemini-2.5-pro | - | - | - | Hosted routing
    Perplexity | perplexity/google/gemini-3-flash-preview | - | - | - | Hosted routing
    Perplexity | perplexity/google/gemini-3-pro-preview | - | - | - | Hosted routing
    Perplexity | perplexity/anthropic/claude-haiku-4-5 | - | - | - | Hosted routing
    Perplexity | perplexity/anthropic/claude-sonnet-4-5 | - | - | - | Hosted routing
    Perplexity | perplexity/anthropic/claude-opus-4-5 | - | - | - | Hosted routing
    Perplexity | perplexity/anthropic/claude-opus-4-6 | - | - | - | Hosted routing
    Perplexity | perplexity/xai/grok-4-1-fast-non-reasoning | - | - | - | Hosted routing

    Features

    OpenAI

    • Day 0 support for gpt-5.3-codex on OpenAI and Azure - PR #22035
    • Add gpt-audio-1.5 model cost map - PR #22303
    • Add gpt-realtime-1.5 model cost map - PR #22304
    • Add audio as supported OpenAI param - PR #22092
    • Add prompt_cache_key and prompt_cache_retention support - PR #20397

    Azure OpenAI

    • New Azure OpenAI models 2026-02-25 - PR #22114

    Anthropic

    • Add v1 Anthropic Responses API transformation - PR #22087
    • Sanitize tool_use IDs in convert_to_anthropic_tool_invoke - PR #21964
    • Fix model wildcard access issue - PR #21917

    AWS Bedrock

    • Encode model ARNs for OpenAI-compatible Bedrock imported models - PR #21701
    • Support optional regional STS endpoint in role assumption - PR #21640
    • Native structured outputs API support - PR #21222

    Google Vertex AI

    • Add gemini-3.1-flash-image-preview to model cost map - PR #22223
    • Enable context-1m-2025-08-07 beta header for Vertex AI provider - PR #21867

    OpenRouter

    • Add OpenRouter native models to model cost map - PR #20520
    • Add OpenRouter Opus 4.6 to model map - PR #20525

    Mistral

    • Adjust mistral-small-2503 input/output cost per token - PR #22097

    Groq

    • Add groq/openai/gpt-oss-safeguard-20b model pricing - PR #21951

    AI/ML

    • Update AIML model pricing - PR #22139

    Ollama

    • Thread api_base to get_model_info + graceful fallback - PR #21970

    PublicAI

    • Fix function calling for PublicAI Apertus models - PR #21582

    xAI

    • Add deprecation dates for grok-2-vision-1212 and grok-3-mini models - PR #20102

    General

    • Forward auth headers of provider - PR #22070
    • Normalize camelCase thinking param keys to snake_case - PR #21762
    • Allow dimensions param passthrough for non-text-embedding-3 OpenAI models - PR #22144

    Bug Fixes

    AWS Bedrock

    • Fix converse handling for parallel_tool_calls - PR #22267
    • Restore parallel_tool_calls mapping in map_openai_params - PR #22333
    • Correct modelInput format for Converse API batch models - PR #21656
    • Prevent double UUID in create_file S3 key - PR #21650
    • Filter internal json_tool_call when mixed with real tools - PR #21107
    • Pass timeout param to Bedrock rerank HTTP client - PR #22021

    Anthropic

    • Fix model cost map for anthropic fast and inference_geo - PR #21904

    Image Generation

    • Propagate extra_headers to upstream image generation - PR #22026
    • Add ChatCompletionImageObject in OpenAIChatCompletionAssistantMessage - PR #22155

    General

    • Preserve forwarding of server-side called tools - PR #22260
    • Fix free model handling from UI paths - PR #22258
    • Fix None TypeError in mapping - PR #22080

    LLM API Endpoints

    Features

    Realtime API

    • Guardrails support for /v1/realtime WebSocket endpoint - PR #22152
    • Vertex AI Gemini Live via unified /realtime endpoint - PR #22153
    • Guardrails with pre_call/post_call mode on realtime WebSocket - PR #22161
    • end_session_after_n_fails + Endpoint Settings wizard step - PR #22165
    • Guardrail hook for voice transcription - PR #21976
    • Fix guardrails not firing for Gemini/Vertex AI and provider_config realtime sessions - PR #22168
    • Add logging, spend tracking support + tool tracing - PR #22105

    Video Generation

    • Add variant parameter to video content download - PR #21955
    • Pass api_key from litellm_params to video remix handlers - PR #21965
    • Apply custom video pricing from deployment model_info - PR #21923
    • Fix passing of image and parameters in videos API - PR #22170

    OCR

    • Enable local file support for OCR - PR #22133

    Websearch / Tool Calling

    • Preserve thinking blocks in agentic loop follow-up messages - PR #21604

    General

    • Add configurable upper bound for chunk processing time - PR #22209
    • Emit x-litellm-overhead-duration-ms header for streaming requests - PR #22027

    Bugs

    General

    • Fix mypy attr-defined errors on realtime websocket calls - PR #22202

    Management Endpoints / UI

    Features

    Projects
    • Add Projects page with list and create flows - PR #22315
    • Add Project Details page with edit modal - PR #22360
    • Add project keys table and project dropdown on key create/edit - PR #22373
    • Add delete project action to Projects table - PR #22412
    • Add Projects Opt-In Toggle in Admin Settings - PR #22416
    • Include created_at and updated_at in /project/list response - PR #22323
    • Add tags in project - PR #22216
    Virtual Keys + Access Groups
    • Add bidirectional team/key sync for Access Group CRUD flows - PR #22253
    • Add pagination and search to /key/aliases to prevent OOMs - PR #22137
    • Add paginated key alias selector in UI - PR #22157
    • Add project_id and access_group_id filters for key list endpoint - PR #22356
    • Add KeyInfoHeader component - PR #22047
    • Restrict Edit Settings to key owners - PR #21985
    • Fix virtual key grace period from env/UI - PR #20321
    Agents
    • Assign virtual keys to agents - PR #22045
    • Assign tools to agents - PR #22064
    • Ensure internal users cannot create agents (RBAC enforcement) - PR #22329
    Proxy Auth / SSO
    • OIDC discovery URLs, roles array handling, and dot-notation error hints - PR #22336
    • Add PROXY_ADMIN role to system user for key rotation - PR #21896
    Usage / Spend Logs
    • Add user filtering to usage page - PR #22059
    • Allow using AI to understand usage patterns - PR #22042
    • Use backend request_duration_ms and make Duration sortable in Logs - PR #22122
    • Add request_duration_ms to SpendLogs - PR #22066
    • Enrich failure spend logs with key/team metadata - PR #22049
    • Show real tool names in logs for Anthropic-format tools - PR #22048
    Models + Endpoints
    • Show proxy URL in ModelHub - PR #21660
    • Add /public/endpoints for provider endpoint support - PR #22248
    UI Improvements
    • Add custom favicon support - PR #21653
    • Add Blog Dropdown in Navbar - PR #21859
    • Add UI banner warning for detailed debug mode - PR #21527
    • Make auth value optional for MCP Server create flow - PR #22119
    • Tool policies: auto-discover tools + policy enforcement guardrail - PR #22041
    Health Checks
    • Add health check max tokens configuration - PR #22299
    • Limit concurrent health checks with health_check_concurrency - PR #20584
    • Fix health check model_id filtering - PR #21071

    Bugs

    • Populate user_id and user_info for admin users in /user/info - PR #22239
    • Fix virtual keys pagination stale totals when filtering - PR #22222
    • Fix Spend Update Queue aggregation never triggers with default presets - PR #21963
    • Fix timezone config lookup and replace hardcoded timezone map with ZoneInfo - PR #21754
    • Fix custom auth budget issue - PR #22164
    • Fix missing OAuth session state - PR #21992
    • Fix Transport Type for OpenAPI Spec on UI - PR #22005
    • Fix Claude Code plugin schema - PR #22271
    • Add missing migration for LiteLLM_ClaudeCodePluginTable - PR #22335
    • Only tag selected deployment in access group creation - PR #21655
    • State management fixes for CheckBatchCost - PR #21921
    • Remove duplicate antd import in ToolPolicies - PR #22107

    AI Integrations

    Logging

    DataDog
    • Add ability to trace metrics in DataDog - PR #22103
    • Correlate LiteLLM call IDs with DataDog APM spans - PR #22219
    • Fix TTS metric emission issues - PR #20632
    Prometheus
    • Add opt-in stream label on litellm_proxy_total_requests_metric - PR #22023
    • Fix team +Inf budgets in Prometheus metrics - PR #22243
    Langfuse
    • Fix Langfuse OTEL trace issues - PR #21309
    Arize Phoenix
    • Fix nested traces coexistence with OTEL callback - PR #22169
    Slack
    • Add optional digest mode for Slack alert types - PR #21683
    General
    • Fix Gemini trace ID missing in logging - PR #22077
    • Populate cache_read_input_tokens from prompt_tokens_details for OpenAI/Azure - PR #22090

    Guardrails

    Noma

    • Noma guardrails v2 based on custom guardrails framework - PR #21400

    LakeraAI

    • Add Lakera v2 post-call hook with fixed PII masking - PR #21783

    Presidio

    • Fix Presidio streaming and false positives - PR #21949
    • Fix Presidio streaming v3 reliability improvements - PR #22283
    • Prevent Presidio crash on non-JSON responses - PR #22084

    Built-in Guardrails

    • Block code execution guardrail to prevent agents from executing code - PR #22154
    • Employment discrimination topic blockers for 5 protected classes - PR #21962
    • Claims agent guardrails (5 categories + policy template) - PR #22113
    • New code execution evaluation dataset - PR #22065
    • Tool policies: auto-discover tools + policy enforcement - PR #22041

    Policy Templates

    • Singapore guardrail policies (PDPA + MAS AI Risk Management) - PR #21948
    • Prefix SG guardrail policy IDs with country code - PR #21974
    • Guardrail policy versioning - PR #21862

    Guardrail Monitoring

    • Guardrail Monitor — measure guardrail reliability in production - PR #21944

    Security

    • Fix unauthenticated RCE and sandbox escape in custom code guardrail - PR #22095

    Prompt Management

    No major prompt management changes in this release.

    Secret Managers

    No major secret manager changes in this release.

    Spend Tracking, Budgets and Rate Limiting

    • Priority PayGo cost tracking for Gemini/Vertex AI - PR #21909
    • Add request_duration_ms to SpendLogs for latency tracking per request - PR #22066
    • Add in_flight_requests metric to /health/backlog + Prometheus - PR #22319
    • Enrich failure spend logs with key/team metadata - PR #22049
    • Add spend tracking lifecycle logging for debugging spend flows - PR #22029
    • Fix budget timezone config lookup and replace hardcoded timezone map with ZoneInfo - PR #21754
    • Fix Spend Update Queue aggregation never triggering with default presets - PR #21963
    • Avoid mutating caller-owned dicts in SpendUpdateQueue aggregation - PR #21742
    • Optimize old spendlog deletion cron job - PR #21930
    • Health check max tokens configuration - PR #22299

    MCP Gateway

    • Pass MCP auth headers from request context to tool fetch for /v1/responses and /chat/completions - PR #22291
    • Default available_on_public_internet to true for MCP server behavior consistency - PR #22331
    • Clear error messages for IP filtering / no available tools - PR #22142
    • Strip stale mcp-session-id header to prevent 400 errors across proxy workers - PR #21417
    • Skip health check for MCP with passthrough token auth - PR #21982
    • Fix missing OAuth session state - PR #21992
    • Fix Transport Type for OpenAPI Spec on UI - PR #22005
    • Add e2e test for stateless StreamableHTTP behavior - PR #22033

    Performance / Loadbalancing / Reliability improvements

    Streaming & hot-path

    • Streaming latency improvements — 4 targeted hot-path fixes - PR #22346
    • Skip throwaway Usage() construction in ModelResponse.init - PR #21611
    • Optimize is_model_o_series_model with startswith - PR #21690
    • Use cached _safe_get_request_headers instead of per-request construction - PR #21430
    • Emit x-litellm-overhead-duration-ms header for streaming requests - PR #22027

    Database & Redis

    • Batch 11 create_task() calls into 1 in update_database() - PR #22028
    • Redis pipeline spend updates for batched writes - PR #22044
    • Recover from prisma-query-engine zombie process - PR #21899
    • Optimize old spendlog deletion cron job - PR #21930

    Router & caching

    • Add cache invalidation for _cached_get_model_group_info - PR #20376
    • Remove cache eviction close that kills in-use httpx clients - PR #22247
    • Store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings - PR #22143
    • Fix ensure_arrival_time set before calculating queue time - PR #21918

    Connection management

    • Only set enable_cleanup_closed on aiohttp when required - PR #21897
    • Prometheus child_exit cleanup for gunicorn workers - PR #22324
    • Prometheus multiprocess cleanup - PR #22221
    • Limit concurrent health checks with health_check_concurrency - PR #20584
    • Isolate get_config failures from model sync loop - PR #22224

    Other

    • Semantic cache: support configurable vector dimensions - PR #21649
    • Honor MAX_STRING_LENGTH_PROMPT_IN_DB from config env vars - PR #22106
    • Enhance MidStreamFallbackError to preserve original status code and attributes - PR #22225
    • Network mock utility for testing - PR #21942
    • Add missing return type annotations to iterator protocol methods in streaming_handler - PR #21750

    Security

    • Fix critical/high CVEs in OS-level libs and NPM transitive dependencies - PR #22008
    • Fix unauthenticated RCE and sandbox escape in custom code guardrail - PR #22095
    • Remove hardcoded base64 string flagged by secret scanner - PR #22125

    Documentation Updates

    • Add OpenAI Agents SDK tutorial with LiteLLM Proxy - PR #21221
    • Add OpenClaw integration tutorial - PR #21605
    • Add Google GenAI SDK tutorial (JS & Python) - PR #21885
    • Add Gollem Go agent framework cookbook example - PR #21747
    • Update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway - PR #21130
    • Add store_model_in_db release docs - PR #21863
    • Add Credential Usage Tracking docs - PR #22112
    • Add proxy request tags docs - PR #22129
    • Add trailing slash to /mcp endpoint URLs - PR #20509
    • Add pre-PR checklist to UI contributing guide - PR #21886
    • Replace Azure OpenAI key with mock key in docs - PR #21997
    • Add performance & reliability section to v1.81.14 release notes - PR #21950
    • Update v1.81.12-stable release notes to point to stable.1 - PR #22036
    • Add security vulnerability scan report to v1.81.14 release notes - PR #22385

    New Contributors

    • @janfrederickk made their first contribution in PR #21660
    • @hztBUAA made their first contribution in PR #21656
    • @LeeJuOh made their first contribution in PR #21754
    • @WhoisMonesh made their first contribution in PR #21750
    • @trevorprater made their first contribution in PR #21747
    • @edwiniac made their first contribution in PR #21870
    • @stakeswky made their first contribution in PR #21867
    • @ta-stripe made their first contribution in PR #21701
    • @ron-zhong made their first contribution in PR #21948
    • @Arindam200 made their first contribution in PR #21221
    • @Canvinus made their first contribution in PR #21964
    • @nicolopignatelli made their first contribution in PR #21951
    • @MarshHawk made their first contribution in PR #20584
    • @gavksingh made their first contribution in PR #22106
    • @roni-frantchi made their first contribution in PR #22090
    • @noahnistler made their first contribution in PR #22133
    • @dylan-duan-aai made their first contribution in PR #21130
    • @rasmi made their first contribution in PR #22322

    Diff Summary

    02/28/2026

    • New Models / Updated Models: 26
    • LLM API Endpoints: 14
    • Management Endpoints / UI: 38
    • AI Integrations: 25
    • Spend Tracking, Budgets and Rate Limiting: 10
    • MCP Gateway: 8
    • Performance / Loadbalancing / Reliability improvements: 22
    • Security: 3
    • Documentation Updates: 14
    Original source
  • May 2026
    • No date parsed from source.
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.81.14 - New Gateway Level Guardrails & Compliance Playground

    liteLLM adds Guardrail Garden, Compliance Playground, and three new zero-cost built-in guardrails, while also shipping Admin UI model storage settings, day one Claude Sonnet 4.6 support, new API endpoints, and broad performance and reliability improvements.

    Key Highlights

    • Guardrail Garden — Browse built-in and partner guardrails by use case — competitor blocking, topic filtering, GDPR, prompt injection, and more. Pick a template, customize it, attach it to a team or key.
    • Compliance Playground — Test any guardrail policy against your own traffic before it goes live. See precision, recall, and false positive rate — so you know how it'll behave in production.
    • 3 new zero-cost built-in guardrails — Competitor name blocker, topic blocker, and insults filter — all gateway-level, <0.1ms latency, no external API, configurable per-team or key
    • Store Model in DB Settings via UI - Configure model storage directly in the Admin UI without editing config files or restarting the proxy—perfect for cloud deployments
    • Claude Sonnet 4.6 — day 0 — Full support across Anthropic and Vertex AI: reasoning, computer use, prompt caching, 200K context
    • 20+ performance optimizations — Faster routing, lower logging overhead, reduced cost-calculator latency, and connection pool fixes — meaningfully less CPU and latency on every request

    Guardrail Garden

    AI Platform Admins can now browse built-in and partner guardrails from the Guardrail Garden. Guardrails are organized by use case — blocking financial advice, filtering insults, detecting competitor mentions, and more — so you can find the right one and deploy it in a few clicks.

    3 New Built-in Guardrails

    This release brings 3 new built-in guardrails that run directly on the gateway. This is great for AI Gateway Admins who need low latency, zero cost guardrails for their scenarios.

    • Denied Financial Advice — detects requests for personalized financial advice, investment recommendations, or financial planning
    • Denied Insults — detects insults, name-calling, and personal attacks directed at the chatbot, staff, or other people
    • Competitor Name Blocker — detects mentions of competitor brands in responses

    These guardrails are built for production and on our benchmarks had a 100% Recall and Precision.

    Store Model in DB Settings via UI

    Previously, the store_model_in_db setting could only be configured in proxy_config.yaml under general_settings, requiring a proxy restart to take effect. Now you can enable or disable this setting directly from the Admin UI without any restarts. This is especially useful for cloud deployments where you don't have direct access to config files or want to avoid downtime. Enable store_model_in_db to move model definitions from your YAML into the database—reducing config complexity, improving scalability, and enabling dynamic model management across multiple proxy instances.

    Eval results

    We benchmarked our new built-in guardrails against labeled datasets before shipping. You can see the results for Denied Financial Advice (207 cases) and Denied Insults (299 cases):

    100% precision means zero false positives — no legitimate messages were incorrectly blocked. 100% recall means zero false negatives — every message that should have been blocked was caught.

    Compliance Playground

    The Compliance Playground lets you test any guardrail against our pre-built eval datasets or your own custom datasets, so you can see precision, recall, and false positive rate before rolling it out to production.

    Performance & Reliability — Up to 13% Lower Latency

    This release cuts latency across all percentiles through 20+ micro-optimizations across logging, cost calculation, routing, and connection management. See benchmarking for more info about how to benchmark yourself.

    • Mean latency: 78.4 ms → 70.3 ms (−10.3%)
    • p50 latency: 64.8 ms → 57.3 ms (−11.7%)
    • p99 latency: 288.9 ms → 250.0 ms (−13.4%)

    Streaming Connection Pool Fix

    Fixed a 3-fold connection leak that caused TCP connection starvation under streaming workloads: the aiohttp transport wasn't closing connections, no finally blocks were calling close on disconnect, and a Uvicorn bug prevented disconnect signaling.

    Redis Connection Pool Reliability

    Fixed 4 separate connection pool bugs to make how we use Redis more reliable. The most important change was on pools being leaked on cache expiry and the other fixes are detailed here in PR #21717.

    New Providers and Endpoints

    New Providers (1 new provider):

    • IBM watsonx.ai — Rerank support for IBM watsonx.ai models

    New LLM API Endpoints (1 new endpoint):

    • /v1/evals (POST/GET) — OpenAI-compatible Evals API for model evaluation

    New Models / Updated Models

    New Model Support (13 new models) including:

    • Anthropic claude-sonnet-4-6 with 200K context, reasoning, computer use, prompt caching, vision, PDF
    • Vertex AI vertex_ai/claude-opus-4-6@default with 1M context
    • Google Gemini gemini-3.1-pro-preview with audio, video, images, PDF
    • GitHub Copilot github_copilot/gpt-5.3-codex and github_copilot/claude-opus-4.6-fast
    • Mistral devstral-small-latest, devstral-latest, devstral-medium-latest
    • OpenRouter openrouter/minimax/minimax-m2.5
    • Fireworks AI models glm-4p7, minimax-m2p1, kimi-k2p5

    Features

    Includes day 0 support for Claude Sonnet 4.6, native structured outputs API support for AWS Bedrock, day 0 support for Google Gemini 3.1 pro preview, Databricks support, GitHub Copilot model additions, Mistral model aliases, IBM watsonx.ai rerank support, xAI usage fixes, Dashscope request formatting fixes, hosted_vllm multi-turn conversation improvements, OCI/Oracle Grok output pricing fix, AU Anthropic model ID fix, general routing and parameter improvements.

    Bug Fixes

    Fixes across AWS Bedrock, Bedrock Converse, Fireworks AI model pricing, Responses API reasoning parameter, metadata preservation for custom callbacks, spend logs cost calculation, logs pagination, UI logo caching, duplicate URL in tagsSpendLogsCall, key alias and team ID metadata preservation, response_model endpoint, internal user viewer access, warning suppression for litellm-dashboard team.

    LLM API Endpoints

    Features include Responses API improvements, Evals API OpenAI compatibility, Batch API file deletion criteria, Pass-Through Endpoints method-based routing, OAuth Authorization header forwarding, Websearch tool additions and fixes, general parameter and reasoning support.

    Management Endpoints / UI

    Features include Access Group Selector, Virtual Keys fixes, Key Last Active Tracking, Model Settings Modal, store_model_in_db database setting, input cost masking fixes, credentials resolution, team usage visibility, service account visibility, organization info UI improvements.

    AI Integrations

    Logging improvements with DataDog team tags, Prometheus metrics fixes and middleware, Langfuse test isolation, general logging cost fixes, streaming proxy throughput improvements.

    Guardrails

    Launch of Guardrail Garden marketplace, redesigned guardrail creation UI, guardrail jump links, guardrail tracing UI, AI Policy Templates with seven new ready-to-deploy policies including GDPR, EU AI Act, prompt injection detection, topic filters, airline off-topic restriction, SQL injection, AI-powered policy template suggestions.

    Compliance Checker

    Added compliance checker endpoints and UI panel, CSV dataset upload for batch testing.

    Built-in Guardrails

    Competitor name blocker, topic blocker, insults content filter, MCP Security guardrail.

    Generic Guardrails

    Configurable fallback for generic guardrail endpoint failures.

    Presidio

    Fixes to controls configuration.

    LakeraAI

    Avoid KeyError on missing LAKERA_API_KEY during initialization.

    Auto Routing

    Complexity-based auto routing scoring requests across 7 dimensions to route to appropriate model tier without embeddings or API calls.

    Prompt Management

    New API for prompt management integrations, prompt registry configuration fixes.

    Spend Tracking, Budgets and Rate Limiting

    Fixes for Bedrock service_tier cost propagation, cached response cost logging, aggregate daily activity endpoint performance, key alias and team ID metadata preservation, credential name tag injection.

    MCP Gateway

    OpenAPI-to-MCP conversion, MCP user permissions, MCP security guardrail, StreamableHTTPSessionManager fix, Bedrock AgentCore Accept header fix.

    Performance / Loadbalancing / Reliability improvements

    Logging and callback overhead optimizations, cost calculation optimizations, router and load balancing improvements, connection management and reliability fixes.

    Database Changes

    Added project_id column to LiteLLM_DeletedVerificationToken, new LiteLLM_ProjectTable for project management, last_active timestamp to LiteLLM_VerificationToken, vector store migration idempotency.

    Security

    Security scans with Grype and Trivy on Docker images, vulnerability report summary, critical and high severity vulnerabilities mostly in build-time dependencies, recommendations for best security posture.

    Documentation Updates

    Added OpenAI Agents SDK guide, Access Groups documentation, Anthropic beta headers docs, latency troubleshooting, rollback safety check, incident reports, stable mark for v1.81.12.

    New Contributors

    List of contributors making first contributions in this release.

    Full Changelog

    Link to full changelog from v1.81.12.rc.1 to v1.81.14.rc.1.

    Original source
  • Jan 1, 2026
    • Date parsed from source:
      Jan 1, 2026
    • First seen by Releasebot:
      May 23, 2026
    liteLLM logo

    liteLLM

    v1.81.12-stable.1 - Guardrail Policy Templates & Action Builder

    liteLLM ships a broad release with new guardrail policy templates, a visual action builder, MCP OAuth2 and tracing, Responses API shell and context management support, Access Groups, and 50+ new Bedrock regional model entries, plus reliability fixes and UI improvements across the platform.

    Key Highlights

    • Policy Templates - Pre-configured guardrail policy templates for common safety and compliance use-cases (including NSFW, toxic content, and child safety)
    • Guardrail Action Builder - Build and customize guardrail policy flows with the new action-builder UI and conditional execution support
    • MCP OAuth2 M2M + Tracing - Add machine-to-machine OAuth2 support for MCP servers and OpenTelemetry tracing for MCP calls through AI Gateway
    • Responses API shell Tool & context_management support - Server-side context management (compaction) and Shell tool support for the OpenAI Responses API
    • Access Groups - Create access groups to manage model, MCP server, and agent access across teams and keys
    • 50+ New Bedrock Regional Model Entries - DeepSeek V3.2, MiniMax M2.1, Kimi K2.5, Qwen3 Coder Next, and NVIDIA Nemotron Nano across multiple regions
    • Add Semgrep & fix OOMs - Static analysis rules and out-of-memory fixes
    • PR #20912

    Add Semgrep & fix OOMs

    This release fixes out-of-memory (OOM) risks from unbounded asyncio.Queue() usage. Log queues (e.g. GCS bucket) and DB spend-update queues were previously unbounded and could grow without limit under load. They now use a configurable max size (LITELLM_ASYNCIO_QUEUE_MAXSIZE, default 1000); when full, queues flush immediately to make room instead of growing memory. A Semgrep rule (.semgrep/rules/python/unbounded-memory.yml) was added to flag similar unbounded-memory patterns in future code.

    Guardrail Action Builder

    This release adds a visual action builder for guardrail policies with conditional execution support. You can now chain guardrails into multi-step pipelines — if a simple guardrail fails, route to an advanced one instead of immediately blocking. Each step has configurable ON PASS and ON FAIL actions (Next Step, Block, or Allow), and you can test the full pipeline with a sample message before saving.

    Access Groups

    Access Groups simplify defining resource access across your organization. One group can grant access to models, MCP servers, and agents—simply attach it to a key or team. Create groups in the Admin UI, define which resources each group includes, then assign the group when creating keys or teams. Updates to a group apply automatically to all attached keys and teams.

    New Providers and Endpoints

    New Providers (2 new providers)

    Provider | Supported LiteLLM Endpoints | Description
    Scaleway | /chat/completions | Scaleway Generative APIs for chat completions
    Sarvam AI | /chat/completions, /audio/transcriptions, /audio/speech | Sarvam AI STT and TTS support for Indian languages

    New Models / Updated Models

    New Model Support (19 highlighted models)

    Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens)
    AWS Bedrock | deepseek.v3.2 | 164K | $0.62 | $1.85
    AWS Bedrock | minimax.minimax-m2.1 | 196K | $0.30 | $1.20
    AWS Bedrock | moonshotai.kimi-k2.5 | 262K | $0.60 | $3.00
    AWS Bedrock | moonshotai.kimi-k2-thinking | 262K | $0.73 | $3.03
    AWS Bedrock | qwen.qwen3-coder-next | 262K | $0.50 | $1.20
    AWS Bedrock | nvidia.nemotron-nano-3-30b | 262K | $0.06 | $0.24
    Azure AI | azure_ai/kimi-k2.5 | 262K | $0.60 | $3.00
    Vertex AI | vertex_ai/zai-org/glm-5-maas | 200K | $1.00 | $3.20
    MiniMax | minimax/MiniMax-M2.5 | 1M | $0.30 | $1.20
    MiniMax | minimax/MiniMax-M2.5-lightning | 1M | $0.30 | $2.40
    Dashscope | dashscope/qwen3-max | 258K | Tiered pricing | Tiered pricing
    Perplexity | perplexity/preset/pro-search | - | Per-request | Per-request
    Perplexity | perplexity/openai/gpt-4o | - | Per-request | Per-request
    Perplexity | perplexity/openai/gpt-5.2 | - | Per-request | Per-request
    Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-opus-4.6 | 200K | $5.00 | $25.00
    Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-sonnet-4 | 200K | $3.00 | $15.00
    Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-haiku-4.5 | 200K | $1.00 | $5.00
    Sarvam AI | sarvam/sarvam-m | 8K | Free tier | Free tier
    Anthropic | fast/claude-opus-4-6 | 1M | $30.00 | $150.00

    Note: AWS Bedrock models are available across multiple regions (us-east-1, us-east-2, us-west-2, eu-central-1, eu-north-1, ap-northeast-1, ap-south-1, ap-southeast-3, sa-east-1). 54 regional model entries were added in total.

    Features

    Anthropic

    • Enable non-tool structured outputs on Claude Opus 4.5 and 4.6 using output_format param - PR #20548
    • Add support for anthropic_messages call type in prompt caching - PR #19233
    • Managing Anthropic Beta Headers with remote URL fetching - PR #20935, PR #21110
    • Remove x-anthropic-billing block - PR #20951
    • Use Authorization Bearer for OAuth tokens instead of x-api-key - PR #21039
    • Filter unsupported JSON schema constraints for structured outputs - PR #20813
    • New Claude Opus 4.6 features for /v1/messages - PR #20733
    • Fix reasoning_effort=None and "none" should return None for Opus 4.6 - PR #20800

    AWS Bedrock

    • Extend model support with 4 new beta models - PR #21035
    • Add Claude Opus 4.6 to _supports_tool_search_on_bedrock - PR #21017
    • Correct Bedrock Claude Opus 4.6 model IDs (remove :0 suffix) - PR #20564, PR #20671
    • Add output_config as supported param - PR #20748

    Vertex AI

    • Add Vertex GLM-5 model support - PR #21053
    • Propagate extra_headers anthropic-beta to request body - PR #20666
    • Preserve usageMetadata in _hidden_params - PR #20559
    • Map IMAGE_PROHIBITED_CONTENT to content_filter - PR #20524
    • Add RAG ingest for Vertex AI - PR #21120

    OCI / Cohere

    • OCI Cohere responseFormat/Pydantic support - PR #20663
    • Fix OCI Cohere system messages by populating preambleOverride - PR #20958

    Perplexity

    • Perplexity Research API support with preset search - PR #20860

    MiniMax

    • Add MiniMax-M2.5 and MiniMax-M2.5-lightning models - PR #21054

    Kimi / Moonshot

    • Add Kimi model pricing by region - PR #20855
    • Add moonshotai.kimi-k2.5 - PR #20863

    Dashscope

    • Add dashscope/qwen3-max model with tiered pricing - PR #20919

    Vercel AI Gateway

    • Add new Vercel AI Anthropic models - PR #20745

    Azure AI

    • Add azure_ai/kimi-k2.5 to Azure model DB - PR #20896
    • Support Azure AD token auth for non-Claude azure_ai models - PR #20981
    • Fix Azure batches issues - PR #21092

    DeepSeek

    • Sync DeepSeek model metadata and add bare-name fallback - PR #20938

    Gemini

    • Handle image in assistant message for Gemini - PR #20845
    • Add missing tpm/rpm for Gemini models - PR #21175

    General

    • Add 30 missing models to pricing JSON - PR #20797
    • Cleanup 39 deprecated OpenRouter models - PR #20786
    • Standardize endpoint display_name naming convention - PR #20791
    • Fix and stabilize model cost map formatting - PR #20895
    • Export PermissionDeniedError from litellm.init - PR #20960

    Bug Fixes

    Anthropic

    • Fix get_supported_anthropic_messages_params - PR #20752
    • Fix base_model name for body and deployment name in URL - PR #20747

    Azure

    • Preserve content_policy_violation error details from Azure OpenAI - PR #20883

    Vertex AI

    • Fix Gemini multi-turn tool calling message formatting (added and reverted) - PR #20569, PR #21051

    LLM API Endpoints

    Features

    Responses API
    • Add server-side context management (compaction) support - PR #21058
    • Add Shell tool support for OpenAI Responses API - PR #21063
    • Preserve tool call argument deltas when streaming id is omitted - PR #20712
    • Preserve interleaved thinking/redacted_thinking blocks during streaming - PR #20702
    Chat Completions
    • Add Web Search support using LiteLLM /search (web search interception hook) - PR #20483
    • Preserved nullable object fields by carrying schema properties - PR #19132
    • Support prompt_cache_key for OpenAI and Azure chat completions - PR #20989
    Pass-Through Endpoints
    • Add support for langchain_aws via LiteLLM passthrough - PR #20843
    • Add custom_body parameter to endpoint_func in create_pass_through_route - PR #20849
    Vector Stores
    • Add target_model_names for vector store endpoints - PR #21089
    General
    • Add output_config as supported param - PR #20748
    • Add managed error file support - PR #20838

    Bugs

    General

    • Stop leaking Python tracebacks in streaming SSE error responses - PR #20850
    • Fix video list pagination cursors not encoded with provider metadata - PR #20710
    • Handle metadata=None in SDK path retry/error logic - PR #20873
    • Fix Spend logs pickle error with Pydantic models and redaction - PR #20685
    • Remove duplicate PerplexityResponsesConfig from LLM_CONFIG_NAMES - PR #21105
    • Fix Spend Management Tests - PR #21088
    • Fix JWT email domain validation error message - PR #21212

    Management Endpoints / UI

    Features

    Access Groups
    • New Access Groups feature for managing model, MCP server, and agent access - PR #21022
    • Access Groups table and details page UI - PR #21165
    • Refactor model_ids to model_names for backwards compatibility - PR #21166
    Policies
    • Allow connecting Policies to Tags, simulating Policies, viewing key/team counts - PR #20904
    • Guardrail pipeline support for conditional sequential execution - PR #21177
    • Pipeline flow builder UI for guardrail policies - PR #21188
    SSO / Auth
    • New Login With SSO Button - PR #20908
    • M2M OAuth2 UI Flow - PR #20794
    • Allow Organization and Team Admins to call /invitation/new - PR #20987
    • Invite User: Email Integration Alert - PR #20790
    • Populate identity fields in proxy admin JWT early-return path - PR #21169
    Spend Logs
    • Show predefined error codes in filter with user definable fallback - PR #20773
    • Paginated searchable model select - PR #20892
    • Sorting columns support - PR #21143
    • Allow sorting on /spend/logs/ui - PR #20991
    UI Improvements
    • Navbar: Option to hide Usage Popup - PR #20910
    • Model Page: Improve Credentials Messaging - PR #21076
    • Fallbacks: Default configurable to 10 models - PR #21144
    • Fallback display with arrows and card structure - PR #20922
    • Team Info: Migrate to AntD Tabs + Table - PR #20785
    • AntD refactoring and 0 cost models fix - PR #20687
    • Zscaler AI Guard UI - PR #21077
    • Include Config Defined Pass Through Endpoints - PR #20898
    • Rename "HTTP" to "Streamable HTTP (Recommended)" in MCP server page - PR #21000
    • MCP server discovery UI - PR #21079
    Virtual Keys
    • Allow Management keys to access user/daily/activity and team - PR #20124
    • Skip premium check for empty metadata fields on team/key update - PR #20598
    Bugs
    • Logs: Fix Input and Output Copying - PR #20657
    • Teams: Fix Available Teams - PR #20682
    • Spend Logs: Reset Filters Resets Custom Date Range - PR #21149
    • Usage: Request Chart stack variant fix - PR #20894
    • Add Auto Router: Description Text Input Focus - PR #21004
    • Guardrail Edit: LiteLLM Content Filter Categories - PR #21002
    • Add null guard for models in API keys table - PR #20655
    • Show error details instead of 'Data Not Available' for failed requests - PR #20656
    • Fix Spend Management Tests - PR #21088
    • Fix JWT email domain validation error message - PR #21212

    AI Integrations

    Logging

    PostHog
    • Fix JSON serialization error for non-serializable objects - PR #20668
    Prometheus
    • Sanitize label values to prevent metric scrape failures - PR #20600
    Langfuse
    • Prevent empty proxy request spans from being sent to Langfuse - PR #19935
    OpenTelemetry
    • Auto-infer otlp_http exporter when endpoint is configured - PR #20438
    CloudZero
    • Update CBF field mappings per LIT-1907 - PR #20906
    General
    • Allow MAX_CALLBACKS override via env var - PR #20781
    • Add standard_logging_payload_excluded_fields config option - PR #20831
    • Enable verbose_logger when LITELLM_LOG=DEBUG - PR #20496
    • Guard against None litellm_metadata in batch logging path - PR #20832
    • Propagate model-level tags from config to SpendLogs - PR #20769

    Guardrails

    Policy Templates

    • New Policy Templates: pre-configured guardrail combinations for specific use-cases - PR #21025
    • Add NSFW policy template, toxic keywords in multiple languages, child safety content filter, JSON content viewer - PR #21205
    • Add toxic/abusive content filter guardrails - PR #20934

    Pipeline Execution

    • Add guardrail pipeline support for conditional sequential execution - PR #21177
    • Agent Guardrails on streaming output - PR #21206
    • Pipeline flow builder UI - PR #21188

    Zscaler AI Guard

    • Zscaler AI Guard bug fixes and support during post-call - PR #20801
    • Zscaler AI Guard UI - PR #21077

    ZGuard

    • Add team policy mapping for ZGuard - PR #20608

    General

    • Add logging to all unified guardrails + link to custom code guardrail templates - PR #20900
    • Forward request headers + litellm_version to generic guardrails - PR #20729
    • Empty guardrails / policies arrays should not trigger enterprise license check - PR #20567
    • Fix OpenAI moderation guardrails - PR #20718
    • Fix /v2/guardrails/list returning sensitive values - PR #20796
    • Fix guardrail status error - PR #20972
    • Reuse get_instance_fn in initialize_custom_guardrail - PR #20917

    Spend Tracking, Budgets and Rate Limiting

    • Prevent shared backend model key from being polluted by per-deployment custom pricing - PR #20679
    • Avoid in-place mutation in SpendUpdateQueue aggregation - PR #20876

    MCP Gateway (12 updates)

    • MCP M2M OAuth2 Support - Add support for machine-to-machine OAuth2 for MCP servers - PR #20788
    • MCP Server Discovery UI - Browse and discover available MCP servers from the UI - PR #21079
    • MCP Tracing - Add OpenTelemetry tracing for MCP calls running through AI Gateway - PR #21018
    • MCP OAuth2 Debug Headers - Client-side debug headers for OAuth2 troubleshooting - PR #21151
    • Fix MCP "Session not found" errors - Resolve session persistence issues - PR #21040
    • Fix MCP OAuth2 root endpoints returning "MCP server not found" - PR #20784
    • Fix MCP OAuth2 query param merging when authorization_url already contains params - PR #20968
    • Fix MCP SCOPES on Atlassian issue - PR #21150
    • Fix MCP StreamableHTTP backend - Use anyio.fail_after instead of asyncio.wait_for - PR #20891
    • Inject NPM_CONFIG_CACHE into STDIO MCP subprocess env - PR #21069
    • Block spaces and hyphens in MCP server names and aliases - PR #21074

    Performance / Loadbalancing / Reliability improvements (8 improvements)

    • Remove orphan entries from queue - Fix memory leak in scheduler queue - PR #20866
    • Remove repeated provider parsing in budget limiter hot path - PR #21043
    • Use current retry exception for retry backoff instead of stale exception - PR #20725
    • Add Semgrep & fix OOMs - Static analysis rules and out-of-memory fixes - PR #20912
    • Add Pyroscope for continuous profiling and observability - PR #21167
    • Respect ssl_verify with shared aiohttp sessions - PR #20349
    • Fix shared health check serialization - PR #21119
    • Change model mismatch logs from WARNING to DEBUG - PR #20994

    Database Changes

    Schema Updates

    Table | Change Type | Description | PR | Migration
    LiteLLM_VerificationToken | New Indexes | Added indexes on user_id+team_id, team_id, and budget_reset_at+expires | PR #20736 | Migration
    LiteLLM_PolicyAttachmentTable | New Column | Added tags text array for policy-to-tag connections | PR #21061 | Migration
    LiteLLM_AccessGroupTable | New Table | Access groups for managing model, MCP server, and agent access | PR #21022 | Migration
    LiteLLM_AccessGroupTable | Column Change | Renamed access_model_ids to access_model_names | PR #21166 | Migration
    LiteLLM_ManagedVectorStoreTable | New Table | Managed vector store tracking with model mappings | - | Migration
    LiteLLM_TeamTable, LiteLLM_VerificationToken | New Column | Added access_group_ids text array | PR #21022 | Migration
    LiteLLM_GuardrailsTable | New Column | Added team_id text column | - | Migration

    Documentation Updates (14 updates)

    • LiteLLM Observatory section added to v1.81.9 release notes - PR #20675
    • Callback registration optimization added to release notes - PR #20681
    • Middleware performance blog post - PR #20677
    • UI Team Soft Budget documentation - PR #20669
    • UI Contributing and Troubleshooting guide - PR #20674
    • Reorganize Admin UI subsection - PR #20676
    • SDK proxy authentication (OAuth2/JWT auto-refresh) - PR #20680
    • Forward client headers to LLM API documentation fix - PR #20768
    • Add docs guide for using policies - PR #20914
    • Add native thinking param examples for Claude Opus 4.6 - PR #20799
    • Fix Claude Code MCP tutorial - PR #21145
    • Add API base URLs for Dashscope (International and China/Beijing) - PR #21083
    • Fix DEFAULT_NUM_WORKERS_LITELLM_PROXY default (1, not 4) - PR #21127
    • Correct ElevenLabs support status in README - PR #20643

    New Contributors

    • @iver56 made their first contribution in PR #20643
    • @eliasaronson made their first contribution in PR #20666
    • @NirantK made their first contribution in PR #19656
    • @looksgood made their first contribution in PR #20919
    • @kelvin-tran made their first contribution in PR #20548
    • @bluet made their first contribution in PR #20873
    • @itayov made their first contribution in PR #20729
    • @CSteigstra made their first contribution in PR #20960
    • @rahulrd25 made their first contribution in PR #20569
    • @muraliavarma made their first contribution in PR #20598
    • @joaokopernico made their first contribution in PR #21039
    • @datzscaler made their first contribution in PR #21077
    • @atapia27 made their first contribution in PR #20922
    • @fpagny made their first contribution in PR #21121
    • @aidankovacic-8451 made their first contribution in PR #21119
    • @luisgallego-aily made their first contribution in PR #19935

    Full Changelog

    v1.81.9.rc.1...v1.81.12.rc.1

    Original source
Releasebot

Curated by the Releasebot team

Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.

Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.

Similar to liteLLM with recent updates: