AI/ML Infrastructure Release Notes

Release notes for AI compute platforms, inference clouds and ML tooling

Latest AI/ML Infrastructure Updates

  • Jun 17, 2026
    • Date parsed from source:
      Jun 17, 2026
    • First seen by Releasebot:
      Jun 18, 2026
    Together AI

    June 17, 2026

    Together AI adds new serverless models, including zai-org/GLM-5.2 with long context, FP4 quantization, and function calling.

    New serverless models

    The following models are now available on serverless:

    • zai-org/GLM-5.2: 262K context length, FP4 quantization. Pricing: $1.40 input / $4.40 output / $0.26 cached input (per 1M tokens). Supports function calling and structured outputs.
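To make the pricing concrete, here is a minimal cost sketch using the three rates listed above. It assumes cached tokens are a subset of the input tokens and bill at the cached rate instead of the regular input rate. That billing rule is an assumption, not something the note states, and `request_cost` is a hypothetical helper.

```python
# USD per 1M tokens for zai-org/GLM-5.2 on serverless, taken from the note above.
INPUT_PER_M = 1.40
OUTPUT_PER_M = 4.40
CACHED_INPUT_PER_M = 0.26


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    ASSUMPTION: cached tokens are part of input_tokens and are billed at the
    cached rate rather than the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# A 100K-token prompt with 80K tokens served from cache and a 2K-token reply.
print(round(request_cost(100_000, 2_000, cached_tokens=80_000), 4))
# The same request with no cache hits.
print(round(request_cost(100_000, 2_000), 4))
```

Under that assumption the cached request comes to about $0.0576, against about $0.1488 with no cache hits.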
  • Jun 17, 2026
    • Date parsed from source:
      Jun 17, 2026
    • First seen by Releasebot:
      Jun 18, 2026
    mem0

    Mem0 Python SDK (v2.0.7)

    mem0 releases Python SDK v2.0.7 with Gemini via Vertex AI support, native batched embeddings for Ollama, and broad fixes across core, LLM, embeddings, reranking, and vector store workflows for smoother async handling, resets, search, and get behavior.

    LLMs: Add Gemini via Vertex AI as LLM provider (#4030)

    Embeddings: Add native embed_batch to OllamaEmbedding for batched embedding requests (#5415)

    Bug Fixes

    Core: Fix api_error_handler silently dropping return values from async methods (#5540)

    Core: Fix AsyncMemory.reset() not resetting the entity store (#5535)

    Core: Fix async delete_all aborting on first error, leaving partial deletion (#5529)

    Core: Skip messages without a content key in message parsers to prevent KeyError crashes (#5575)

    Core: Preserve custom metadata fields during memory update (#5480)

    LLMs: Fix Anthropic tool_choice format and tool response parsing (#5537)

    LLMs: Fix Ollama json format mutating the caller's messages list in-place (#5539)

    LLMs: Omit None config values from Gemini GenerateContentConfig to prevent validation errors (#5528)

    LLMs: Honor reasoning-model params in AzureOpenAIStructuredLLM (#5548)

    LLMs: Honor reasoning-model params in OpenAIStructuredLLM (#5458)

    LLMs: Send max_completion_tokens for the GPT-5 family across all providers (#5547)

    LLMs: Accept and forward **kwargs in Together, LangChain, and Sarvam providers (#5556)

    LLMs: Fix Bedrock AI21 response parse default using dict literal instead of set (#5527)

    LLMs: Fix LiteLLM function-calling check blocking all calls on non-tool models (#5536)

    LLMs: Fix HuggingFace provider using self.config instead of raw config parameter (#5538)

    Embeddings: Honor aws_session_token in AWS Bedrock embeddings (#5566)

    Rerankers: Respect config.top_k in Cohere and ZeroEntropy fallback paths (#5560)

    Vector Stores: Fix FAISS filtered search dropping over-fetched candidates before filtering (#5453)

    Vector Stores: Fix Weaviate reset() crashing with missing vector_size argument (#5531)

    Vector Stores: Pass embedding dims in Weaviate reset() to avoid re-init crash (#5570)

    Vector Stores: Fix MongoDB reset() passing wrong argument to create_col() (#5532)

    Vector Stores: Fix Pinecone hybrid search crashing when filters is None (#5533)

    Vector Stores: Fix Redis crashing on empty or None filters in search() and list() (#5446)

    Vector Stores: Return None from get() for missing IDs in Milvus, Weaviate, and Supabase (#5562)

    Vector Stores: Return None from ChromaDB get() for missing IDs (#5561)
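The delete_all fix above (#5529) addresses a common async pitfall: stopping at the first failure leaves the store half-deleted. The sketch below shows one general way to attempt every delete and report the failures afterwards. It is an illustration of the pattern, not mem0's actual implementation, and `delete_all` / `delete_one` here are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def delete_all(
    ids: Iterable[str],
    delete_one: Callable[[str], Awaitable[None]],
) -> Dict[str, Exception]:
    """Attempt every delete and return {id: error} for the ones that failed."""
    id_list = list(ids)
    # return_exceptions=True makes gather collect errors as results instead of
    # raising on the first one, so every delete still gets its chance to run.
    results = await asyncio.gather(
        *(delete_one(memory_id) for memory_id in id_list),
        return_exceptions=True,
    )
    return {
        memory_id: result
        for memory_id, result in zip(id_list, results)
        if isinstance(result, Exception)
    }
```

The caller can then decide whether to retry the failed IDs or surface them, instead of discovering a partial deletion later.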

  • Jun 17, 2026
    • Date parsed from source:
      Jun 17, 2026
    • First seen by Releasebot:
      Jun 18, 2026
    mem0

    Mem0 Node SDK (v3.0.9)

    mem0 releases Node SDK v3.0.9 with Anthropic tool handling fixes, better tool response parsing, updated default LLM settings, and new Anthropic config options. It also preserves custom metadata in memory updates and exports, while bumping esbuild to address a security issue.

    Bug Fixes

    LLMs: Fix Anthropic tool_choice format — was incorrectly sent as a bare string "auto" (rejected by the API); now correctly sent as { type: "auto" }. Also fixes tool response parsing: tool_use blocks are now parsed into toolCalls objects instead of throwing. Updated default model to claude-sonnet-4-6 and default max_tokens to 2000 to match the Python provider. Added temperature, topP, and maxTokens to LLMConfig so Anthropic params can be configured (#5537)

    Memory (OSS): Preserve custom metadata fields during update() — fields such as category, priority, and other user-defined keys were previously dropped on update; the existing payload is now spread before applying the new data (#5480)

    Client: Preserve user-defined schema keys in createMemoryExport (#5594)
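The tool_choice fix in the first item above comes down to a shape change: a bare string becomes an object with a type key. A minimal Python sketch of that normalization follows. It illustrates the shape only and is not the Node SDK's code; `normalize_tool_choice` is a hypothetical helper.

```python
from typing import Any, Dict, Union


def normalize_tool_choice(tool_choice: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap a bare string such as "auto" into the object form {"type": "auto"}."""
    if isinstance(tool_choice, str):
        return {"type": tool_choice}
    # Already an object: return a copy so the caller's dict is never mutated.
    return dict(tool_choice)


print(normalize_tool_choice("auto"))
```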

    Security

    Dependencies: Bump esbuild to >=0.28.1 across all npm packages via pnpm overrides to remediate upstream vulnerability (#5563)

  • Jun 17, 2026
    • Date parsed from source:
      Jun 17, 2026
    • First seen by Releasebot:
      Jun 18, 2026
    mem0

    Mem0 OpenCode Plugin (v0.2.0)

    mem0 ships the OpenCode Plugin v0.2.0 with native memory tools, direct mem0ai SDK support, new memory scope controls, auto-dream consolidation, expanded telemetry, and improved skill loading and project detection. It also tightens status reporting and safety around deletes.

    Changed (breaking)

    Memory tools are now native OpenCode tools registered via the @opencode-ai/plugin tool() helper and backed by the mem0ai SDK directly. The plugin no longer registers or depends on the remote MCP server (mcp.mem0.ai); the bundled opencode.json and the regex-based MCP call interception have been removed. Tools: add_memory, search_memories, get_memories, get_memory, update_memory, delete_memory, delete_all_memories, delete_entities, list_entities, plus a get_event_status helper for async-write status.

    Skills load via the config hook (skills.paths) instead of being copied into the project's .opencode/ directory on startup. The installSkills() filesystem copy and the cli.ts installer (mem0-opencode bin) have been removed — install with opencode plugin @mem0/opencode-plugin.

    Trimmed to 9 focused skills (context-loader, dream, forget, status, search, scope, pin, remember, tour). Removed import, export, memory-reviewer, mem0 (SDK reference), list-projects, stats, and onboard. The old stateful switch-project skill is superseded by the project/session/global scope model and the new /mem0-scope skill.

    Added

    Expanded telemetry to the full shared plugin.* schema. In addition to plugin.session_start and plugin.tool_use, the plugin now emits plugin.user_prompt, plugin.bash_error, plugin.pre_compact, and plugin.session_stop. tool_use now fires from inside each native tool. Every event also carries project_hash (anonymized sha256(app_id)) and os_version, matching the editor plugin's telemetry.py.

    Auto-dream — gated automatic memory consolidation (ported from the pi-agent plugin). When the time (minHours, default 24), session-count (minSessions, default 5), and memory-count (minMemories, default 20) gates all pass, the plugin injects a consolidation protocol so the agent merges duplicates, drops stale/sensitive entries, and rewrites vague ones before answering. A filesystem lock (~/.mem0/mem0-dream.lock) prevents concurrent sessions from dreaming at once, and completion resets the gates. Tune via the dream block in ~/.mem0/settings.json; disable with MEM0_DREAM=false. Emits plugin.dream_triggered / plugin.dream_completed.
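The three auto-dream gates can be sketched as a single check that reports the first gate still blocking. This is a hedged illustration in Python, not the plugin's code: the defaults come from the note above, while the gate order, the use of "greater than or equal" as the pass condition, and the names `DreamGates` and `blocking_gate` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DreamGates:
    min_hours: float = 24    # minHours default from the release note
    min_sessions: int = 5    # minSessions default
    min_memories: int = 20   # minMemories default


def blocking_gate(
    gates: DreamGates, hours_since_last: float, sessions: int, memories: int
) -> Optional[str]:
    """Return a description of the first gate that blocks, or None if all pass.

    ASSUMPTION: a gate passes when the observed value reaches its threshold.
    """
    if hours_since_last < gates.min_hours:
        return f"time: {hours_since_last} < {gates.min_hours}"
    if sessions < gates.min_sessions:
        return f"sessions: {sessions} < {gates.min_sessions}"
    if memories < gates.min_memories:
        return f"memories: {memories} < {gates.min_memories}"
    return None


# Enough time and sessions have passed, but only 3 memories exist.
print(blocking_gate(DreamGates(), hours_since_last=30, sessions=6, memories=3))
```

Returning the blocking gate rather than a bare boolean makes it easy to explain why consolidation has not run yet.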

    Memory scope — per-call parameter and a persistent default. search_memories, get_memories, add_memory, and delete_all_memories accept an optional scope: "project" (this repo, default), "session" (this run, adds run_id), or "global" (across all the user's projects — app_id: "*" for reads, user-wide for writes). The new /mem0-scope skill views and changes the default scope (used when no scope is passed), persisted to ~/.mem0/settings.json (default_scope) and read fresh on each memory operation so changes apply immediately — no restart. add_memory / search_memories / get_memories honor the default (an explicit scope, filters, or agent_id still wins; a project default preserves prior behavior, including global_search).

    Changed

    /mem0-status now reports the active default scope and auto-dream readiness. It reads default_scope from ~/.mem0/settings.json (falling back to project) and shows the auto-dream gate progress (sessions / memories / time vs. thresholds) so it's clear why a consolidation hasn't run yet.

    Fixed

    Skills load in place via skills.paths — no copying. The config hook adds the plugin's own opencode-skills/ directory to OpenCode's skills.paths, so OpenCode discovers the skills directly from the linked/installed plugin package (recursive **/SKILL.md scan). The installSkills() step that copied skills into ~/.config/opencode/skills/ (and the legacy ~/.opencode/skills/) and its version-marker gating are removed — the plugin no longer writes into those directories or creates ~/.opencode. The config hook still registers the /mem0-* slash commands via config.command: OpenCode's TUI slash menu is built from config.command, and skills on skills.paths are available to the agent's skill tool but do not appear as slash commands on their own. Skill dir names are mem0-<skill> (matching ^[a-z0-9]+(-[a-z0-9]+)*$); commands are /mem0-<skill>.

    Robust project-id (app_id) detection. Parsed from the git remote's owner/repo (handling https, scp-style ssh, and custom ssh host aliases of the form <user>@<host-alias>:owner/repo.git), falling back to the git repo's root directory name (not the cwd, which may be a sub-directory or your home dir), then the cwd. Fixes the project showing as your username/home when OpenCode was launched outside the repo root.
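The detection order described above can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's implementation: the regular expressions, the `owner_repo` and `app_id` helpers, and the example host names are all hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# URL-style remotes: https://host/owner/repo(.git) or ssh://git@host/owner/repo(.git)
_URL = re.compile(r"^[a-z][a-z0-9+.-]*://[^/\s]+/(?P<path>\S+)$", re.IGNORECASE)
# scp-style remotes: [user@]host:owner/repo(.git); host may be an ssh config alias
_SCP = re.compile(r"^(?:[^@/\s]+@)?[^:/\s]+:(?P<path>\S+)$")


def owner_repo(remote: str) -> Optional[str]:
    """Extract "owner/repo" from a git remote, or None if it cannot be parsed."""
    match = _URL.match(remote.strip()) or _SCP.match(remote.strip())
    if match is None:
        return None
    path = match.group("path").rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = [part for part in path.split("/") if part]
    return "/".join(parts[-2:]) if len(parts) >= 2 else None


def app_id(remote: Optional[str], repo_root: Optional[str], cwd: str) -> str:
    """Prefer the remote's owner/repo, then the repo root's name, then the cwd's."""
    parsed = owner_repo(remote) if remote else None
    if parsed:
        return parsed
    if repo_root:
        return PurePosixPath(repo_root).name
    return PurePosixPath(cwd).name


print(owner_repo("https://example.com/owner/repo.git"))
print(owner_repo("git@work-alias:owner/repo.git"))
```

Trying the URL pattern before the scp pattern matters, because an https remote would otherwise be misread as a host named "https" followed by a path.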

    Auto-dream visibility + robustness. When auto-dream doesn't fire, the plugin logs the blocking gate (e.g. auto-dream waiting — memories: 3 < 20), and /mem0-status surfaces the same gate progress. The session-start memory count is parsed defensively (handles both paginated {count} and bare-array SDK responses) so the memory gate evaluates correctly.

    Error-pattern lookup in tool.execute.after no longer issues two identical mem0.search() calls; it now performs a single topK: 6 search.

    Corrected the documented system-prompt hook name from experimental.chat.system.transform to the actual experimental.chat.messages.transform.

    Safety

    delete_all_memories deliberately ignores the default scope. Deleting user-wide always requires an explicit scope="global", so raising the default to global can never turn a routine cleanup into a cross-project wipe.
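A minimal sketch of that rule, contrasted with how the other tools resolve scope. The function names are hypothetical, and the fallback to "project" when delete_all_memories receives no scope is an assumption based on "project" being the documented default; the plugin's real code may differ.

```python
from typing import Optional

VALID_SCOPES = {"project", "session", "global"}


def _validated(scope: str) -> str:
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    return scope


def resolve_scope(explicit: Optional[str], default_scope: str = "project") -> str:
    """Scope for add/search/get: an explicit scope wins, else the persisted default."""
    return _validated(explicit or default_scope)


def resolve_delete_scope(explicit: Optional[str]) -> str:
    """Scope for delete_all_memories: the persisted default is ignored on purpose.

    ASSUMPTION: with no explicit scope the delete stays at "project".
    """
    return _validated(explicit or "project")


# Even with the persisted default raised to "global", a bare delete stays local.
print(resolve_scope(None, default_scope="global"))
print(resolve_delete_scope(None))
```

Keeping the delete path on a separate resolver means a settings change can widen reads and writes without ever widening a destructive call.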

  • Jun 17, 2026
    • Date parsed from source:
      Jun 17, 2026
    • First seen by Releasebot:
      Jun 18, 2026
    CoreWeave

    June 17, 2026

    CoreWeave reduces default AI Object Storage and Distributed File Storage quotas.

    Default storage quotas for CoreWeave AI Object Storage and Distributed File Storage have been reduced.

    The default AI Object Storage quota is now 20 TiB per availability zone, and the default DFS quota is now 10 TiB per CKS cluster.

    See AI Object Storage quotas and DFS quotas for current values.

  • Jun 16, 2026
    • Date parsed from source:
      Jun 16, 2026
    • First seen by Releasebot:
      Jun 17, 2026
    Together AI

    June 16, 2026

    Together AI's Python SDK now raises an error on duplicate file uploads and reports the existing file's ID so you can reuse it.

    Python SDK: duplicate file uploads now raise an error

    client.files.upload() in the Python SDK now raises a ValueError when the file’s contents already exist on Together AI. The error message includes the ID of the existing file so you can reuse it without re-uploading.

    To replace the file, delete the existing one first with client.files.delete(<file-id>) and retry the upload.
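One way to handle the new error is to catch the ValueError and reuse the ID it reports. The sketch below is deliberately decoupled from the SDK: you pass in your own upload callable (for example, a small wrapper around client.files.upload() that returns the new file's ID). The exact wording of the error message is not documented in this note, so the "file-" pattern used to find the ID is an assumption, and `upload_or_reuse` is a hypothetical helper.

```python
import re
from typing import Callable

# ASSUMPTION: the existing file's ID appears in the message as a token that
# starts with "file-". Adjust the pattern to the message you actually see.
FILE_ID = re.compile(r"file-[A-Za-z0-9-]+")


def upload_or_reuse(upload: Callable[[str], str], path: str) -> str:
    """Upload a file, or return the existing file's ID if the upload is a duplicate."""
    try:
        return upload(path)
    except ValueError as exc:
        match = FILE_ID.search(str(exc))
        if match is None:
            raise  # not an error this helper knows how to recover from
        return match.group(0)
```

If the message carries no recognizable ID, the original error is re-raised unchanged rather than hidden.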

  • Jun 16, 2026
    • Date parsed from source:
      Jun 16, 2026
    • First seen by Releasebot:
      Jun 17, 2026
    OpenRouter

    Subagent: Let Your Model Delegate the Busywork

    OpenRouter adds the openrouter:subagent tool, letting models delegate routine tasks like summarization, data extraction, boilerplate writing, and format conversion to a cheaper worker model while the frontier model keeps orchestrating. It also supports worker tools, billing separation, and recursion limits.

    Find subagent opportunities in your codebase

    Paste this prompt into your coding agent to have it scan your project for places where subagent delegation would cut costs:

    Read through this codebase and identify places where an OpenRouter API call
    could benefit from the openrouter:subagent server tool. Look for patterns where
    a frontier model is doing mechanical sub-tasks inline: summarization, data
    extraction, reformatting, boilerplate generation, or schema conversion.
    For each candidate, explain:

    1. Which file and function
    2. What the sub-task is
    3. Why it's a good fit for delegation (self-contained, predictable output, doesn't need the full conversation context)
    4. A code snippet showing how to add the subagent tool to that call

    Reference docs: https://openrouter.ai/docs/guides/features/server-tools/subagent
    Cookbook recipe: https://openrouter.ai/docs/cookbook/building-agents/subagent-server-tool

    Frontier brain, budget hands

    Claude Opus 4.8 costs $5 per million input tokens. GPT-5.5 costs $5. GLM 5.2 costs $1.40. That’s a 3.6x spread on input between frontier and worker, 5.7x on output. (Claude Fable 5 was $10/$50 per M tokens before it got yanked, RIP.)

    A frontier model doing a code review doesn’t need to spend its own tokens summarizing a 2,000-line changelog or reformatting a JSON blob. Those are mechanical tasks with clear instructions and predictable output. The subagent handles them at GLM prices while the orchestrator focuses on the parts that actually require reasoning.

    In a complex agentic workflow with 20 tool calls, maybe 5-8 are subagent delegations: summarization, data extraction, template filling, format conversion. The frontier model orchestrates and judges. You’ve cut your per-request cost without touching the quality ceiling on the hard parts.

    How it works under the hood

    The worker model sees only what the delegating model explicitly passes in the task_description. No parent conversation, no prior context, no memory between tasks. Each delegation is a clean, isolated unit of work.

    1. Any model can be the worker. Pin it with parameters.model (anything in the model catalog works). Open-source models like z-ai/glm-5.2 work well for mechanical tasks. If you don’t specify a model, it falls back to the outer request model.
    2. Workers get their own tools. Give the worker openrouter:web_search and it can ground its output in fresh sources before responding. The worker runs its own tool loop internally; only the final text comes back to your model.
    3. Recursion is blocked. The subagent can’t call itself. A depth header and self-reference check prevent unbounded nesting, and delegations are capped at 10 per request.

    Subagent vs. advisor

    These two tools point in opposite directions. The advisor escalates hard decisions to a stronger model. The subagent delegates routine work to a cheaper one.

    Use both in the same request. Your frontier model consults the advisor on architectural decisions and delegates summarization to the subagent. Different tools for different kinds of work.

    Billing

    Subagent tokens bill at the worker model’s rates, separate from the orchestrator. If your orchestrator is Claude Opus 4.8 ($5/$25 per M tokens) and the worker is GLM 5.2 ($1.40/$4.40 per M tokens), each model’s tokens bill at their own price. Both show up on your activity page.
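As a rough illustration of the split, the sketch below prices the same sub-task on each model using the rates quoted above. The token counts are made up for the example, `cost` is a hypothetical helper, and the estimate ignores the tokens the orchestrator spends writing the task description and reading the result.

```python
# USD per 1M tokens as (input, output), from the figures quoted above.
PRICES = {
    "anthropic/claude-opus-4.8": (5.00, 25.00),
    "z-ai/glm-5.2": (1.40, 4.40),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the given model's own rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical sub-task: summarize a 50K-token changelog into 1K tokens.
inline = cost("anthropic/claude-opus-4.8", 50_000, 1_000)
delegated = cost("z-ai/glm-5.2", 50_000, 1_000)
print(round(inline, 4), round(delegated, 4))
```

For this made-up workload the sub-task costs about $0.275 inline and about $0.0744 when delegated.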

    Get started

    One line in your tools array:

    {
      "type": "openrouter:subagent",
      "parameters": { "model": "z-ai/glm-5.2" }
    }
    

    The model decides when to use it. Read the full docs for all parameters, worker tools, and recursion details, or follow the cookbook recipe for a working integration.
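In Python, adding the tool to an existing request body might look like the sketch below. It only builds the payload and does not send it; `with_subagent` is a hypothetical helper and the orchestrator model and prompt are placeholders.

```python
import json


def with_subagent(body: dict, worker_model: str = "z-ai/glm-5.2") -> dict:
    """Return a copy of a chat/completions body with the subagent tool appended."""
    tool = {"type": "openrouter:subagent", "parameters": {"model": worker_model}}
    # Build a new tools list so the caller's body is left untouched.
    return {**body, "tools": [*body.get("tools", []), tool]}


request_body = with_subagent(
    {
        "model": "anthropic/claude-opus-4.8",
        "messages": [
            {"role": "user", "content": "Review this changelog and flag risky changes."}
        ],
    }
)
print(json.dumps(request_body, indent=2))
```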

  • Jun 15, 2026
    • Date parsed from source:
      Jun 15, 2026
    • First seen by Releasebot:
      Jun 17, 2026
    OpenRouter

    Keep Your Agent Running When Models Disappear

    OpenRouter introduces presets for server-side model routing, making it easier to set fallback chains, provider rules, parameters, and system prompts in one place. The update helps teams survive model deprecations and provider restrictions without editing code or redeploying.

    Providers retire and restrict models routinely

    More than 70 models have been pulled or deprecated by providers in the last few years. Anthropic’s Fable being pulled recently is perhaps the most high-profile and impactful example of this we’ve seen, but the pattern isn’t new and isn’t going away.

    OpenRouter already handles one layer of this for you. When a model runs on several providers and one of them fails or rate-limits, the marketplace reroutes to another provider automatically, with no configuration. We cover how that failover works in a separate post.

    That keeps a single model reachable through provider trouble, but it can’t help once the model itself is gone. For that you want model failover, where requests move to a different model when your first choice disappears. Presets are how you set that up.

    Hard-coding a model slug pins your choice inside every service that uses it. When that model goes away, the only fix is to edit the code and redeploy each service, and requests keep failing until you do.

    A preset takes that choice out of your code. It’s a named, server-side configuration (model, fallback models, provider rules, parameters, and a system prompt) that you reference by slug. The model lives in the preset instead of the code, so you change it in one place and every service that calls the preset picks it up with no redeploy.

    Here’s a simple preset definition. Copy it and adjust the models:

    {
      "models": [
        "anthropic/claude-fable-5",
        "anthropic/claude-opus-4.8",
        "openai/gpt-5.5"
      ],
      "provider": {
        "allow_fallbacks": true
      }
    }
    

    The models array is your fallback chain, in priority order. If the first model is unavailable, OpenRouter tries the next one.

    Hard-coded slug vs preset reference

    • When a provider restricts the model: hard-coded - every service breaks until you edit and redeploy; preset - edit the preset once and callers keep running.
    • Who ships the fix: hard-coded - whoever owns each codebase; preset - whoever owns the preset.
    • Blast radius of a change: hard-coded - one edit per repo, per service; preset - one edit, applied everywhere.
    • Data policy (ZDR, retention): hard-coded - re-stated in every request; preset - set once on the preset.
    • Rollback: hard-coded - revert a commit and redeploy; preset - re-designate a previous version.

    Give this to your agent

    "I want to stop hard-coding model slugs so one provider change can't take down my app. Set up an OpenRouter preset and route my calls through it.

    1. Create a preset named "customer-support" with a fallback chain: a primary model plus 2 backups in priority order, using the models array.
    2. Set provider rules on the preset: allow_fallbacks true, and zdr true if my data policy requires Zero Data Retention.
    3. Capture it by POSTing a known-good chat/completions body to https://openrouter.ai/api/v1/presets/customer-support/chat/completions with my OpenRouter API key.
    4. Replace the model field in my inference calls with "@preset/customer-support".
    5. Keep my OpenRouter API key in an environment variable. Never hard-code it.

    Use these references for current shapes:

    • Presets: https://openrouter.ai/docs/guides/features/presets
    • Provider routing and fallbacks: https://openrouter.ai/docs/guides/routing/provider-selection"

    Capture a working request as a preset

    You can build a preset in the dashboard, or capture one from a request body you already trust.
    Send a known-good chat/completions body to the preset capture endpoint. OpenRouter persists the fields that overlap with the preset config (models, provider, temperature, and so on) and ignores transient fields like messages:

    curl https://openrouter.ai/api/v1/presets/customer-support/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "models": ["anthropic/claude-fable-5", "anthropic/claude-opus-4.8", "openai/gpt-5.5"],
        "provider": { "allow_fallbacks": true },
        "messages": [
          { "role": "system", "content": "You are a concise support assistant." },
          { "role": "user", "content": "Summarize this ticket in one sentence." }
        ]
      }'
    

    If a preset with that slug already exists, this creates a new version and designates it active. If it doesn’t exist, it creates the preset. Pick a slug that isn’t already in use, since capturing onto an existing slug overwrites its live config with a new active version.

    Example response excerpt:

    {
      "data": {
        "name": "customer-support",
        "slug": "customer-support",
        "status": "active",
        "designated_version": {
          "version": 1,
          "system_prompt": "You are a concise support assistant.",
          "config": {
            "models": [
              "anthropic/claude-fable-5",
              "anthropic/claude-opus-4.8",
              "openai/gpt-5.5"
            ],
            "provider": {
              "allow_fallbacks": true
            }
          }
        }
      }
    }
    

    Reference the preset from your code

    Now point your inference calls at @preset/customer-support. The model choice lives in the preset, so this line stays the same when the underlying model changes.

    Install an SDK first:

    Python:

    pip install openrouter
    

    TypeScript:

    npm install @openrouter/sdk
    

    Usage example in Python:

    from openrouter import OpenRouter
    import os
    client = OpenRouter(api_key=os.getenv("OPENROUTER_API_KEY"))
    response = client.chat.send(
        model="@preset/customer-support",
        messages=[{"role": "user", "content": "Summarize this ticket in one sentence."}]
    )
    print(response.choices[0].message.content)
    

    Usage example in TypeScript/JavaScript:

    import { OpenRouter } from '@openrouter/sdk';
    const client = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });
    const response = await client.chat.send({
      chatRequest: {
        model: '@preset/customer-support',
        messages: [{ role: 'user', content: 'Summarize this ticket in one sentence.' }],
      },
    });
    console.log(response.choices[0]?.message.content);
    

    Default to the @preset/slug form shown above. Reach for the combined model@preset/slug form when you want to name a base model and layer a preset’s config on top of it, or use a separate preset field alongside model if you’d rather keep them as distinct request fields. The presets docs cover all 3.

    Add fallback models so requests keep flowing

    The models array is the part that survives a deprecation. Pass models in priority order, and OpenRouter walks the list when one is unavailable.
    Put the model you trust most as the last entry, so your final fallback is a floor you’re comfortable shipping. For a coding workload that leaned on Fable 5, a chain like anthropic/claude-fable-5, then anthropic/claude-opus-4.8, then openai/gpt-5.5 keeps strong models in reserve.

    Example fallback firing:

    With Fable 5 restricted, a request naming the chain above succeeds anyway, and the response’s model and provider fields name what actually served it:

    {
      "model": "anthropic/claude-4.8-opus-20260528",
      "provider": "Anthropic",
      "choices": [{ "message": { "role": "assistant", "content": "Customer cannot log in because password reset emails are not being received, despite checking spam and confirming the correct email address." } }]
    }
    

    OpenRouter skipped the restricted primary and served the next model in the array. Your code didn’t change.
    The model field reports the concrete version that actually served the request, so it reads differently from the slug you sent. Here anthropic/claude-opus-4.8 resolved to the dated build anthropic/claude-4.8-opus-20260528 on Anthropic.

    Two layers of recovery stack here. Provider-layer failover is automatic: for one model served by several providers, OpenRouter retries the next provider on a 5xx or rate-limit. Model-layer fallbacks are the models array, which moves to a different model when the whole primary is gone. For the mechanics of each, see reliability and automatic failover and model routing.

    Set your data policy in the preset

    Provider rules ride along in the same preset, so a routing policy applies to every caller without a code change.
    Example preset with Zero Data Retention and data collection rules:

    {
      "models": [
        "anthropic/claude-fable-5",
        "anthropic/claude-opus-4.8",
        "openai/gpt-5.5"
      ],
      "provider": {
        "zdr": true,
        "data_collection": "deny",
        "allow_fallbacks": true
      }
    }
    

    zdr: true keeps requests on endpoints that honor Zero Data Retention.
    data_collection: "deny" blocks providers that train on or store prompts. You can also pin or exclude specific providers with only, ignore, and order. See provider routing for the full list.

    This is where the Fable 5 situation gets concrete. Its model page notes that Anthropic’s policy “does not allow zero data retention.” With zdr: true set on the preset, routing skips Fable 5 because it can’t satisfy the rule, and falls through to the next model in your array that can. One switch, enforced server-side, for every request that names the preset.
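The fall-through can be modeled locally as a walk over the chain that skips anything unavailable or unable to meet the ZDR rule. This is a toy model for reasoning about a preset, not OpenRouter's router: `pick_model` is hypothetical, and which models count as ZDR-capable in the example is an assumption apart from Fable 5, which the post says cannot satisfy it.

```python
from typing import AbstractSet, Optional, Sequence


def pick_model(
    chain: Sequence[str],
    unavailable: AbstractSet[str] = frozenset(),
    zdr_required: bool = False,
    zdr_capable: AbstractSet[str] = frozenset(),
) -> Optional[str]:
    """Return the first model in priority order that passes every rule, else None."""
    for model in chain:
        if model in unavailable:
            continue  # deprecated, restricted, or otherwise unreachable
        if zdr_required and model not in zdr_capable:
            continue  # cannot honor Zero Data Retention
        return model
    return None


CHAIN = ["anthropic/claude-fable-5", "anthropic/claude-opus-4.8", "openai/gpt-5.5"]
# ASSUMPTION for the example: the two backups can be served under ZDR.
ZDR_OK = {"anthropic/claude-opus-4.8", "openai/gpt-5.5"}

print(pick_model(CHAIN))
print(pick_model(CHAIN, zdr_required=True, zdr_capable=ZDR_OK))
```

With no rules the primary is chosen; with the ZDR rule on, the walk lands on the second entry.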

    Roll out and roll back across your team

    On an organization account, every member can use organization presets, so a routing decision made once is shared instead of copied into each repo.
    Every capture or edit creates a new version and marks it active. Version history is kept, so a bad change is one re-designation away from a rollback. Through the API, the latest designated version is always the one that runs. You re-designate versions and delete presets from the dashboard; the API captures and reads presets but has no delete endpoint.
    Parameters you pass in a request override the preset’s values, shallow-merged. Request fields win, and preset fields you don’t send are preserved. That lets a single call bump temperature without forking the preset.
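The practical consequence of a shallow merge is easiest to see in code. The sketch below assumes "shallow" carries its usual meaning, where top-level keys from the request replace the preset's keys wholesale; whether OpenRouter treats nested objects such as provider exactly this way is an assumption to verify against the presets docs. `effective_config` is a hypothetical helper.

```python
def effective_config(preset: dict, request: dict) -> dict:
    """Shallow merge: request keys win, untouched preset keys are preserved."""
    return {**preset, **request}


preset = {
    "temperature": 0.2,
    "provider": {"zdr": True, "allow_fallbacks": True},
}

# Overriding a top-level scalar leaves the rest of the preset intact.
bumped = effective_config(preset, {"temperature": 0.7})
print(bumped)

# Under a plain shallow merge, sending "provider" replaces the whole object,
# so the preset's zdr flag would no longer be present in the result.
replaced = effective_config(preset, {"provider": {"allow_fallbacks": False}})
print(replaced)
```

If that reading holds, a request that sends a partial provider object should restate any preset rules it still needs.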

    Wire it all together

    The data-policy config above already holds all 3 layers: the model chain, the provider policy, and (once you add one) the system prompt. Capture it once with the curl call, reference @preset/customer-support everywhere, and the next time a provider restricts a model you edit one config instead of chasing slugs through every service.
    If you’d rather not pin a primary at all, point the chain at a self-updating alias like ~anthropic/claude-opus-latest, which always resolves to the newest model in that family.

    Start with one preset

    Pick your highest-traffic call, create a preset for it at openrouter.ai/settings/presets, give it a fallback chain, and swap the model string for @preset/your-slug. That one move turns a forced migration into a config edit.
    Presets also pair well with governance work: the same control point that survives a deprecation is where you enforce data-handling rules, as covered in human oversight for AI agents.

    Note: This post covers engineering patterns, not legal advice. For export-control, data-residency, or retention obligations, consult counsel about your specific use case and jurisdiction.

    FAQ

    What happens to my app when a model is deprecated or restricted?
    If your code hard-codes the model slug, requests to that model start failing and every service that used it breaks until you edit the code and redeploy. Route through a preset and you edit one config; callers pick up the change with no redeploy. A models fallback array can keep requests succeeding on a backup while you decide what to do.

    How is a preset different from passing a models fallback array in code?
    A models array sets the fallback order for one request. A preset stores that array, plus provider rules, parameters, and a system prompt, on the server under a slug. You reference it with @preset/slug, so the configuration lives in one place across every service and changes without a code edit.

    Do request parameters override preset values?
    Yes. Request parameters take priority over the preset’s values, shallow-merged. Request-level fields override matching preset fields, and preset fields you don’t send are preserved.

    Can I enforce Zero Data Retention with a preset?
    Yes. Set provider.zdr to true. OpenRouter routes only to endpoints that honor Zero Data Retention, skips models or providers that can’t, and falls through your models array to the next qualifying option.

    How do I roll back a preset change?
    Every capture or edit creates a new version and designates it active. Version history is kept, so you can re-designate a previous version. Through the API, the latest designated version is always used.

  • Jun 16, 2026
    • Date parsed from source:
      Jun 16, 2026
    • First seen by Releasebot:
      Jun 17, 2026
    Baseten

    GLM 5.2 available on Baseten

    Baseten adds GLM 5.2 to its Model APIs, giving users OpenAI-compatible access to Z.ai’s flagship agentic engineering model for long-horizon coding tasks, with dedicated deployments available for larger workloads.

    You can start sending requests to GLM 5.2 today through our Model APIs by calling the OpenAI-compatible endpoint with your Baseten API key. For larger workloads, dedicated deployments are available.

    GLM-5.2 is Z.ai's flagship model for agentic engineering, built to perform well on long-horizon coding tasks. GLM-5.2 runs on the Baseten Inference Stack.

    curl -X POST https://inference.baseten.co/v1/chat/completions \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org/GLM-5.2",
      "messages": [{
        "role": "user",
        "content": "Refactor this function for readability"
      }]
    }'
    

    For more information and to get started, see our docs.
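For Python callers, the same request can be built with the standard library alone. The sketch mirrors the curl call above (same URL, headers, and body) and stops short of sending it; `build_request` is a hypothetical helper, and the response shape mentioned in the closing comment is assumed from the endpoint being OpenAI-compatible.

```python
import json
import os
import urllib.request

BASETEN_URL = "https://inference.baseten.co/v1/chat/completions"


def build_request(prompt: str, model: str = "zai-org/GLM-5.2") -> urllib.request.Request:
    """Build the POST request from the curl example without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    api_key = os.environ.get("BASETEN_API_KEY", "")
    return urllib.request.Request(
        BASETEN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("Refactor this function for readability")
# To send it: with urllib.request.urlopen(request) as resp: reply = json.load(resp)
# ASSUMPTION: the reply follows the OpenAI chat-completions shape, with the text
# at reply["choices"][0]["message"]["content"].
```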

  • Jun 16, 2026
    • Date parsed from source:
      Jun 16, 2026
    • First seen by Releasebot:
      Jun 17, 2026
    Baseten

    Kimi K2.7 Code available on Baseten

    Baseten adds Kimi K2.7 Code to its Model APIs, making Moonshot AI’s coding-focused model available through an OpenAI-compatible endpoint. It also offers dedicated deployments for larger workloads and highlights support for long-horizon engineering tasks with a 262K-token context window.

    You can start sending requests to Kimi-K2.7-Code today through our Model APIs by calling the OpenAI-compatible endpoint with your Baseten API key. For larger workloads, dedicated deployments are available.

    Kimi K2.7 Code is Moonshot AI's coding-focused model, built for long-horizon agentic engineering tasks with a 262K-token context window. Kimi-K2.7-Code runs on the Baseten Inference Stack.

    curl -X POST https://inference.baseten.co/v1/chat/completions \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "moonshotai/Kimi-K2.7-Code",
      "messages": [{
        "role": "user",
        "content": "Write a binary search in Rust"
      }]
    }'
    

    For more information and to get started, see our docs.
