Baseten Updates & Release Notes

Name: Baseten
Brand: Baseten


52 updates curated from 54 sources by the Releasebot Team. Last updated: Jul 31, 2026


AI/ML Infrastructure

Jul 30, 2026 (first seen by Releasebot: Jul 31, 2026)

Inkling Small available on Baseten

Baseten adds Inkling Small to Baseten Model APIs, bringing OpenAI-compatible access to Thinking Machines Lab’s multimodal model with a 1M-token context window, tool calling, structured outputs, and controllable reasoning for lower-latency, lower-cost workloads.
Inkling Small is now available through Baseten Model APIs. Send requests to thinkingmachines/inkling-small through our OpenAI-compatible endpoint with your Baseten API key. Dedicated deployments are also available for larger workloads.

Inkling Small is Thinking Machines Lab’s open-weights, 276B-parameter mixture-of-experts model with 12B active parameters. It retains Inkling’s 1M-token context window, native text, image, and audio inputs, tool calling, structured outputs, and controllable reasoning in a smaller model designed for workloads where latency and inference cost matter. Thinking Machines reports comparable performance to Inkling at roughly one-quarter its size.

curl https://inference.baseten.co/v1/chat/completions \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "thinkingmachines/inkling-small",
    "messages": [
      {
        "role": "user",
        "content": "Compare sparse and dense transformer architectures."
      }
    ],
    "reasoning_effort": "medium"
  }'

For supported reasoning settings and multimodal request examples, see the docs.
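For scripting, the same request can be assembled in Python. This minimal sketch uses only the standard library. It builds the request from the curl example above but does not send it. The helper name `build_chat_request` is hypothetical, and you should check the docs for the accepted `reasoning_effort` values.

```python
import json
import os
import urllib.request

# Endpoint and headers taken from the curl example above.
BASETEN_CHAT_URL = "https://inference.baseten.co/v1/chat/completions"


def build_chat_request(model, prompt, reasoning_effort=None, api_key=None):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        # Passed through unchanged; supported values are documented per model.
        payload["reasoning_effort"] = reasoning_effort
    key = api_key if api_key is not None else os.environ.get("BASETEN_API_KEY", "")
    return urllib.request.Request(
        BASETEN_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "thinkingmachines/inkling-small",
    "Compare sparse and dense transformer architectures.",
    reasoning_effort="medium",
    api_key="test-key",
)
```

To send the request, pass it to `urllib.request.urlopen(req)`. Because the endpoint is OpenAI-compatible, an OpenAI SDK client pointed at `https://inference.baseten.co/v1` should also work.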
Original source

Jul 29, 2026 (first seen by Releasebot: Jul 30, 2026)

Introducing Baseten for Model Labs

Baseten for Model Labs adds infrastructure to help labs bring models to market, including a Frontier Gateway for branded production APIs and a Distribution Platform for publishing models as shared or dedicated APIs while Baseten handles serving, billing, and distribution.
Baseten for Model Labs gives labs the infrastructure to bring models to market without building their own serving and distribution systems.

Frontier Gateway: Run a production-ready API under your own brand. Manage customer credentials, model access, rate limits, usage limits, and billing events across models hosted on Baseten, external providers, or OpenAI-compatible endpoints.

Distribution Platform: Publish your models to Baseten customers as shared Model APIs, dedicated deployments, or both. Baseten manages customer billing while protecting your model artifacts.

Both run on Baseten’s infrastructure, so your team can focus on model research while Baseten handles serving and distribution.

To get started, reach out to us; for more information, see our docs.

Jul 27, 2026 (first seen by Releasebot: Jul 28, 2026)

Kimi K3 available on Baseten

Baseten adds Kimi K3 to its Model APIs, with OpenAI-compatible access, dedicated deployments for larger workloads, a 1M-token context window, tool calling, and day-one availability on the Baseten Inference Stack.
You can start sending requests to Kimi K3 today through our Model APIs by calling the OpenAI-compatible endpoint with your Baseten API key. For larger workloads, dedicated deployments are available.

Kimi K3 is Moonshot AI's latest open-weights model, available on Baseten from day 0 with a 1M-token context window and tool calling. Kimi K3 runs on the Baseten Inference Stack.

curl -X POST https://inference.baseten.co/v1/chat/completions \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K3",
    "messages": [{"role": "user", "content": "Summarize the main argument in this paper"}]
  }'

For more information and to get started, see our docs.
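Kimi K3 supports tool calling. This sketch assumes the endpoint accepts the standard OpenAI `tools` request field and returns `tool_calls` in the standard OpenAI response shape, as an OpenAI-compatible endpoint should. Under that assumption, a request body and a response parser might look like the code below. The `lookup_paper` tool and both helper names are hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
LOOKUP_PAPER_TOOL = {
    "type": "function",
    "function": {
        "name": "lookup_paper",
        "description": "Fetch the abstract of a paper by title.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
    },
}


def build_tool_request_body(model, prompt, tools):
    """Chat completion body that advertises the given tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }


def extract_tool_calls(response):
    """Return (function name, parsed arguments) for each tool call in the first choice."""
    message = response["choices"][0]["message"]
    calls = []
    # tool_calls may be missing or null when the model answers directly.
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


body = build_tool_request_body(
    "moonshotai/Kimi-K3",
    "Summarize the main argument of the paper titled 'Attention Is All You Need'.",
    [LOOKUP_PAPER_TOOL],
)
```

Serialize `body` with `json.dumps` and send it the same way as the curl example. In the OpenAI convention, when `extract_tool_calls` returns a non-empty list, you run those tools and append their results as `tool` messages before calling the endpoint again.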

Jul 23, 2026 (first seen by Releasebot: Jul 24, 2026)

GLM 5.2 Fast available on Baseten

Baseten introduces GLM 5.2 Fast, the first model in its new Fast tier on Model APIs, with dedicated capacity for higher sustained throughput, separate pricing and rate limits, and OpenAI-compatible access for agentic coding and real-time chat.
GLM 5.2 Fast is the first model in our new Fast tier on Model APIs: the same GLM 5.2 model weights served on dedicated capacity engineered for higher sustained per-user throughput, built for agentic coding and real-time conversational applications.

It ships as its own model slug with its own pricing and rate limits, behind the same OpenAI-compatible API, so switching is a one-line change. If Fast capacity is temporarily saturated, requests fall back to standard GLM 5.2 capacity: they run slower but do not fail.

curl https://inference.baseten.co/v1/chat/completions \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-5.2-Fast",
    "messages": [{"role": "user", "content": "What makes an inference stack fast?"}]
  }'

For more information, see our docs.
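Because Fast traffic can spill over to standard capacity, it may help to track the throughput you actually observe. This rough client-side sketch assumes each response carries OpenAI-style `usage.completion_tokens`. It measures end-to-end time, including time to first token, so it understates pure decode speed. The `send` parameter stands in for whatever function posts the body to the endpoint, and all names are hypothetical.

```python
import time


def output_tokens_per_second(completion_tokens, elapsed_s):
    """Average generated tokens per second over a wall-clock interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def timed_completion(send, body, clock=time.perf_counter):
    """Call send(body) and return (response, observed output tokens per second)."""
    start = clock()
    response = send(body)
    elapsed = clock() - start
    tokens = response["usage"]["completion_tokens"]
    return response, output_tokens_per_second(tokens, elapsed)


# Request body from the curl example; only the model slug differs from a standard GLM 5.2 call.
FAST_BODY = {
    "model": "zai-org/GLM-5.2-Fast",
    "messages": [{"role": "user", "content": "What makes an inference stack fast?"}],
}
```

If you log this number per request, periods when Fast capacity is saturated and requests run on standard capacity should show up as a drop in observed throughput.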

Jul 23, 2026 (first seen by Releasebot: Jul 24, 2026)

API key management keys

Baseten adds a new org-scoped API key type that lets admins create, list, and revoke team and personal keys through the Management API, making credential rotation and key provisioning easier without dashboard access.
Automate key administration with a new org-scoped key type. A WORKSPACE_MANAGE_API_KEYS key can create team API keys of any permission level and can list and revoke both team and personal keys through the Management API, so your provisioning tooling can rotate credentials without a human in the dashboard. These keys are not tied to individual users.

Organization Admins can create a key from workspace settings or the API:

curl -X POST "https://api.baseten.co/v1/api_keys" \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type": "WORKSPACE_MANAGE_API_KEYS", "name": "key-provisioner"}'

For more information, see our docs.
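Provisioning tooling would typically make this call from code. This minimal sketch builds the same POST request as the curl example using Python's standard library, and sends nothing. The date-suffixed naming scheme in `rotation_key_name` is a hypothetical convention for telling rotations apart, and both helper names are placeholders. The sketch does not cover the response shape, including where the new secret is returned.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone

API_KEYS_URL = "https://api.baseten.co/v1/api_keys"


def rotation_key_name(prefix, now=None):
    """Hypothetical naming scheme: append the UTC date so successive rotations are distinguishable."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}-{now:%Y%m%d}"


def build_create_key_request(name, key_type="WORKSPACE_MANAGE_API_KEYS", admin_key=None):
    """Build (but do not send) a POST that creates an API key of the given type."""
    key = admin_key if admin_key is not None else os.environ.get("BASETEN_API_KEY", "")
    payload = {"type": key_type, "name": name}
    return urllib.request.Request(
        API_KEYS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_key_request(
    rotation_key_name("key-provisioner", datetime(2026, 7, 23, tzinfo=timezone.utc)),
    admin_key="admin-key",
)
```

Sending the request with `urllib.request.urlopen(req)` requires an Organization Admin's key. The `type` values for team keys of each permission level are not listed here, so check the docs before generalizing `key_type`.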

Jul 22, 2026 (first seen by Releasebot: Jul 24, 2026)

Observability APIs updates

Baseten adds Management API access to logs, metrics, and audit logs across deployments, environments, models, and chains.
Pull logs, metrics, and audit logs programmatically through the Management API. The endpoints return logs and metrics for any deployment or environment of any model or chain in your workspace.

curl --request GET \
  --url https://api.baseten.co/v1/models/{model_id}/audit_logs \
  --header "Authorization: Bearer $BASETEN_API_KEY"

For more information, see our docs.
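The same audit-log call can be built from a script. This sketch URL-encodes the model ID into the path from the curl example and only constructs the request, without sending it. The helper name and the sample ID `abc123` are hypothetical. The response format is not described here, so the sketch stops at the request.

```python
import os
import urllib.parse
import urllib.request

MANAGEMENT_API = "https://api.baseten.co/v1"


def build_audit_logs_request(model_id, api_key=None):
    """Build (but do not send) a GET request for a model's audit logs."""
    key = api_key if api_key is not None else os.environ.get("BASETEN_API_KEY", "")
    # Encode the ID so unexpected characters cannot change the URL path.
    safe_id = urllib.parse.quote(model_id, safe="")
    return urllib.request.Request(
        f"{MANAGEMENT_API}/models/{safe_id}/audit_logs",
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


req = build_audit_logs_request("abc123", api_key="test-key")
```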

Jul 21, 2026 (first seen by Releasebot: Jul 22, 2026)

Workspace GPU usage

Baseten adds a GPU usage tab in Organization settings for admins to track workspace usage by model, GPU type, and custom ranges.

Organization admins can now see how many GPUs their workspace is using across every model and deployment, from the new GPU usage tab in Organization settings.

Group usage by GPU type or by model, filter to specific GPU types, and view usage over custom time ranges.

For more information, see our docs.

Jul 15, 2026 (first seen by Releasebot: Jul 16, 2026)

Inkling available on Baseten

Baseten adds Inkling to its Model APIs, bringing OpenAI-compatible access to Thinking Machines Lab’s open-weights multimodal model with text, image, and audio input support, plus dedicated deployments for larger workloads.
You can start sending requests to Inkling today through our Model APIs by calling the OpenAI-compatible endpoint with your Baseten API key. For larger workloads, dedicated deployments are available.

Inkling is Thinking Machines Lab's open-weights multimodal model, built for breadth: it accepts text, image, and audio inputs, making it the first model on our Model APIs with audio support.

curl -X POST https://inference.baseten.co/v1/chat/completions \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "thinkingmachines/inkling",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Summarize what is said in this recording." },
        { "type": "audio_url", "audio_url": { "url": "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav" } }
      ]
    }]
  }'

For more information and to get started, see our docs.
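When the prompt and recording change from call to call, a small builder keeps the content parts consistent. This sketch reproduces the message shape from the curl example above: a `text` part plus an `audio_url` part. The helper names are hypothetical. Whether other audio sources, such as inline base64 data, are accepted is not stated here, so the sketch deliberately handles only http(s) URLs.

```python
from urllib.parse import urlparse


def audio_message(prompt, audio_url):
    """User message pairing a text instruction with an audio file URL."""
    if urlparse(audio_url).scheme not in ("http", "https"):
        raise ValueError("audio_url must be an http(s) URL")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "audio_url", "audio_url": {"url": audio_url}},
        ],
    }


def inkling_audio_body(prompt, audio_url, model="thinkingmachines/inkling"):
    """Full chat completion body with a single audio-bearing user message."""
    return {"model": model, "messages": [audio_message(prompt, audio_url)]}


body = inkling_audio_body(
    "Summarize what is said in this recording.",
    "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav",
)
```

Serialize `body` with `json.dumps` and send it with the same headers as the curl example.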

Jul 8, 2026 (first seen by Releasebot: Jul 8, 2026)

Model API Deprecation (GLM 5.1, GLM 5, Kimi K2.5, Nemotron Super 120B)

Baseten announces API deprecations for GLM 5.1, GLM 5, Kimi K2.5, and Nemotron Super 120B, with model IDs becoming inactive on July 24 at 5pm PT. Customers are directed to swap in recommended alternatives or contact Baseten for a dedicated deployment.


The GLM 5.1, GLM 5, Kimi K2.5, and Nemotron Super 120B APIs will be deprecated at 5pm PT on July 24.

At that time the model IDs will become inactive and return an error for all requests. As open source models advance rapidly, we prioritize serving the highest quality models and deprecate models when stronger alternatives are available.

We recommend the following models as alternatives, all of which offer superior intelligence for each specific use case. Just swap in the new Model ID(s) prior to the deprecation date.

Table: model alternatives

If you’d like to continue using the previous weights, please contact us about a dedicated deployment of the model.

Jul 7, 2026 (first seen by Releasebot: Jul 7, 2026)

Personal API key visibility for admins

Baseten adds org admin API key visibility with member, team, and key type filters on the API keys page.

Organization admins can now view every member's personal API keys on the API keys page, alongside team keys.

A new Owner / Team column shows who each key belongs to, and you can filter the list by member, team, and key type.

Image: Org admin view

For more information, see our docs.

Jul 7, 2026 (first seen by Releasebot: Jul 7, 2026)

Events on Metrics and Logs graphs

Baseten adds platform event overlays to Metrics and Logs to help correlate spikes with deployments, autoscaling, and updates.

Image: Metrics overlay

Metrics and Logs now overlay platform events on your graphs, so you can line up a latency spike or scaling change with what caused it. Deployments, promotions, autoscaling and instance-type changes, activations, replica terminations, and environment updates all appear as markers.

In Metrics, turn on the Events toggle; on the Logs volume chart, markers always show.

For more information, see our docs.

Jul 1, 2026 (first seen by Releasebot: Jul 7, 2026)

Try the new baseten CLI

Baseten launches a new CLI for deploying, calling, logging, monitoring, and managing models with JSON and jq support.
We're building one CLI for the whole Baseten model workflow: deploy from local, call your models, stream and filter logs, check metrics, and manage deployments, environments, and secrets, all with --output json and --jq for agents and scripting.

To get started, use brew on macOS:

Add and trust the tap:

brew tap basetenlabs/baseten
brew trust basetenlabs/baseten

Install the CLI:

brew install baseten

Give feedback, file an issue, or open a PR at github.com/basetenlabs/baseten-cli.

For more information, see our docs.

Jun 30, 2026 (first seen by Releasebot: Jul 7, 2026)

Connect coding agents to Baseten

Baseten adds MCP server support for coding agents, letting users manage workspaces from their agent with tools to deploy and promote models, tune autoscaling, pull logs, and launch training jobs, with read-only and mutating actions clearly labeled.
Connect your coding agent to the Baseten MCP server and install the Baseten skill to manage your workspace from your agent. Your agent can deploy and promote models, tune autoscaling, pull logs, and launch training jobs. Every tool is labeled read-only or mutating, so you control which calls change your account.

Get started by sending this prompt to your agent:

Install the Baseten agent toolkit following the instructions at github.com/basetenlabs/baseten-skills, all global (-g -y): the `baseten` skill, the backend MCP https://api.baseten.co/mcp with header "Authorization: Bearer $BASETEN_MCP_KEY", and the docs MCP https://docs.baseten.co/mcp. Run the commands in a shell where $BASETEN_MCP_KEY is set; don’t print the key. Then tell me how to verify and whether to restart.

For more information, see our docs.

Jun 29, 2026 (first seen by Releasebot: Jul 7, 2026)

Configure scale-down rate

Baseten adds a new autoscaling control that caps how aggressively replicas are removed when traffic drops. With max_scale_down_rate, teams can scale down gradually to keep more capacity warm or release idle replicas faster.
You can now cap how aggressively the autoscaler removes replicas when traffic drops. Set max_scale_down_rate between 1% and 50% (default 50%) to limit the share of excess replicas removed at each scale-down step.

Lower the rate to scale down more gradually and keep more replicas warm when traffic tends to rebound. Raise it toward 50% to release idle capacity faster.

curl -X PATCH \
  https://api.baseten.co/v1/models/$MODEL_ID/deployments/production/autoscaling_settings \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "max_scale_down_rate": 20 }'

For more information, see our docs.
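To build intuition for the setting, here is a rough simulation. It assumes each scale-down step removes `ceil(excess * rate / 100)` replicas, where excess is current replicas minus target replicas. It also assumes the rate is a whole-number percentage. The text above states neither the exact rounding nor the time between steps, so treat the numbers as illustrative, not as what the autoscaler will actually do. The function names are hypothetical. `autoscaling_patch_body` simply mirrors the JSON body of the curl example.

```python
def validate_scale_down_rate(rate):
    """Check the documented 1-50 range (integer-only is an assumption of this sketch)."""
    if isinstance(rate, bool) or not isinstance(rate, int) or not 1 <= rate <= 50:
        raise ValueError("max_scale_down_rate must be an integer from 1 to 50")
    return rate


def autoscaling_patch_body(rate):
    """JSON body for the PATCH autoscaling_settings request shown above."""
    return {"max_scale_down_rate": validate_scale_down_rate(rate)}


def simulate_scale_down(current, target, rate=50):
    """Replica counts after each step, assuming ceil(excess * rate / 100) are removed per step."""
    validate_scale_down_rate(rate)
    counts = [current]
    while current > target:
        excess = current - target
        # Integer ceiling division: removes at least one replica and,
        # since rate <= 50, never more than the excess.
        removed = -(-(excess * rate) // 100)
        current -= removed
        counts.append(current)
    return counts
```

For example, `simulate_scale_down(10, 2, 50)` gives `[10, 6, 4, 3, 2]`, while a rate of 20 gives `[10, 8, 6, 5, 4, 3, 2]`. The lower rate takes more steps, so replicas stay warm longer.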

Jun 25, 2026 (first seen by Releasebot: Jun 26, 2026)

Log downloads

Baseten adds dashboard log exports for deployments with background jobs and CSV or JSON downloads.

Download all of a deployment's logs for a chosen time range and filters straight from the dashboard. Baseten runs the export as a background job and returns a CSV or JSON file covering up to 7 days and 100,000 lines.

For more information, see our docs.
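Each export covers at most 7 days, so a longer investigation needs several exports. One way to plan them is to split the range into consecutive windows of at most 7 days, as in this sketch. The function name is hypothetical. Because the 100,000-line cap also applies, a busy deployment may need a smaller `max_window`.

```python
from datetime import datetime, timedelta, timezone

MAX_EXPORT_WINDOW = timedelta(days=7)


def export_windows(start, end, max_window=MAX_EXPORT_WINDOW):
    """Split [start, end) into consecutive windows no longer than max_window."""
    if end <= start:
        raise ValueError("end must be after start")
    if max_window <= timedelta(0) or max_window > MAX_EXPORT_WINDOW:
        raise ValueError("max_window must be positive and at most 7 days")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_window, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# A 16-day range needs three exports: two full 7-day windows and one 2-day window.
windows = export_windows(
    datetime(2026, 6, 1, tzinfo=timezone.utc),
    datetime(2026, 6, 17, tzinfo=timezone.utc),
)
```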