Baseten Release Notes

Last updated: Apr 6, 2026

  • Apr 6, 2026
    • Date parsed from source: Apr 6, 2026
    • First seen by Releasebot: Apr 6, 2026

    Baseten

    Named entity recognition on BEI-Bert

    Baseten adds token-classification support for BEI-Bert with /predict_tokens NER and low-latency inference.

    BEI-Bert now supports token-classification models for named-entity recognition. Deploy any ForTokenClassification model with the /predict_tokens endpoint and get structured entity predictions with configurable aggregation strategies. NER on BEI-Bert runs with sub-three millisecond client-side latency on L4 GPUs.

    For more information, see our blog.
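The aggregation that /predict_tokens performs server-side can be pictured with a small local sketch: grouping per-token BIO labels into entity spans, roughly what a "simple" aggregation strategy does. The response shape below (text/label dicts) is an assumption for illustration, not the documented schema.

```python
# Sketch: merge consecutive B-/I- tagged tokens into entity spans.
# The token dict shape here is assumed, not the documented /predict_tokens schema.

def aggregate_bio(tokens):
    """Group BIO-tagged tokens into {"type", "text"} entity spans."""
    entities, current = [], None
    for tok in tokens:
        label = tok["label"]
        if label.startswith("B-") or (
            label.startswith("I-") and (current is None or current["type"] != label[2:])
        ):
            # A B- tag (or an orphan/mismatched I- tag) starts a new entity.
            if current:
                entities.append(current)
            current = {"type": label[2:], "text": tok["text"]}
        elif label.startswith("I-") and current:
            # Continuation of the open entity of the same type.
            current["text"] += " " + tok["text"]
        else:
            # An "O" tag closes any open entity.
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

preds = [
    {"text": "Baseten", "label": "B-ORG"},
    {"text": "is", "label": "O"},
    {"text": "in", "label": "O"},
    {"text": "San", "label": "B-LOC"},
    {"text": "Francisco", "label": "I-LOC"},
]
print(aggregate_bio(preds))
```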

  • Apr 6, 2026
    • Date parsed from source: Apr 6, 2026
    • First seen by Releasebot: Apr 6, 2026

    Baseten

    Copy and download logs

    Baseten adds log export tools with copy, CSV, JSON, and range selection in the logs viewer.

    You can now copy or download all visible logs directly from the logs viewer. A new export menu next to the search box lets you copy logs to your clipboard, or download them as CSV or JSON. To export a subset, shift-click to select a range of logs and copy just those lines.
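Once downloaded, a JSON export is straightforward to post-process. The sketch below filters an export to error lines; the field names ("timestamp", "level", "message") are assumed for illustration, so check an actual export for the exact schema.

```python
import json

# Sketch: post-process a JSON log export downloaded from the logs viewer.
# Field names are assumed; inspect a real export for the exact schema.
export = json.loads("""[
  {"timestamp": "2026-04-06T12:00:01Z", "level": "INFO",  "message": "model loaded"},
  {"timestamp": "2026-04-06T12:00:02Z", "level": "ERROR", "message": "request timed out"},
  {"timestamp": "2026-04-06T12:00:03Z", "level": "INFO",  "message": "prediction served"}
]""")

# Keep only the error lines.
errors = [line for line in export if line["level"] == "ERROR"]
print(len(errors))
```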

  • Apr 1, 2026
    • Date parsed from source: Apr 1, 2026
    • First seen by Releasebot: Apr 1, 2026

    Baseten

    Per-request log filtering

    Baseten adds unique request IDs to predict responses for easier per-call log filtering across HTTP, gRPC, and async requests.

    Every predict call now returns a unique request ID in the X-Baseten-Request-Id response header. Use this ID to filter your model's logs to a single request, cutting through the noise when debugging individual predictions in production. Works across HTTP, gRPC, and async predict calls. Requires Truss 0.15.5 or later.

    For more information, see Logs.
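A minimal client-side sketch: read the request ID off the response headers so you can paste it into the logs filter. The plain dict stands in for real response headers; with the requests library you would read resp.headers instead.

```python
# Sketch: pull the X-Baseten-Request-Id header from a predict response
# so logs can be filtered to that single request.
def request_id(headers):
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-baseten-request-id")

headers = {
    "Content-Type": "application/json",
    "X-Baseten-Request-Id": "abc123",
}
print(f"filter logs by request ID: {request_id(headers)}")
```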

  • Mar 31, 2026
    • Date parsed from source: Mar 31, 2026
    • First seen by Releasebot: Apr 1, 2026

    Baseten

    Health check improvements

    Baseten improves startup probes by waiting for model load before liveness checks and supports configurable startup thresholds.

    Startup probes now handle initialization more reliably by waiting until the model has loaded before executing any liveness checks. The startup phase still defaults to 30 minutes and can be configured up to 50 minutes through the startup_threshold_seconds parameter.

    For more information, see Custom health checks.
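As a sketch, the startup threshold would be set in your Truss config.yaml along these lines; the exact nesting under runtime/health_checks is an assumption here, so confirm it against the Custom health checks docs.

```yaml
# Hypothetical config.yaml fragment: extend the startup phase to the
# 50-minute maximum. The nesting shown is assumed, not authoritative.
runtime:
  health_checks:
    startup_threshold_seconds: 3000  # 50 minutes (default is 1800)
```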

  • Mar 30, 2026
    • Date parsed from source: Mar 30, 2026
    • First seen by Releasebot: Mar 31, 2026

    Baseten

    Rolling deployments

    Baseten adds rolling deployments for zero-downtime updates, gradually shifting traffic to new deployments with controlled replica scaling. Users can pause, resume, or cancel rollouts and tune rollout speed and cleanup behavior in the dashboard or API.

    You can now gradually shift traffic to new deployments instead of swapping all at once. Candidate replicas scale up incrementally while previous replicas scale down in controlled steps, giving you zero-downtime updates. Pause, resume, or cancel mid-rollout if you spot issues. Configure rollout speed with max_surge_percent and stabilization_time_seconds, and choose how to handle the previous deployment after promotion.

    Enable rolling deployments in your environment's promotion settings in the dashboard, or through the API:

    curl -X PATCH "https://api.baseten.co/v1/models/{model_id}/environments/{env_name}" \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "promotion_settings": {
        "rolling_deploy": true,
        "promotion_cleanup_strategy": "SCALE_TO_ZERO",
        "rolling_deploy_config": {
          "max_surge_percent": 10,
          "stabilization_time_seconds": 300
        }
      }
    }'
    

    For more information, see Rolling deployments.
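To make the knobs concrete, here is a back-of-the-envelope sketch of how max_surge_percent bounds each rollout step; the real scheduler's arithmetic may differ.

```python
import math

# Illustrative only: with N active replicas, a rollout step adds at most
# ceil(N * max_surge_percent / 100) candidate replicas, then waits
# stabilization_time_seconds before the next step.
def surge_step(active_replicas, max_surge_percent):
    return max(1, math.ceil(active_replicas * max_surge_percent / 100))

print(surge_step(10, 10))  # small fleet, 10% surge: one replica per step
print(surge_step(40, 25))  # larger fleet, 25% surge: ten replicas per step
```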

  • Mar 27, 2026
    • Date parsed from source: Mar 27, 2026
    • First seen by Releasebot: Mar 27, 2026

    Baseten

    Terminate deployment replica via API

    Baseten adds a management API to terminate specific deployment replicas without affecting the rest of the deployment.

    You can now terminate a specific replica within a deployment using the new management API endpoint. This lets you remove individual replicas without affecting the rest of the deployment, making it useful for evicting stuck or unhealthy replicas.

    curl --request DELETE \
    --url https://api.baseten.co/v1/models/{model_id}/deployments/{deployment_id}/replicas/{replica_id} \
    --header "Authorization: Api-Key $BASETEN_API_KEY"
    

    For more information, see the terminate deployment replica docs.
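The same call can be built from Python using only the standard library; the URL shape mirrors the curl example above, and error handling is left minimal.

```python
import urllib.request

# Sketch: construct a DELETE request for the terminate-replica endpoint.
def terminate_replica_request(model_id, deployment_id, replica_id, api_key):
    url = (f"https://api.baseten.co/v1/models/{model_id}"
           f"/deployments/{deployment_id}/replicas/{replica_id}")
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Api-Key {api_key}"},
    )

req = terminate_replica_request("m1", "d1", "r1", "secret")
print(req.get_method(), req.full_url)
```

With real IDs and an API key, `urllib.request.urlopen(req)` would send it.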

  • Mar 27, 2026
    • Date parsed from source: Mar 27, 2026
    • First seen by Releasebot: Mar 27, 2026

    Baseten

    Hot reload for development deployments

    Baseten adds hot-reloading for truss watch and truss push --watch, speeding model code iteration without restarting inference servers.

    truss watch and truss push --watch now support hot-reloading model code changes via the --hot-reload and --watch-hot-reload flags. Instead of restarting the inference server, hot reload swaps your model class in-process, keeping weights and caches loaded for near-instant iteration on predict() logic.

    For more information, see Deploy and iterate.

  • Mar 26, 2026
    • Date parsed from source: Mar 26, 2026
    • First seen by Releasebot: Mar 27, 2026

    Baseten

    Observability improvements

    Baseten redesigns logs and metrics views for faster debugging, adding an interactive log volume chart, real-time time-range filtering, default full-width layouts, and a new two-column metrics grid with saved view preferences.

    We've redesigned the logs and metrics views for better visibility and faster debugging.

    Logs

    Logs now include an interactive volume chart that visualizes log frequency over time. Click a bar to zoom into that time window, or drag across a range to filter: the log viewer updates in real time. The logs view is now full-width by default, replacing the previous fullscreen toggle.

    Metrics

    Metrics also get the full-width treatment, along with a new grid view that displays charts in a two-column layout so you can see more metrics at once. A toggle at the top of the page lets you switch between grid and list views, and your preference is saved across sessions.

  • Mar 23, 2026
    • Date parsed from source: Mar 23, 2026
    • First seen by Releasebot: Mar 24, 2026

    Baseten

    Model API Deprecation (Kimi K2 0905, Kimi K2 Thinking, DeepSeek v3.2)

    Baseten deprecates the Kimi K2 0905, Kimi K2 Thinking, and DeepSeek v3.2 Model APIs and points users to Kimi K2.5 as the recommended replacement for continued model access and tool calling.

    The Kimi K2 0905, Kimi K2 Thinking, and DeepSeek v3.2 Model APIs were deprecated at 5 p.m. PT on March 6. The model IDs are now inactive and return an error for all requests.

    As open source models advance rapidly, we prioritize serving the highest quality models and deprecate models when stronger alternatives are available.

    We recommend Kimi K2.5 as an alternative; it offers very strong agentic coding and tool-calling capabilities. Just swap in the new model ID: moonshotai/Kimi-K2.5.

    If you’d like to continue using the previous weights, please contact us about a dedicated deployment of the model.

  • Mar 19, 2026
    • Date parsed from source: Mar 19, 2026
    • First seen by Releasebot: Mar 20, 2026

    Baseten

    Introducing the Baseten Delivery Network (BDN)

    Baseten launches the Baseten Delivery Network, making cold starts 2-3x faster for large models with smarter weight delivery, multi-tier caching, and fewer upstream dependencies.

    We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models.

    BDN solves three root causes of slow cold starts: slow weight pulls from upstream storage, replica stampedes under load, and upstream availability dependencies. On first deployment, BDN mirrors your weights to secure storage. From there, a multi-tier cache (node, cluster, mirrored origin) serves weights with consistent hashing and single-flight semantics: each file is fetched once per cluster, not once per pod. Fine-tunes sharing weights with a base model only pull the delta.

    Check out the launch blog to learn more, or see the docs to get started.
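Single-flight semantics are the load-bearing idea here: however many replicas ask for the same weight file at once, only one upstream fetch happens. A toy in-memory version (not Baseten's implementation, and with no error handling) looks like this:

```python
import threading

# Toy single-flight cache: concurrent requests for the same key share
# one upstream fetch instead of each caller pulling its own copy.
class SingleFlightCache:
    def __init__(self, fetch):
        self._fetch = fetch          # slow upstream fetch function
        self._cache = {}             # key -> value already fetched
        self._inflight = {}          # key -> Event for an ongoing fetch
        self._lock = threading.Lock()
        self.fetch_count = 0         # upstream fetches actually performed

    def get(self, key):
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                if event is None:
                    # We are the single flight for this key.
                    event = threading.Event()
                    self._inflight[key] = event
                    leader = True
                else:
                    leader = False
            if leader:
                value = self._fetch(key)
                with self._lock:
                    self.fetch_count += 1
                    self._cache[key] = value
                    del self._inflight[key]
                event.set()
                return value
            event.wait()  # follower: wait for the leader, then retry

cache = SingleFlightCache(lambda key: f"weights for {key}".encode())
threads = [threading.Thread(target=cache.get, args=("model.safetensors",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.fetch_count)  # eight concurrent requests, one upstream fetch
```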

