Baseten Release Notes
Last updated: Mar 20, 2026
- Mar 19, 2026
Introducing the Baseten Delivery Network (BDN)
Baseten launches the Baseten Delivery Network, making cold starts 2-3x faster for large models with smarter weight delivery, multi-tier caching, and fewer upstream dependencies.
We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models.
BDN solves three root causes of slow cold starts: slow weight pulls from upstream storage, replica stampedes under load, and upstream availability dependencies. On first deployment, BDN mirrors your weights to secure storage. From there, a multi-tier cache (node, cluster, and mirrored origin) serves weights with consistent hashing and single-flight semantics: each file is fetched once per cluster, not once per pod. Fine-tunes sharing weights with a base model only pull the delta.
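The single-flight idea can be sketched in Python. This is an illustrative toy, not Baseten's implementation: concurrent requests for the same file share exactly one in-flight fetch, so a replica stampede triggers one pull instead of one per pod.

```python
import threading

class SingleFlightCache:
    """Toy single-flight cache: concurrent requests for the same key
    trigger exactly one fetch; later callers reuse the cached result."""

    def __init__(self, fetch):
        self._fetch = fetch          # expensive fetch, e.g. pulling weights from origin
        self._cache = {}
        self._locks = {}
        self._mu = threading.Lock()

    def get(self, key):
        with self._mu:
            if key in self._cache:
                return self._cache[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                   # only one thread fetches per key
            with self._mu:
                if key in self._cache:
                    return self._cache[key]
            value = self._fetch(key)
            with self._mu:
                self._cache[key] = value
            return value

calls = []
cache = SingleFlightCache(lambda k: calls.append(k) or f"weights:{k}")

threads = [threading.Thread(target=cache.get, args=("llama-70b",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # the fetch ran once despite 8 concurrent requests
```

The same double-checked pattern generalizes to a per-cluster cache keyed by file hash.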
Check out the launch blog to learn more, or see the docs to get started.
- Mar 16, 2026
Regional environments
Baseten adds regional environments to keep inference traffic in-region for data residency and GDPR compliance.
Route inference traffic exclusively within a designated geographic region to meet data residency and compliance requirements like GDPR.
Regional environments use a dedicated endpoint format that guarantees traffic stays in-region:
`https://model-{model_id}-{env_name}.api.baseten.co/predict`

Contact [email protected] to set up regional environments. For more information, see Regional environments.
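As a sketch, a regional endpoint can be constructed and called like any other predict endpoint. The model ID, environment name, and auth header below are illustrative assumptions, not values from this announcement:

```python
import json
import urllib.request

model_id = "abc123"          # hypothetical model ID
env_name = "eu-production"   # hypothetical regional environment name

# Regional endpoint format from the announcement above
url = f"https://model-{model_id}-{env_name}.api.baseten.co/predict"

def predict(payload, api_key):
    """Send an inference request to the regional endpoint (assumes
    Baseten's usual Api-Key authorization scheme)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(url)
```

Because the region is encoded in the hostname, requests to this URL resolve to in-region infrastructure.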
- Mar 13, 2026
CI/CD for model deployments
Baseten adds the Truss Push Action to automate deployments, validate pull requests, and stream deploy logs in GitHub Actions.
Automate Truss deployments with the Truss Push Action. Deploy on merge, validate on pull request, or deploy multiple models in parallel.
The action streams deployment logs directly into GitHub Actions, validates models, and writes a summary of deploy time metrics.
To get started, see the CI/CD docs.
- Mar 7, 2026
Truss 0.15.2
Baseten adds a --no-cache flag to truss push for full rebuilds without cached Docker layers.
Added a `--no-cache` flag to `truss push` to force a full rebuild without using cached Docker layers. This is useful when debugging build issues or ensuring a clean image. The flag is CLI-only and cannot be set in `config.yaml`. For more information, see Truss or the documentation.
- Mar 6, 2026
Environment-scoped API keys
Baseten adds API key restrictions for specific environments and models to tighten team access control.
You can now restrict API keys to specific environments and models, giving you more control over how your team accesses Baseten resources.
When creating a team key with Manage permissions, use the new Environment access dropdown to limit which environments the key can reach. This works for both "call all team models" and "call certain models" permission levels.
To enable this feature for your workspace, reach out to [email protected].
For more information, see the API keys documentation.
- Mar 4, 2026
Retrieve billing usage via API
Baseten adds a new billing usage summary API that lets users query cost breakdowns programmatically across Dedicated Inference, Training, and Model APIs. It includes aggregate totals, daily granularity, and historical attribution for deleted resources.
You can now query your billing usage programmatically using the new `GET /v1/billing/usage_summary` endpoint. Pass a date range of up to 31 days to get a breakdown of costs across Dedicated Inference, Training, and Model APIs.

The response includes aggregate totals and a per-resource or per-model `breakdown[]` array, with daily granularity on each entry. Deleted resources are still returned in `breakdown[]` with `is_deleted: true`, so historical cost attribution is preserved.

Example output:

```json
{
  "dedicated_usage": {
    "subtotal": 123,
    "credits_used": 123,
    "total": 123,
    "minutes": 123,
    "breakdown": [
      {
        "billable_resource": {
          "id": "<string>",
          "kind": "MODEL_DEPLOYMENT",
          "name": "<string>",
          "is_deleted": true,
          "instance_type": "<string>",
          "environment_name": "<string>"
        },
        "subtotal": 123,
        "minutes": 123,
        "inference_requests": 123,
        "daily": [
          { "date": "2023-12-25", "subtotal": 123, "minutes": 123, "inference_requests": 123 }
        ]
      }
    ]
  },
  "training_usage": {
    "subtotal": 123,
    "credits_used": 123,
    "total": 123,
    "minutes": 123,
    "breakdown": [{ ... }]
  },
  "model_apis_usage": {
    "subtotal": 123,
    "credits_used": 123,
    "total": 123,
    "breakdown": [
      {
        "model_name": "<string>",
        "model_family": "<string>",
        "subtotal": 123,
        "input_tokens": 123,
        "output_tokens": 123,
        "cached_input_tokens": 123,
        "daily": [
          { "date": "2023-12-25", "subtotal": 123, "input_tokens": 123, "output_tokens": 123 }
        ]
      }
    ]
  }
}
```

Check out the billing API reference for the full schema and parameters.
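As a sketch of how you might consume a response shaped like the schema above, the snippet below totals costs per resource. The payload is a made-up sample, not real billing data:

```python
# Illustrative sample shaped like the usage_summary response
sample = {
    "dedicated_usage": {
        "subtotal": 19.0,
        "breakdown": [
            {"billable_resource": {"name": "whisper", "is_deleted": False},
             "subtotal": 12.5,
             "daily": [{"date": "2026-03-01", "subtotal": 5.0},
                       {"date": "2026-03-02", "subtotal": 7.5}]},
            {"billable_resource": {"name": "old-model", "is_deleted": True},
             "subtotal": 6.5,
             "daily": [{"date": "2026-03-01", "subtotal": 6.5}]},
        ],
    }
}

def cost_by_resource(usage):
    """Map resource name -> subtotal. Deleted resources remain in
    breakdown[] with is_deleted: true, so they are included too."""
    return {entry["billable_resource"]["name"]: entry["subtotal"]
            for entry in usage["breakdown"]}

costs = cost_by_resource(sample["dedicated_usage"])
print(costs)  # {'whisper': 12.5, 'old-model': 6.5}
```

The per-entry `daily` arrays support the same kind of aggregation at day granularity.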
- Mar 4, 2026
Truss support for pyproject.toml and uv.lock
Baseten now supports pyproject.toml and uv.lock as dependency formats in Truss and Chains configs.
Truss now supports `pyproject.toml` and `uv.lock` as dependency formats in addition to `requirements.txt`. You can use any of these formats as the `requirements_file` in your Truss and Chains config. For example:

```yaml
model_name: My Model
resources:
  accelerator: A10G
  cpu: "4"
  memory: 16Gi
requirements_file: ./pyproject.toml
```

For more information, see the Truss configuration documentation.
- Mar 2, 2026
Deployment labels on push
Baseten adds deployment labels at push time for easier search and filtering in the UI and API.
You can now attach labels to deployments at push time using the `--labels` flag. Labels are key-value pairs passed as a JSON string that are stored with the deployment.

Sample usage:

```sh
truss push --labels '{ "env": "staging", "team": "ml-platform", "version": "1.2.0" }'
```

Labeled deployments can be searched and filtered in the Baseten UI and API. Check out our CLI docs for the full list of flags.
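One way to avoid JSON quoting mistakes in that shell invocation is to build the `--labels` value programmatically. A small sketch (the label keys are just examples):

```python
import json
import shlex

labels = {"env": "staging", "team": "ml-platform", "version": "1.2.0"}

# Serialize once, then quote safely for the shell
labels_json = json.dumps(labels)
command = f"truss push --labels {shlex.quote(labels_json)}"
print(command)
```

This guarantees the flag receives valid JSON regardless of what characters appear in label values.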
- Feb 26, 2026
Truss upgrades and rollbacks
Baseten's Truss adds CLI self-upgrades with a new truss upgrade command, package manager detection, and support for version-specific upgrades or rollbacks. It also now alerts users when a new version is available during normal commands like truss push.
Truss can now upgrade itself directly from the CLI. Use the new `truss upgrade` command to update to the latest version. Truss will detect your package manager (supports uv, pip, pipx, and anaconda) and ask for confirmation before proceeding.

You can also upgrade or roll back to a specific version by passing it as an argument: `truss upgrade 0.14.0`.

Truss will also now notify you when a new version is available, displayed right at the start of normal commands like `truss push`. Version checks run once daily and can be disabled by setting `check_for_updates = false` under `[preferences]` in `$HOME/.config/truss/settings.toml`.
- Feb 25, 2026
Monitor concurrent inference requests
Baseten adds a Concurrent Requests graph to track in-progress inference requests and autoscaling signals across deployments.
Track the number of in-progress inference requests across your deployments, including both requests currently being serviced and those waiting in the queue. This is the key indicator used to drive autoscaling decisions, and is now visible in the metrics dashboard and available through metrics export. For more information, see the supported metrics docs and the autoscaling documentation.
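Concurrency-based autoscaling roughly works like the sketch below: pick enough replicas that each one handles about a target number of concurrent requests. This is an illustration of the general technique, not Baseten's exact algorithm, and the parameter names are assumptions:

```python
import math

def desired_replicas(in_flight, queued, target_concurrency,
                     min_replicas=1, max_replicas=10):
    """Scale so each replica handles about target_concurrency requests.
    The driving metric is total concurrent requests: those currently
    being serviced plus those waiting in the queue."""
    concurrent = in_flight + queued
    needed = math.ceil(concurrent / target_concurrency)
    return max(min_replicas, min(max_replicas, needed))

# 18 in-flight + 6 queued = 24 concurrent; at 4 per replica -> 6 replicas
print(desired_replicas(in_flight=18, queued=6, target_concurrency=4))  # 6
```

Watching the Concurrent Requests graph against your concurrency target makes it easy to see why a deployment scaled up or down.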
Concurrent Requests Graph