Baseten Release Notes

Last updated: Apr 6, 2026

  • Apr 6, 2026
    • Date parsed from source: Apr 6, 2026
    • First seen by Releasebot: Apr 6, 2026

    Baseten

    Named entity recognition on BEI-Bert

    Baseten adds token-classification support for BEI-Bert with /predict_tokens NER and low-latency inference.

    BEI-Bert now supports token-classification models for named-entity recognition. Deploy any ForTokenClassification model with the /predict_tokens endpoint and get structured entity predictions with configurable aggregation strategies. NER on BEI-Bert runs with sub-three millisecond client-side latency on L4 GPUs.

    For more information, see our blog.
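The aggregation that /predict_tokens performs server-side can be pictured with a small local sketch: grouping per-token BIO labels into entity spans, roughly what a "simple" aggregation strategy does. The response shape below (text/label dicts) is an assumption for illustration, not the documented schema.

```python
# Sketch: merge consecutive B-/I- tagged tokens into entity spans.
# The token dict shape here is assumed, not the documented /predict_tokens schema.

def aggregate_bio(tokens):
    """Group BIO-tagged tokens into {"type", "text"} entity spans."""
    entities, current = [], None
    for tok in tokens:
        label = tok["label"]
        if label.startswith("B-") or (
            label.startswith("I-") and (current is None or current["type"] != label[2:])
        ):
            # A B- tag (or an orphan/mismatched I- tag) starts a new entity.
            if current:
                entities.append(current)
            current = {"type": label[2:], "text": tok["text"]}
        elif label.startswith("I-") and current:
            # Continuation of the open entity of the same type.
            current["text"] += " " + tok["text"]
        else:
            # An "O" tag closes any open entity.
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

preds = [
    {"text": "Baseten", "label": "B-ORG"},
    {"text": "is", "label": "O"},
    {"text": "in", "label": "O"},
    {"text": "San", "label": "B-LOC"},
    {"text": "Francisco", "label": "I-LOC"},
]
print(aggregate_bio(preds))
```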

  • Apr 6, 2026
    • Date parsed from source: Apr 6, 2026
    • First seen by Releasebot: Apr 6, 2026

    Baseten

    Copy and download logs

    Baseten adds log export tools with copy, CSV, JSON, and range selection in the logs viewer.

    You can now copy or download all visible logs directly from the logs viewer. A new export menu next to the search box lets you copy logs to your clipboard, or download them as CSV or JSON. To export a subset, shift-click to select a range of logs and copy just those lines.
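Once downloaded, a JSON export is straightforward to post-process. The sketch below filters an export to error lines; the field names ("timestamp", "level", "message") are assumed for illustration, so check an actual export for the exact schema.

```python
import json

# Sketch: post-process a JSON log export downloaded from the logs viewer.
# Field names are assumed; inspect a real export for the exact schema.
export = json.loads("""[
  {"timestamp": "2026-04-06T12:00:01Z", "level": "INFO",  "message": "model loaded"},
  {"timestamp": "2026-04-06T12:00:02Z", "level": "ERROR", "message": "request timed out"},
  {"timestamp": "2026-04-06T12:00:03Z", "level": "INFO",  "message": "prediction served"}
]""")

# Keep only the error lines.
errors = [line for line in export if line["level"] == "ERROR"]
print(len(errors))
```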

  • Apr 1, 2026
    • Date parsed from source: Apr 1, 2026
    • First seen by Releasebot: Apr 1, 2026

    Baseten

    Per-request log filtering

    Baseten adds unique request IDs to predict responses for easier per-call log filtering across HTTP, gRPC, and async requests.

    Every predict call now returns a unique request ID in the X-Baseten-Request-Id response header. Use this ID to filter your model's logs to a single request, cutting through the noise when debugging individual predictions in production. Works across HTTP, gRPC, and async predict calls. Requires Truss 0.15.5 or later.

    For more information, see Logs.
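A minimal client-side sketch: read the request ID off the response headers so you can paste it into the logs filter. The plain dict stands in for real response headers; with the requests library you would read resp.headers instead.

```python
# Sketch: pull the X-Baseten-Request-Id header from a predict response
# so logs can be filtered to that single request.
def request_id(headers):
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-baseten-request-id")

headers = {
    "Content-Type": "application/json",
    "X-Baseten-Request-Id": "abc123",
}
print(f"filter logs by request ID: {request_id(headers)}")
```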

  • Mar 31, 2026
    • Date parsed from source: Mar 31, 2026
    • First seen by Releasebot: Apr 1, 2026

    Baseten

    Health check improvements

    Baseten improves startup probes by waiting for model load before liveness checks and supports configurable startup thresholds.

    Startup probes now handle initialization more reliably by waiting until the model has loaded before executing any liveness checks. The startup phase still defaults to 30 minutes and can be configured up to 50 minutes through the startup_threshold_seconds parameter.

    For more information, see Custom health checks.
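As a sketch, the startup threshold would be set in your Truss config.yaml along these lines; the exact nesting under runtime/health_checks is an assumption here, so confirm it against the Custom health checks docs.

```yaml
# Hypothetical config.yaml fragment: extend the startup phase to the
# 50-minute maximum. The nesting shown is assumed, not authoritative.
runtime:
  health_checks:
    startup_threshold_seconds: 3000  # 50 minutes (default is 1800)
```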

  • Mar 30, 2026
    • Date parsed from source: Mar 30, 2026
    • First seen by Releasebot: Mar 31, 2026

    Baseten

    Rolling deployments

    Baseten adds rolling deployments for zero-downtime updates, gradually shifting traffic to new deployments with controlled replica scaling. Users can pause, resume, or cancel rollouts and tune rollout speed and cleanup behavior in the dashboard or API.

    You can now gradually shift traffic to new deployments instead of swapping all at once. Candidate replicas scale up incrementally while previous replicas scale down in controlled steps, giving you zero-downtime updates. Pause, resume, or cancel mid-rollout if you spot issues. Configure rollout speed with max_surge_percent and stabilization_time_seconds, and choose how to handle the previous deployment after promotion.

    Enable rolling deployments in your environment's promotion settings in the dashboard, or through the API:

    curl -X PATCH "https://api.baseten.co/v1/models/{model_id}/environments/{env_name}" \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "promotion_settings": {
        "rolling_deploy": true,
        "promotion_cleanup_strategy": "SCALE_TO_ZERO",
        "rolling_deploy_config": {
          "max_surge_percent": 10,
          "stabilization_time_seconds": 300
        }
      }
    }'
    

    For more information, see Rolling deployments.
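To make the knobs concrete, here is a back-of-the-envelope sketch of how max_surge_percent bounds each rollout step; the real scheduler's arithmetic may differ.

```python
import math

# Illustrative only: with N active replicas, a rollout step adds at most
# ceil(N * max_surge_percent / 100) candidate replicas, then waits
# stabilization_time_seconds before the next step.
def surge_step(active_replicas, max_surge_percent):
    return max(1, math.ceil(active_replicas * max_surge_percent / 100))

print(surge_step(10, 10))  # small fleet, 10% surge: one replica per step
print(surge_step(40, 25))  # larger fleet, 25% surge: ten replicas per step
```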

  • Mar 27, 2026
    • Date parsed from source: Mar 27, 2026
    • First seen by Releasebot: Mar 27, 2026

    Baseten

    Terminate deployment replica via API

    Baseten adds a management API to terminate specific deployment replicas without affecting the rest of the deployment.

    You can now terminate a specific replica within a deployment using the new management API endpoint. This lets you remove individual replicas without affecting the rest of the deployment, making it useful for evicting stuck or unhealthy replicas.

    curl --request DELETE \
    --url https://api.baseten.co/v1/models/{model_id}/deployments/{deployment_id}/replicas/{replica_id} \
    --header "Authorization: Api-Key $BASETEN_API_KEY"
    

    For more information, see the terminate deployment replica docs.
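The same call can be built from Python using only the standard library; the URL shape mirrors the curl example above, and error handling is left minimal.

```python
import urllib.request

# Sketch: construct a DELETE request for the terminate-replica endpoint.
def terminate_replica_request(model_id, deployment_id, replica_id, api_key):
    url = (f"https://api.baseten.co/v1/models/{model_id}"
           f"/deployments/{deployment_id}/replicas/{replica_id}")
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Api-Key {api_key}"},
    )

req = terminate_replica_request("m1", "d1", "r1", "secret")
print(req.get_method(), req.full_url)
```

With real IDs and an API key, `urllib.request.urlopen(req)` would send it.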

  • Mar 27, 2026
    • Date parsed from source: Mar 27, 2026
    • First seen by Releasebot: Mar 27, 2026

    Baseten

    Hot reload for development deployments

    Baseten adds hot-reloading for truss watch and truss push --watch, speeding model code iteration without restarting inference servers.

    truss watch and truss push --watch now support hot-reloading model code changes via the --hot-reload and --watch-hot-reload flags. Instead of restarting the inference server, hot reload swaps your model class in-process, keeping weights and caches loaded for near-instant iteration on predict() logic.

    For more information, see Deploy and iterate.

  • Mar 26, 2026
    • Date parsed from source: Mar 26, 2026
    • First seen by Releasebot: Mar 27, 2026

    Baseten

    Observability improvements

    Baseten redesigns logs and metrics views for faster debugging, adding an interactive log volume chart, real-time time-range filtering, default full-width layouts, and a new two-column metrics grid with saved view preferences.

    We've redesigned the logs and metrics views for better visibility and faster debugging.

    Logs

    Logs now include an interactive volume chart that visualizes log frequency over time. Click a bar to zoom into that time window, or drag across a range to filter: the log viewer updates in real time. The logs view is now full-width by default, replacing the previous fullscreen toggle.

    Metrics

    Metrics also get the full-width treatment, along with a new grid view that displays charts in a two-column layout so you can see more metrics at once. A toggle at the top of the page lets you switch between grid and list views, and your preference is saved across sessions.

  • Mar 23, 2026
    • Date parsed from source: Mar 23, 2026
    • First seen by Releasebot: Mar 24, 2026

    Baseten

    Model API Deprecation (Kimi K2 0905, Kimi K2 Thinking, DeepSeek v3.2)

    Baseten deprecates the Kimi K2 0905, Kimi K2 Thinking, and DeepSeek v3.2 Model APIs and points users to Kimi K2.5 as the recommended replacement for continued model access and tool calling.

    The Kimi K2 0905, Kimi K2 Thinking, and DeepSeek v3.2 Model APIs were deprecated at 5 p.m. PT on March 6. The model IDs are now inactive and return an error for all requests.

    As open source models advance rapidly, we prioritize serving the highest quality models and deprecate models when stronger alternatives are available.

    We recommend Kimi K2.5 as an alternative; it offers very strong agentic coding and tool-calling capabilities. Just swap in the new model ID: moonshotai/Kimi-K2.5.

    If you’d like to continue using the previous weights, please contact us about a dedicated deployment of the model.

  • Mar 19, 2026
    • Date parsed from source: Mar 19, 2026
    • First seen by Releasebot: Mar 20, 2026

    Baseten

    Introducing the Baseten Delivery Network (BDN)

    Baseten launches the Baseten Delivery Network, making cold starts 2-3x faster for large models with smarter weight delivery, multi-tier caching, and fewer upstream dependencies.

    We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models.

    BDN solves three root causes of slow cold starts: slow weight pulls from upstream storage, replica stampedes under load, and upstream availability dependencies. On first deployment, BDN mirrors your weights to secure storage. From there, a multi-tier cache (node, cluster, mirrored origin) serves weights with consistent hashing and single-flight semantics: each file is fetched once per cluster, not once per pod. Fine-tunes sharing weights with a base model only pull the delta.

    Check out the launch blog to learn more, or see the docs to get started.
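Single-flight semantics are the load-bearing idea here: however many replicas ask for the same weight file at once, only one upstream fetch happens. A toy in-memory version (not Baseten's implementation, and with no error handling) looks like this:

```python
import threading

# Toy single-flight cache: concurrent requests for the same key share
# one upstream fetch instead of each caller pulling its own copy.
class SingleFlightCache:
    def __init__(self, fetch):
        self._fetch = fetch          # slow upstream fetch function
        self._cache = {}             # key -> value already fetched
        self._inflight = {}          # key -> Event for an ongoing fetch
        self._lock = threading.Lock()
        self.fetch_count = 0         # upstream fetches actually performed

    def get(self, key):
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                if event is None:
                    # We are the single flight for this key.
                    event = threading.Event()
                    self._inflight[key] = event
                    leader = True
                else:
                    leader = False
            if leader:
                value = self._fetch(key)
                with self._lock:
                    self.fetch_count += 1
                    self._cache[key] = value
                    del self._inflight[key]
                event.set()
                return value
            event.wait()  # follower: wait for the leader, then retry

cache = SingleFlightCache(lambda key: f"weights for {key}".encode())
threads = [threading.Thread(target=cache.get, args=("model.safetensors",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.fetch_count)  # eight concurrent requests, one upstream fetch
```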

