Together AI Release Notes
59 release notes curated from 1 source by the Releasebot Team. Last updated: Jun 18, 2026
- Jun 17, 2026
- Date parsed from source: Jun 17, 2026
- First seen by Releasebot: Jun 18, 2026
June 17, 2026
Together AI adds new serverless models, including zai-org/GLM-5.2 with long context, FP4 quantization, and function calling.
New serverless models
The following models are now available on serverless:
- zai-org/GLM-5.2: 262K context length, FP4 quantization. Pricing: $1.40 input / $4.40 output / $0.26 cached input (per 1M tokens). Supports function calling and structured outputs.
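As a rough illustration of how the three per-1M-token rates combine, the sketch below estimates the cost of a single request. The helper name `estimate_cost` and the token counts are hypothetical, and it assumes cached input tokens are billed at the cached rate in place of the standard input rate.

```python
from decimal import Decimal

# Per-1M-token prices for zai-org/GLM-5.2, copied from the note above.
GLM_5_2_PRICES = {
    "input": Decimal("1.40"),
    "output": Decimal("4.40"),
    "cached_input": Decimal("0.26"),
}


def estimate_cost(prices, input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated USD cost of one request.

    Assumption: `input_tokens` excludes cached tokens, which are billed
    separately at the cached-input rate.
    """
    per_million = Decimal(1_000_000)
    total = (
        prices["input"] * input_tokens
        + prices["output"] * output_tokens
        + prices["cached_input"] * cached_input_tokens
    )
    return total / per_million


# Hypothetical request: 200K fresh input, 50K cached input, 10K output tokens.
cost = estimate_cost(
    GLM_5_2_PRICES,
    input_tokens=200_000,
    output_tokens=10_000,
    cached_input_tokens=50_000,
)
print(f"${cost:.4f}")  # prints $0.3370
```

`Decimal` is used so that small per-token amounts do not pick up floating-point rounding noise.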
- Jun 16, 2026
- Date parsed from source: Jun 16, 2026
- First seen by Releasebot: Jun 17, 2026
June 16, 2026
Together AI updates the Python SDK so that duplicate file uploads raise an error that reports the existing file's ID for reuse.
Python SDK: duplicate file uploads now raise an error
client.files.upload() in the Python SDK now raises a ValueError when the file’s contents already exist on Together AI. The error message includes the ID of the existing file so you can reuse it without re-uploading.
To replace the file, delete the existing one first with client.files.delete(<file-id>) and retry the upload.
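One way to take advantage of this behavior is to catch the ValueError and reuse the ID it reports. The sketch below is illustrative only: `upload_or_reuse` is a hypothetical helper, the upload call is assumed to return an object with an `id` attribute, and the `file-...` ID pattern is a guess, since the note does not show the exact error message format.

```python
import re

# Hypothetical ID pattern: the note says the error message includes the
# existing file's ID but does not show its format, so "file-..." is a guess.
FILE_ID_PATTERN = re.compile(r"file-[A-Za-z0-9-]+")


def upload_or_reuse(upload, path):
    """Upload `path`, or return the existing file's ID on a duplicate.

    `upload` is any callable that behaves like client.files.upload():
    it is assumed to return an object with an `id` attribute, and to
    raise ValueError when the file's contents already exist.
    """
    try:
        return upload(path).id
    except ValueError as err:
        match = FILE_ID_PATTERN.search(str(err))
        if match is None:
            raise  # Not a duplicate error we know how to parse.
        return match.group(0)
```

With a real client this might be called as `upload_or_reuse(client.files.upload, "train.jsonl")`; check the actual error text in your SDK version before relying on the pattern.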
- Jun 13, 2026
- Date parsed from source: Jun 13, 2026
- First seen by Releasebot: Jun 16, 2026
June 13, 2026
Together AI adds serverless Kimi-K2.7-Code with 262K context, FP4, and function calling plus structured outputs.
New serverless models
The following models are now available on serverless:
- moonshotai/Kimi-K2.7-Code: 262,144 context length, FP4 quantization. Pricing: $0.95 input / $4.00 output / $0.19 cached input (per 1M tokens). Supports function calling and structured outputs.
- Jun 12, 2026
- Date parsed from source: Jun 12, 2026
- First seen by Releasebot: Jun 16, 2026
June 12, 2026
Together AI adds new serverless MiniMax-M3 model support with 524K context and FP4 quantization.
New serverless models
The following models are now available on serverless:
- MiniMaxAI/MiniMax-M3: 524,288 context length, FP4 quantization. Pricing: $0.30 input / $1.20 output / $0.06 cached input (per 1M tokens).
- Jun 11, 2026
- Date parsed from source: Jun 11, 2026
- First seen by Releasebot: Jun 12, 2026
June 11, 2026
Together AI deprecates mistralai/Voxtral-Mini-3B-2507 on serverless and points users to dedicated endpoints.
Deprecations
The following model has been deprecated and is no longer available on serverless:
mistralai/Voxtral-Mini-3B-2507. Available as an on-demand dedicated endpoint. See Deprecations for migration options.
- Jun 9, 2026
- Date parsed from source: Jun 9, 2026
- First seen by Releasebot: Jun 10, 2026
- Modified by Releasebot: Jun 16, 2026
June 9, 2026
Together AI updates serverless model pricing with new cached input rates and lower DeepSeek-V4-Pro input and output prices.
Pricing update
The following changes are effective June 9, 2026:
New cached input pricing (per 1M tokens):
- zai-org/GLM-5.1: $0.26 cached input (81% discount from $1.40 standard input).
- Qwen/Qwen3.5-397B-A17B: $0.35 cached input (42% discount from $0.60 standard input).
Price decrease for deepseek-ai/DeepSeek-V4-Pro (per 1M tokens):
- Input: $2.10
- Output: $4.40
- Cached input: $0.20 (unchanged).
See Serverless models for the full pricing catalog.
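The cached-input discounts quoted above follow directly from the listed prices. A minimal check, using a hypothetical helper named `discount_percent`:

```python
def discount_percent(standard, discounted):
    """Percentage saved when paying `discounted` instead of `standard`."""
    return (1 - discounted / standard) * 100


# zai-org/GLM-5.1: $0.26 cached vs $1.40 standard input.
print(round(discount_percent(1.40, 0.26)))  # prints 81

# Qwen/Qwen3.5-397B-A17B: $0.35 cached vs $0.60 standard input.
print(round(discount_percent(0.60, 0.35)))  # prints 42
```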
- Jun 8, 2026
- Date parsed from source: Jun 8, 2026
- First seen by Releasebot: Jun 10, 2026
June 8, 2026
Together AI adds server-side validation for fine-tuning datasets, giving uploaded files full schema checks during ingestion and clearer file status, validation reports, and user-facing errors to catch dataset issues before training starts.
Improvements
Server-side validation for fine-tuning datasets
Files uploaded for fine-tuning now go through full server-side schema validation during ingestion, with the result exposed on the file object. Poll the Files API and read processing_status (COMPLETED, INVALID_FORMAT, or FAILED) plus validation_report to detect dataset issues, such as missing role fields or malformed conversation turns, programmatically before launching a job.
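The polling step could look roughly like the sketch below. It is not tied to a specific SDK call: `fetch_file` is a hypothetical callable that returns the file object as a dict (for example, a thin wrapper around a Files API GET), and only the field names and status values mentioned in this note are assumed.

```python
import time

# Status values named in the note; any other value is treated as
# "still processing".
TERMINAL_STATUSES = {"COMPLETED", "INVALID_FORMAT", "FAILED"}


def wait_for_validation(fetch_file, file_id, interval=2.0, timeout=300.0,
                        sleep=time.sleep):
    """Poll until the file reaches a terminal processing_status.

    Returns a (status, validation_report) tuple. `fetch_file` is a
    hypothetical callable: fetch_file(file_id) -> dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        file_obj = fetch_file(file_id)
        status = file_obj.get("processing_status")
        if status in TERMINAL_STATUSES:
            return status, file_obj.get("validation_report")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file_id} still processing after {timeout}s")
        sleep(interval)
```

A caller would launch the fine-tuning job only when the returned status is COMPLETED, and otherwise surface the validation report to whoever owns the dataset.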
Errors include a user-facing reason, so you can fix the dataset and re-upload without trial-and-error training runs. For example:
Line 7: messages[1] must contain a role field
- Jun 1, 2026
- Date parsed from source: Jun 1, 2026
- First seen by Releasebot: Jun 1, 2026
- Modified by Releasebot: Jun 2, 2026
June 1, 2026
Together AI adds a Fine-tuning job metrics API for programmatic progress tracking, brings Slurm startup scripts to GPU Clusters, and improves Evaluations with a single-pass compare mode. It also updates billing documentation across payment methods, invoices, ACH, auto-recharge, and prepaid access.
Fine-tuning job metrics API
A new API endpoint, GET /fine-tunes/{id}/metrics, returns training metrics for a fine-tuning job (e.g. loss curves and other per-step values) so you can monitor progress programmatically without opening the dashboard. See the API reference and Fine tuning training metrics for details.
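A request to this endpoint might be built as in the sketch below, using only the Python standard library. The base URL and the Bearer-token header are assumptions about the REST API rather than details from this note, the function names are hypothetical, and the response schema is deliberately left unparsed; consult the API reference for the actual fields.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public REST base URL; adjust if yours differs.
BASE_URL = "https://api.together.xyz/v1"


def build_metrics_request(job_id, api_key, base_url=BASE_URL):
    """Build (but do not send) a GET request for a job's training metrics."""
    safe_id = urllib.parse.quote(job_id, safe="")
    url = f"{base_url}/fine-tunes/{safe_id}/metrics"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


def fetch_metrics(job_id, api_key):
    """Send the request and decode the JSON body without assuming its schema."""
    with urllib.request.urlopen(build_metrics_request(job_id, api_key)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the URL logic easy to test without network access.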
Slurm startup scripts for GPU Clusters
GPU clusters now support Slurm startup scripts (lifecycle hook scripts that run at node startup, job allocation, and job completion). Use them to install packages at boot, configure SSH sessions, or run per-job prolog and epilog actions across worker, login, and controller nodes. See Slurm startup scripts for details.
Evaluations: Single-pass compare mode
The compare evaluator now accepts a disable_position_bias_correction parameter. By default, the judge runs each comparison twice (A→B then B→A) and reconciles verdicts to cancel position bias. Setting disable_position_bias_correction to true runs a single pass, cutting judge cost and latency in half. See AI evaluations for details.
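To make the trade-off concrete, the sketch below contrasts the two modes. It is a conceptual illustration, not the evaluator's implementation: `judge` is a hypothetical callable, and the reconciliation rule (keep a verdict only when both orderings agree, otherwise call it a tie) is an assumption, since the note does not say how verdicts are reconciled.

```python
def compare(judge, a, b, disable_position_bias_correction=False):
    """Return "A", "B", or "tie" for two candidate responses.

    `judge(first, second)` is a hypothetical callable that returns
    "first", "second", or "tie" for the response it prefers.
    """
    forward = {"first": "A", "second": "B", "tie": "tie"}[judge(a, b)]
    if disable_position_bias_correction:
        return forward  # Single pass: one judge call.

    # Second pass with positions swapped; map the verdict back to A/B.
    backward = {"first": "B", "second": "A", "tie": "tie"}[judge(b, a)]

    # Assumed reconciliation rule: keep a verdict only if both passes agree.
    return forward if forward == backward else "tie"
```

A judge that always prefers whichever response it sees first yields "tie" in the default mode but "A" in single-pass mode, which is the bias the second pass is meant to cancel.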
Billing documentation updates
Updated billing docs for multiple payment methods, separate invoice addresses, ACH payment behavior, auto-recharge limits with bank transfers, and prepaid-only access (no negative balance limits). See Payment methods & invoices, Credits, and Billing troubleshooting.
- May 29, 2026
- Date parsed from source: May 29, 2026
- First seen by Releasebot: May 30, 2026
May 29, 2026
Together AI updates serverless model pricing for Qwen and Meta Llama models, effective May 29, 2026.
Pricing update
The following models have updated pricing, effective May 29, 2026. All usage from that date forward will be billed at the new rates (per 1M tokens):
Qwen/Qwen3.5-9B: $0.10 → $0.17 (input), $0.15 → $0.25 (output).
meta-llama/Meta-Llama-3-8B-Instruct-Lite: $0.10 → $0.14 (input), $0.10 → $0.14 (output).
meta-llama/Llama-3.3-70B-Instruct-Turbo: $0.88 → $1.04 (input), $0.88 → $1.04 (output).
See Serverless models for the full pricing catalog.
- May 25, 2026
- Date parsed from source: May 25, 2026
- First seen by Releasebot: May 30, 2026
May 25, 2026
Together AI adds new serverless image and video models, expands dedicated endpoint support with Gemma, Llama, Qwen and other variants, and now offers a Seedance 2.0 quickstart for multimodal audio-video generation workflows.
New serverless models
The following image and video models are now available on serverless:
Image
ByteDance/Seedream-5.0-lite
Video
alibaba/happyhorse-1.0-i2v (image-to-video)
alibaba/happyhorse-1.0-r2v (reference-to-video)
google/veo-3.1
google/veo-3.1-lite
New dedicated endpoint models
The following models are now available for deployment on dedicated endpoints:
Gemma 3 (1B, 27B, 27B LoRA)
Gemma 4 31B LoRA
MedGemma 27B
Molmo 7B
Llama 3.2 3B Instruct
Llama 4 Scout 17B FP8 LoRA
Qwen 2/2.5/3 variants (14B, 32B, 235B A22B Instruct 2507 FP8, Qwen2-72B)
Arcee Trinity Mini
BGE Base EN v1.5
MiniMax Speech 2.8 Turbo
Rime Mist v3 (text and omni)
Seedance 2.0 quickstart
A quickstart is now available for Seedance 2.0, ByteDance’s unified multimodal audio-video generation model. The guide covers text-to-video, image-to-video, video extension, and instruction-based editing.
- May 27, 2026
- Date parsed from source: May 27, 2026
- First seen by Releasebot: May 27, 2026
May 27, 2026
Together AI deprecates black-forest-labs/FLUX.1-krea-dev on serverless and points users to migration options.
Deprecations
The following model has been deprecated and is no longer available on serverless:
black-forest-labs/FLUX.1-krea-dev.
See Deprecations for migration options.
- May 22, 2026
- Date parsed from source: May 22, 2026
- First seen by Releasebot: May 22, 2026
May 22, 2026
Together AI adds external OIDC authentication and RBAC for GPU clusters, letting team members access Kubernetes APIs with their organization’s SSO. It replaces shared kubeconfig credentials with per-user tokens, audit trails, and easier revocation, with support for Kubernetes clusters only.
GPU Clusters: External OIDC authentication and RBAC
GPU clusters now support external OpenID Connect (OIDC) authentication, allowing each team member to access the cluster’s Kubernetes API using their organization’s identity provider — Google, Okta, Auth0, Microsoft Entra ID, and others.
With OIDC enabled, access is managed through standard Kubernetes RBAC: admins bind permissions to individual user identities, and each user authenticates via their browser using SSO. This replaces shared kubeconfig credentials with per-user tokens, per-user audit trails, and clean revocation. Currently this feature is only supported for Kubernetes clusters.
OIDC must be configured at cluster creation time. See Set up OIDC authentication for the full setup guide.
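For readers unfamiliar with binding permissions to an individual identity, the sketch below generates a standard Kubernetes RoleBinding manifest for one user. The helper name, the example user and namespace, and the choice of the built-in `view` ClusterRole are illustrative assumptions; how a username appears to the cluster depends on the OIDC claim and prefix chosen when OIDC was configured.

```python
import json
import re


def role_binding(user, namespace, cluster_role="view"):
    """Build a RoleBinding granting `cluster_role` to one user in a namespace.

    `user` must match the identity string the cluster derives from the
    OIDC token (assumed here to be an email address).
    """
    # Derive a readable object name from the user identity.
    slug = re.sub(r"[^a-z0-9]+", "-", user.lower()).strip("-")
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{cluster_role}-{slug}", "namespace": namespace},
        "subjects": [{
            "kind": "User",
            "name": user,
            "apiGroup": "rbac.authorization.k8s.io",
        }],
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


# Hypothetical user and namespace; kubectl accepts JSON manifests.
print(json.dumps(role_binding("alice@example.com", "ml-team"), indent=2))
```

Revoking access then amounts to deleting that one binding, without rotating credentials shared by other users.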
- May 22, 2026
- Date parsed from source: May 22, 2026
- First seen by Releasebot: May 22, 2026
May 22, 2026
Together AI adds Qwen/Qwen3.7-Max to serverless models with new pricing for input and output tokens.
New serverless models
The following model has been added to serverless:
Qwen/Qwen3.7-Max. Pricing: $2.50 input / $7.50 output (per 1M tokens).
See Serverless models.
- Jun 4, 2026
- Date parsed from source: Jun 4, 2026
- First seen by Releasebot: May 22, 2026
- Modified by Releasebot: Jun 16, 2026
June 4, 2026
Together AI deprecates Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 on serverless and points users to MiniMaxAI/MiniMax-M2.7.
Model deprecations
The following model has been deprecated and is no longer available on serverless:
Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8. Recommended replacement: MiniMaxAI/MiniMax-M2.7, available as an on-demand dedicated endpoint.
See Deprecations for migration options.
- May 21, 2026
- Date parsed from source: May 21, 2026
- First seen by Releasebot: May 22, 2026
May 21, 2026
Together AI deprecates moonshotai/Kimi-K2.5 on serverless and points users to migration options.
Deprecations
The following model has been deprecated and is no longer available on serverless:
moonshotai/Kimi-K2.5.
See Deprecations for migration options.