Together AI Release Notes
Last updated: Mar 20, 2026
Mar 10, 2026
Together AI adds cached input token pricing for MiniMaxAI/MiniMax-M2.5 at $0.06 per 1M tokens.
Cached Input Token Pricing
Cached input token pricing is now available:
- MiniMaxAI/MiniMax-M2.5: $0.06 per 1M cached input tokens (80% off standard input price)
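The savings can be estimated with simple arithmetic. A minimal sketch, assuming a standard input price of $0.30 per 1M tokens (implied by the stated 80% discount, since $0.06 is 20% of $0.30):

```python
# Estimate blended input cost with cached input token pricing.
# Assumption: standard input price of $0.30/1M tokens, inferred from
# the 80% discount stated in the release note.

STANDARD_PER_M = 0.30  # USD per 1M standard input tokens (assumed)
CACHED_PER_M = 0.06    # USD per 1M cached input tokens (from the note)

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Blended input cost in USD when some tokens are served from cache."""
    uncached = total_tokens - cached_tokens
    return (uncached * STANDARD_PER_M + cached_tokens * CACHED_PER_M) / 1_000_000

# 2M input tokens with 1.5M cache hits:
print(round(input_cost(2_000_000, 1_500_000), 4))  # → 0.24
```

At a 75% cache-hit rate this request costs $0.24 instead of $0.60, a 60% reduction on input spend.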
Mar 7, 2026
Together AI adds Qwen/Qwen3.5-9B as a serverless model bring-up.
Serverless Model Bring Ups
The following models have been added:
- Qwen/Qwen3.5-9B
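Newly added serverless models can be called as soon as they appear. A sketch of the request shape against Together's OpenAI-compatible chat completions endpoint — the URL and payload fields follow Together's public HTTP API, but treat them as assumptions and confirm against the official docs:

```python
import json
import os
import urllib.request

# Sketch: single-turn chat completion against a newly added serverless
# model. Endpoint URL and payload shape are assumptions based on
# Together's public HTTP API.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

body = build_request("Qwen/Qwen3.5-9B", "Say hello in one sentence.")

api_key = os.environ.get("TOGETHER_API_KEY")
if api_key:  # only send when a key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```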
Mar 6, 2026
Together AI deprecates several models and removes them from availability.
Model Deprecations
The following models have been deprecated and are no longer available:
- mixedbread-ai/Mxbai-Rerank-Large-V2
- moonshotai/Kimi-K2-Thinking
- meta-llama/Llama-3.2-3B-Instruct-Turbo
- moonshotai/Kimi-K2-Instruct-0905
Feb 25, 2026
Together AI deprecates multiple models, including FLUX, Qwen, Llama, and Nemotron variants.
Model Deprecations
The following models have been deprecated and are no longer available:
- black-forest-labs/FLUX.1-dev
- black-forest-labs/FLUX.1-dev-lora
- black-forest-labs/FLUX.1-kontext-dev
- Qwen/Qwen3-VL-32B-Instruct
- mistralai/Ministral-3-14B-Instruct-2512
- Qwen/Qwen3-Next-80B-A3B-Thinking
- Alibaba-NLP/gte-modernbert-base
- BAAI/bge-base-en-v1.5
- meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
- meta-llama/Llama-Guard-3-11B-Vision-Turbo
- meta-llama/LlamaGuard-2-8b
- marin-community/marin-8b-instruct
- nvidia/NVIDIA-Nemotron-Nano-9B-v2
Feb 16, 2026
Together AI adds Qwen/Qwen3.5-397B-A17B as a serverless model bring-up.
Serverless Model Bring Ups
The following models have been added:
- Qwen/Qwen3.5-397B-A17B
Feb 15, 2026
Together AI adds serverless model bring-up support for MiniMaxAI/MiniMax-M2.5.
Serverless Model Bring Ups
The following models have been added:
- MiniMaxAI/MiniMax-M2.5
Feb 13, 2026
Together AI adds zai-org/GLM-5 to Serverless Model Bring Ups.
Serverless Model Bring Ups
The following models have been added:
- zai-org/GLM-5
Feb 12, 2026
Together AI launches Dedicated Container Inference to help users containerize, deploy, and scale custom models.
Dedicated Container Inference Launch
Together AI has officially launched Dedicated Container Inference (DCI), formerly known as BYOC.
DCI empowers users to containerize, deploy, and scale custom models on Together AI with ease.
- Blog post
- Documentation
- Getting started
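A DCI deployment starts from a container that exposes an HTTP inference endpoint. As a generic illustration only — the route, port, and response shape below are assumptions, not the DCI container contract, which is defined in the documentation above — a minimal containerizable model server might look like:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Generic sketch of a containerizable inference server. The /predict
# route, port 8000, and response shape are illustrative assumptions;
# consult the DCI documentation for the actual container contract.

def predict(text: str) -> dict:
    """Placeholder model: report input length; a real container runs inference here."""
    return {"input_chars": len(text)}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve inside the container:
# HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```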
Feb 6, 2026
Together AI deprecates several models, including Llama, Qwen, and BGE variants, which are no longer available.
Model Deprecations
The following models have been deprecated and are no longer available:
- togethercomputer/m2-bert-80M-32k-retrieval
- Salesforce/Llama-Rank-V1
- togethercomputer/Refuel-Llm-V2
- togethercomputer/Refuel-Llm-V2-Small
- Qwen/Qwen3-235B-A22B-fp8-tput
- qwen-qwen2-5-14b-instruct-lora
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- Qwen/Qwen2.5-72B-Instruct-Turbo
- meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
- BAAI/bge-large-en-v1.5
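Deprecations like these can break applications with hard-coded model ids. One defensive pattern — sketched here with model ids taken from this note and an illustrative, assumed fallback choice — is to validate the configured model against the set of currently served models before use:

```python
# Sketch: fall back gracefully when a configured model has been
# deprecated. Model ids are from this release note; the fallback
# selection policy is an illustrative assumption.

DEPRECATED = {
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    "Qwen/Qwen2.5-72B-Instruct-Turbo",
    "BAAI/bge-large-en-v1.5",
}

def pick_model(preferred: str, available: set, fallback: str) -> str:
    """Return `preferred` if it is still served and not deprecated, else `fallback`."""
    if preferred in available and preferred not in DEPRECATED:
        return preferred
    return fallback

available = {"Qwen/Qwen3.5-397B-A17B", "MiniMaxAI/MiniMax-M2.5"}
print(pick_model("Qwen/Qwen2.5-72B-Instruct-Turbo", available,
                 "Qwen/Qwen3.5-397B-A17B"))  # → Qwen/Qwen3.5-397B-A17B
```

In production the `available` set would come from the models listing endpoint rather than a hard-coded literal.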
Feb 4, 2026
Together AI releases Python SDK v2.0 General Availability, bringing a faster, type-safe OpenAPI-driven client that is easier to maintain and aligned with the latest API surface. It also includes beta APIs for Instant Clusters and becomes the new home for future features.
Python SDK v2.0 General Availability
Together AI is releasing the Python SDK v2.0 — a new, type-safe, OpenAPI-driven client designed to be faster, easier to maintain, and ready for everything we’re building next.
Install:
- pip install together
- uv add together
Migration Guide: A detailed Python SDK Migration Guide covers API-by-API changes, type updates, and troubleshooting tips
Code and Docs: Access the Together Python v2 repo and reference docs with code examples
Main Goal: Replace the legacy v1 Python SDK with a modern, strongly-typed, OpenAPI-generated client that matches the API surface more closely and stays in lock-step with new features
Net New: All new features will be built in version 2 moving forward. This first version already includes beta APIs for our Instant Clusters!
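A minimal v2 usage sketch. This assumes the v2 client keeps the `Together` client class and an OpenAI-style `chat.completions.create` method — verify the exact names against the v2 reference docs and the migration guide:

```python
import os

# Sketch of basic SDK v2 usage. The `Together` class name and
# `chat.completions.create` method are assumptions; check the v2
# reference docs and migration guide for the authoritative surface.

def build_chat_args(model: str, prompt: str) -> dict:
    """Keyword arguments for a single-turn chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

args = build_chat_args("Qwen/Qwen3.5-9B", "Summarize MoE models in one line.")

if os.environ.get("TOGETHER_API_KEY"):  # only call when credentials exist
    from together import Together  # pip install together

    client = Together()
    response = client.chat.completions.create(**args)
    print(response.choices[0].message.content)
```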