Together AI Release Notes
Last updated: Apr 16, 2026
- Apr 15, 2026
- Date parsed from source: Apr 15, 2026
- First seen by Releasebot: Apr 16, 2026
Pricing Update
Together AI updates google/gemma-3n-E4B-it pricing for April 15, 2026, raising input and output token rates.
The following model has updated pricing, effective April 15, 2026:
google/gemma-3n-E4B-it pricing updated
$0.02 → $0.06 (input), $0.04 → $0.12 (output) per 1M tokens
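As a quick sanity check, the new rates triple the per-request cost at any token mix. A minimal sketch of the arithmetic (the token counts are illustrative, not from the note):

```python
# Sketch: estimate the cost impact of the April 15, 2026 price change for
# google/gemma-3n-E4B-it. Rates are USD per 1M tokens, taken from the note above.

OLD_INPUT, OLD_OUTPUT = 0.02, 0.04   # $/1M tokens before Apr 15, 2026
NEW_INPUT, NEW_OUTPUT = 0.06, 0.12   # $/1M tokens on or after Apr 15, 2026

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD for a single request at the given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a request with 10,000 input tokens and 2,000 output tokens.
old = request_cost(10_000, 2_000, OLD_INPUT, OLD_OUTPUT)
new = request_cost(10_000, 2_000, NEW_INPUT, NEW_OUTPUT)
print(f"before: ${old:.6f}  after: ${new:.6f}")
```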
- Apr 14, 2026
- Date parsed from source: Apr 14, 2026
- First seen by Releasebot: Apr 15, 2026
- Modified by Releasebot: Apr 16, 2026
Model Deprecations
Together AI deprecates several models and removes them from availability.
The following models have been deprecated and are no longer available:
- Qwen/Qwen3-VL-8B-Instruct
- Qwen/Qwen3-235B-A22B-Thinking-2507
- mistralai/Mixtral-8x7B-Instruct-v0.1
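For callers still pinned to one of these IDs, a small guard can fail fast or reroute before a request ever goes out. A hedged sketch; the fallback mapping here is purely illustrative (it is an assumption, not official migration guidance):

```python
# Sketch: shield a client from the model deprecations listed above.
# The FALLBACK mapping is an illustrative assumption only; check Together AI's
# model catalog for actual replacement recommendations.

DEPRECATED = {
    "Qwen/Qwen3-VL-8B-Instruct",
    "Qwen/Qwen3-235B-A22B-Thinking-2507",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
}

# Hypothetical replacement (assumption: this model remains available).
FALLBACK = {
    "Qwen/Qwen3-235B-A22B-Thinking-2507": "Qwen/Qwen3.5-397B-A17B",
}

def resolve_model(requested: str) -> str:
    """Return a usable model ID, raising if deprecated with no known fallback."""
    if requested not in DEPRECATED:
        return requested
    if requested in FALLBACK:
        return FALLBACK[requested]
    raise ValueError(f"{requested} was deprecated and has no configured fallback")
```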
- Apr 11, 2026
- Date parsed from source: Apr 11, 2026
- First seen by Releasebot: Mar 20, 2026
- Modified by Releasebot: Apr 12, 2026
Apr 11
Together AI adds a serverless model bring up: MiniMaxAI/MiniMax-M2.7 is now available.
Serverless Model Bring Ups
The following models have been added:
- MiniMaxAI/MiniMax-M2.7
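A newly brought-up serverless model is addressed by its ID string. A minimal sketch of composing a request body for it, assuming Together AI's OpenAI-compatible chat completions schema (nothing is sent over the network here):

```python
import json

# Sketch: compose a chat-completions request for the newly added
# MiniMaxAI/MiniMax-M2.7. Assumes Together AI's OpenAI-compatible
# /v1/chat/completions request schema; no request is actually sent.

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for a chat completion against a serverless model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("MiniMaxAI/MiniMax-M2.7", "Summarize this release note.")
print(json.dumps(body, indent=2))
```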
- Apr 8, 2026
- Date parsed from source: Apr 8, 2026
- First seen by Releasebot: Apr 9, 2026
Apr 8
Together AI adds serverless model bring ups for google/gemma-4-31B-it and zai-org/GLM-5.1.
Serverless Model Bring Ups
The following models have been added:
- google/gemma-4-31B-it
- zai-org/GLM-5.1
- Apr 2, 2026
- Date parsed from source: Apr 2, 2026
- First seen by Releasebot: Mar 31, 2026
- Modified by Releasebot: Apr 16, 2026
Model Deprecations
Together AI removes deprecated models from availability, including GLM, Mistral, and Qwen options.
The following models have been deprecated and are no longer available:
- zai-org/GLM-4.5-Air-FP8
- zai-org/GLM-4.7
- mistralai/Mistral-Small-24B-Instruct-2501
- Qwen/Qwen3-Next-80B-A3B-Instruct
- Mar 31, 2026
- Date parsed from source: Mar 31, 2026
- First seen by Releasebot: Mar 31, 2026
- Modified by Releasebot: Apr 16, 2026
Model Deprecation
Together AI deprecates meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 and removes it from availability.
The following model has been deprecated and is no longer available:
- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- Mar 10, 2026
- Date parsed from source: Mar 10, 2026
- First seen by Releasebot: Mar 20, 2026
Mar 10
Together AI adds cached input token pricing for MiniMaxAI/MiniMax-M2.5 at $0.06 per 1M tokens.
Cached Input Token Pricing
Cached input token pricing is now available:
- MiniMaxAI/MiniMax-M2.5: $0.06 per 1M cached input tokens (80% off standard input price)
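Since $0.06 is 80% off the standard input rate, the standard rate works out to $0.30 per 1M tokens. A small sketch of the blended input cost as a function of the cache-hit fraction:

```python
# The cached rate of $0.06 per 1M tokens is 80% off the standard input rate,
# which implies a standard rate of $0.30 per 1M ($0.06 / 0.20).

CACHED_RATE = 0.06                   # $/1M cached input tokens (from the note)
STANDARD_RATE = CACHED_RATE / 0.20   # $0.30/1M, implied by "80% off"

def blended_input_cost(total_tokens: int, cached_fraction: float) -> float:
    """USD input cost when `cached_fraction` of tokens hit the prompt cache."""
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    return (cached * CACHED_RATE + fresh * STANDARD_RATE) / 1_000_000

# 1M input tokens at a 50% cache hit rate:
print(f"${blended_input_cost(1_000_000, 0.5):.2f}")  # prints $0.18
```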
- Mar 7, 2026
- Date parsed from source: Mar 7, 2026
- First seen by Releasebot: Mar 20, 2026
Mar 7
Together AI adds serverless model bring-ups with Qwen/Qwen3.5-9B available.
- Mar 6, 2026
- Date parsed from source: Mar 6, 2026
- First seen by Releasebot: Mar 20, 2026
Mar 6
Together AI deprecates several models and removes them from availability.
Model Deprecations
The following models have been deprecated and are no longer available:
- mixedbread-ai/Mxbai-Rerank-Large-V2
- moonshotai/Kimi-K2-Thinking
- meta-llama/Llama-3.2-3B-Instruct-Turbo
- moonshotai/Kimi-K2-Instruct-0905
- Feb 25, 2026
- Date parsed from source: Feb 25, 2026
- First seen by Releasebot: Mar 20, 2026
Feb 25
Together AI deprecates multiple models, including FLUX, Qwen, Llama, and Nemotron variants.
Model Deprecations
The following models have been deprecated and are no longer available:
- black-forest-labs/FLUX.1-dev
- black-forest-labs/FLUX.1-dev-lora
- black-forest-labs/FLUX.1-kontext-dev
- Qwen/Qwen3-VL-32B-Instruct
- mistralai/Ministral-3-14B-Instruct-2512
- Qwen/Qwen3-Next-80B-A3B-Thinking
- Alibaba-NLP/gte-modernbert-base
- BAAI/bge-base-en-v1.5
- meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
- meta-llama/Llama-Guard-3-11B-Vision-Turbo
- meta-llama/LlamaGuard-2-8b
- marin-community/marin-8b-instruct
- nvidia/NVIDIA-Nemotron-Nano-9B-v2
- Feb 16, 2026
- Date parsed from source: Feb 16, 2026
- First seen by Releasebot: Mar 20, 2026
Feb 16
Together AI adds serverless model bring ups for Qwen/Qwen3.5-397B-A17B.
Serverless Model Bring Ups
The following models have been added:
- Qwen/Qwen3.5-397B-A17B
- Feb 13, 2026
- Date parsed from source: Feb 13, 2026
- First seen by Releasebot: Mar 20, 2026
Feb 13
Together AI adds zai-org/GLM-5 to Serverless Model Bring Ups.
- Feb 12, 2026
- Date parsed from source: Feb 12, 2026
- First seen by Releasebot: Mar 20, 2026
Feb 12
Together AI launches Dedicated Container Inference to help users containerize, deploy, and scale custom models.
Dedicated Container Inference Launch
Together AI has officially launched Dedicated Container Inference (DCI), formerly known as BYOC.
DCI empowers users to containerize, deploy, and scale custom models on Together AI with ease.
- Blog post
- Documentation
- Getting started
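The notes do not spell out the DCI container contract, so the following is only a hypothetical shape for such a workload: a stdlib HTTP server wrapping a stand-in inference function. The route, port, and payload schema are assumptions; consult the linked documentation for the real interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sketch of a custom-model container's serving loop for a
# DCI-style deployment. The JSON-in/JSON-out contract, port, and handler are
# illustrative assumptions, not Together AI's actual DCI interface.

def infer(payload: dict) -> dict:
    """Stand-in for a real model forward pass (assumed JSON in, JSON out)."""
    return {"output": payload.get("input", "").upper()}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = infer(json.loads(body or b"{}"))
        data = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

# Inside the container, the entrypoint would run something like:
#   HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```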