Fireworks AI Release Notes
Last updated: Mar 20, 2026
2026-02-05
Fireworks AI adds multimodal video and audio inputs, secure AWS S3 dataset storage for training, and just-in-time SSO user provisioning for Enterprise. It also expands docs, autoscaling guidance, and Responses API and model management support, plus bug fixes and minor improvements.
Video & Audio Input Models
You can now query multimodal models with video and audio inputs for video captioning, scene analysis, and multimodal question answering. Deploy models like Qwen3 Omni and Molmo2 to process video and audio content directly using the Chat Completions API.
See the Video & Audio Inputs guide for deployment instructions and code examples.
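Since the inputs go through the standard Chat Completions API, a request is an ordinary messages payload with a video content part. A hedged sketch follows; the model id and the `video_url` content-part shape are assumptions, so check the Video & Audio Inputs guide for the exact format:

```python
# Hypothetical sketch of an OpenAI-compatible Chat Completions payload with a
# video input. The model id and "video_url" part shape are placeholders taken
# from the common multimodal convention, not confirmed Fireworks specifics.
import json

def build_video_request(model: str, video_url: str, question: str) -> dict:
    """Build a multimodal chat request with one video part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "video_url", "video_url": {"url": video_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

payload = build_video_request(
    "accounts/fireworks/models/qwen3-omni",  # placeholder model id
    "https://example.com/clip.mp4",
    "Describe the main scene in this video.",
)
print(json.dumps(payload, indent=2))
```

The same two-part structure applies to audio inputs, with the audio part type substituted for the video one.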
AWS S3 Integration for Training Datasets
Training datasets can now be stored in your own AWS S3 buckets using GCP-to-AWS OIDC federation. This Bring Your Own Bucket (BYOB) approach keeps your data private while enabling secure access during Supervised Fine-Tuning and Reinforcement Fine-Tuning jobs—no long-lived credentials required.
See the Secure Training (BYOB) documentation for IAM role setup and usage examples.
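On the AWS side, GCP-to-AWS OIDC federation generally comes down to an IAM role whose trust policy accepts Google-issued identity tokens via `sts:AssumeRoleWithWebIdentity`. A hedged sketch of such a trust policy follows; the audience value is a placeholder, and the exact principal and conditions for Fireworks' identity are in the Secure Training (BYOB) docs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:aud": "<fireworks-service-account-id>"
        }
      }
    }
  ]
}
```

The role then needs a permissions policy granting read access to the specific S3 bucket holding your datasets.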
Just-In-Time (JIT) User Provisioning for SSO (Enterprise)
JIT user provisioning automatically creates user accounts when users sign in through SSO for the first time. Enable this when configuring your identity provider to eliminate manual user creation.
See the SSO documentation for setup instructions.
📚 Documentation Updates
- Video & Audio Inputs: New guide for processing video and audio with Qwen3 Omni and Molmo2 models (Video & Audio Inputs)
- AWS S3 Bucket Integration: BYOB dataset storage for training via OIDC federation (Secure Training)
- Rate Limits Clarification: Expanded documentation on soft limits, dynamic growth mechanics, and monitoring (Rate Limits)
- Anthropic-Compatible Thinking Parameter: Control reasoning with thinking parameter alongside reasoning_effort (Reasoning)
- Scaling from Zero Behavior: Deployments scaled to zero return 503 immediately with retry guidance (Autoscaling)
- Custom Model Configuration: Generation defaults via fireworks.json for LoRA adapters (Uploading Custom Models)
- Service Account Roles: Role assignment and management for service accounts (Service Accounts)
- Free Tuning: Documentation highlighting free fine-tuning options across fine-tuning guides
- Serverless Model Retrieval: Python SDK and curl examples for programmatically listing serverless models (FAQ)
- Function Tools in Responses API: Client-executed function tools following OpenAI-compatible format (Responses API)
- Autoscaling Metric: New prompt_tokens_per_second load target for prefill-heavy workloads (Autoscaling)
- firectl CLI Syntax: All documentation updated to firectl syntax
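On the scaling-from-zero behavior above: since a scaled-to-zero deployment returns 503 immediately while a replica spins up, clients should back off and retry rather than fail. A minimal sketch, with the transport stubbed out as a stand-in for your actual HTTP call:

```python
# Sketch: retrying a request to a deployment that may be scaled to zero.
# Such deployments return 503 immediately during cold start, so the client
# backs off exponentially and retries. send_request is any callable that
# returns a (status_code, body) pair.
import time

def call_with_retry(send_request, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry on 503 with exponential backoff; return the first non-503 response."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body

# Fake transport that returns 503 twice (cold start), then 200.
responses = iter([(503, ""), (503, ""), (200, "ok")])
status, body = call_with_retry(lambda: next(responses), base_delay=0.0)
print(status, body)  # 200 ok
```

The delay schedule is illustrative; tune it to your deployment's observed cold-start time.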
Bug Fixes & Minor Improvements
2026-01-20
Fireworks AI adds warm-start training for reinforcement fine-tuning and Azure federated identity support for model uploads, along with documentation updates and minor bug fixes.
Warm-Start Training and Azure Model Uploads
Warm-Start Training for Reinforcement Fine-Tuning
You can now warm-start Reinforcement Fine-Tuning jobs from previously supervised fine-tuned checkpoints using the --warm-start-from flag. This enables a streamlined SFT-to-RFT workflow where you first train a model with supervised fine-tuning, then continue training with reinforcement learning.
See the Warm-Start Training guide for details.
Azure Federated Identity for Model Uploads
Model uploads from Azure Blob Storage now support Azure AD federated identity authentication as an alternative to SAS tokens. This eliminates the need for credential rotation and enables secure, credential-less authentication.
See the Uploading Custom Models documentation for setup instructions.
📚 Documentation Updates
- Warm-Start Training: New guide for SFT-to-RFT workflows (Warm-Start Training)
- Azure Federated Identity: Setup instructions for Azure AD authentication (Uploading Custom Models)
- Preserved Thinking: Multi-turn reasoning with preserved thinking context (Reasoning)
- GLM 4.7: Added to the list of models supporting the reasoning_effort parameter
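On the preserved-thinking item above: the idea is to carry the model's prior reasoning back into the next turn so multi-turn reasoning keeps its context. A hedged sketch, where the `reasoning_content` field name follows the Reasoning docs but the exact multi-turn mechanics may differ:

```python
# Hedged sketch of preserved thinking across turns: the assistant message is
# replayed with its reasoning_content attached, so the next turn can build on
# the earlier chain of thought. The model id is a placeholder.
history = [
    {"role": "user", "content": "Plan a 3-step migration."},
    {
        "role": "assistant",
        "content": "Step 1: snapshot. Step 2: migrate. Step 3: verify.",
        "reasoning_content": "The user needs a safe ordering...",  # preserved
    },
    {"role": "user", "content": "Expand step 2."},
]

next_request = {
    "model": "accounts/fireworks/models/glm-4p7",  # placeholder model id
    "messages": history,
}
print(len(next_request["messages"]))  # 3
```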
Bug Fixes & Minor Improvements
2025-12-22
Fireworks AI adds Playground category tabs, new Contributor and Inference roles, and major fine-tuning upgrades like stop and resume, job cloning, and log and dataset downloads. It also expands the Model Library with Gemma 3 and Qwen3 Omni models.
Playground Categories, New User Roles, Fine-Tuning Improvements, and New Models
Playground Categories
The Playground now features category tabs (LLM, Image, TTS, STT) in the header for easier switching between model types. The Playground automatically detects the appropriate category based on the selected model and provides smart defaults for each category.
User Roles: Contributor and Inference
New user roles provide more granular access control for team collaboration:
- Contributor: Read and write access to resources without administrative privileges
- Inference: Read-only access with the ability to run inference on deployments
Assign these roles when inviting team members to provide appropriate access levels.
Fine-Tuning Improvements
Fine-tuning workflows have been enhanced with several new capabilities:
- Stop and Resume Jobs: Stop running fine-tuning jobs and resume them later from where they left off. Available for Supervised Fine-Tuning and Reinforcement Fine-Tuning jobs.
- Clone Jobs: Quickly create new fine-tuning jobs based on existing job configurations using the Clone action.
- Download Output Datasets: Download output datasets from Reinforcement Fine-Tuning jobs, including individual files or bulk download as a ZIP archive.
- Download Rollout Logs: Download rollout logs from Reinforcement Fine-Tuning jobs for offline analysis.
✨ New Models
- Gemma 3 12B Instruct is now available in the Model Library
- Gemma 3 4B Instruct is now available in the Model Library
- Qwen3 Omni 30B A3B Instruct is now available in the Model Library
📚 Documentation Updates
- Deployment Shapes API: Added List Deployment Shape Versions and Get Deployment Shape endpoints for querying available deployment shapes
- Evaluator APIs: Added Create Evaluator, Update Evaluator, and helper endpoints for evaluator source code, build logs, and upload validation
- Fine-Tuning APIs: Added Resume DPO Job, Resume Reinforcement Fine-Tuning Step, Execute Reinforcement Fine-Tuning Step, and Get Evaluation Job Log Endpoint
- SDK Examples: Added Python SDK example links for direct routing and supervised fine-tuning workflows
Bug Fixes & Minor Improvements
2025-12-15
Fireworks AI releases a major update with a new Reasoning guide, expanded prompt caching guidance, and new model additions in the Model Library. It also refreshes compatibility, eval API, and firectl CLI docs, plus bug fixes and minor improvements.
Reasoning Guide, Prompt Caching Updates, New Models and CLI Updates
Reasoning Guide
A new Reasoning guide is now available in the documentation. This comprehensive guide covers:
- Accessing reasoning_content from thinking/reasoning models
- Controlling reasoning effort with the reasoning_effort parameter
- Streaming with reasoning content
- Interleaved thinking for multi-step tool-calling workflows
The guide provides code examples using the Fireworks Python SDK and explains how to work with models that support extended reasoning capabilities.
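In practice, reading reasoning output means pulling `reasoning_content` off the assistant message alongside the final answer. A minimal sketch against a sample response, assuming the standard OpenAI-compatible Chat Completions response shape; the `reasoning_content` field name comes from the Reasoning guide:

```python
# Sketch: extracting reasoning output from an OpenAI-compatible response.
# sample_response stands in for what the API would return from a
# thinking/reasoning model.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "First compare the two options...",
                "content": "Option B is the better choice.",
            }
        }
    ]
}

message = sample_response["choices"][0]["message"]
thinking = message.get("reasoning_content", "")  # model's chain of thought
answer = message["content"]                      # final user-facing answer
print(answer)
```

Using `.get()` for `reasoning_content` keeps the same code working for models that do not emit reasoning.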
Prompt Caching Updates
Prompt caching documentation has been updated with expanded guidance:
- Cached prompt tokens on serverless now cost 50% less than uncached tokens
- Session affinity routing via the user field or x-session-affinity header for improved cache hit rates
- Prompt optimization techniques for maximizing cache efficiency
See the Prompt Caching guide for details.
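The session-affinity routing above can be requested in two ways, per the Prompt Caching guide: the `user` field in the request body, or the `x-session-affinity` header. A sketch with illustrative values (the model id is a placeholder):

```python
# Sketch: two equivalent ways to pin requests from one session to the same
# replica for better prompt-cache hit rates. Values are illustrative.
import json

session_id = "user-1234-conversation-42"

# Option 1: the "user" field in the request body.
body = {
    "model": "accounts/fireworks/models/deepseek-v3p2",  # placeholder id
    "messages": [{"role": "user", "content": "Continue our conversation."}],
    "user": session_id,
}

# Option 2: the x-session-affinity request header.
headers = {
    "Authorization": "Bearer $FIREWORKS_API_KEY",
    "Content-Type": "application/json",
    "x-session-affinity": session_id,
}

print(json.dumps(body)[:40], headers["x-session-affinity"])
```

Either way, the point is that requests sharing a session id reuse the same cached prompt prefix, which is where the 50% cached-token discount comes from.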
✨ New Models
- Devstral Small 2 24B Instruct 2512 is now available in the Model Library
- NVIDIA Nemotron Nano 3 30B A3B is now available in the Model Library
📚 Documentation Updates
- Reasoning Guide: New documentation for working with reasoning models, including reasoning_content, reasoning_effort, streaming, and interleaved thinking (Reasoning)
- Recommended Models: Updated recommendations to include DeepSeek V3.2 for code generation and Kimi K2 Thinking as a GPT-5 alternative (Recommended Models)
- OpenAI Compatibility: Removed stop sequence documentation as Fireworks is now 1:1 compatible with OpenAI’s behavior (OpenAI Compatibility)
- Evaluator APIs: Added REST API documentation for Evaluator and Evaluation Job CRUD operations (Evals API Reference)
- firectl CLI Reference: Updated with new commands including cancel dpo-job, cancel supervised-fine-tuning-job, set-api-key, redeem-credit-code, and evaluator revision management
Bug Fixes & Minor Improvements
2025-12-08
Fireworks AI adds DeepSeek V3.2 on serverless, cached token pricing in the Model Library, and new evaluations dashboard filtering and status tracking. It also expands the Model Library with several new models, updates reranking docs, and includes bug fixes and minor improvements.
DeepSeek V3.2 on Serverless, Cached Token Pricing, and New Models
☁️ Serverless
- DeepSeek V3.2 is now available on serverless
Cached Token Pricing Display
The Model Library and model detail pages now display cached and uncached input token pricing for serverless models that support prompt caching. This gives you better visibility into potential cost savings when using prompt caching with supported models.
Evaluations Dashboard Improvements
The Evaluations dashboard has been enhanced with new filtering and status tracking capabilities:
- Status column showing evaluator build state (Active, Building, Failed)
- Quick filters to filter evaluators and evaluation jobs by status
- Improved table layout with actions integrated into the status column
✨ New Models
- DeepSeek V3.2 is now available in the Model Library
- Ministral 3 14B Instruct 2512 is now available in the Model Library
- Ministral 3 8B Instruct 2512 is now available in the Model Library
- Ministral 3 3B Instruct 2512 is now available in the Model Library
- Mistral Large 3 675B Instruct is now available in the Model Library
- Qwen3-VL-32B-Instruct is now available in the Model Library
- Qwen3-VL-8B-Instruct is now available in the Model Library
📚 Documentation Updates
- Reranking Guide: Added documentation for using the /rerank endpoint and /embeddings endpoint with return_logits for reranking, including parallel batching examples (Querying Embeddings Models)
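A hedged sketch of what a reranking call looks like: the shape below follows the common query-plus-documents convention, but the exact `/rerank` field names should be checked against the Querying Embeddings Models guide, and the model id is a placeholder.

```python
# Hedged sketch of a /rerank request body and of consuming the results.
# fake_scores stands in for the relevance scores the endpoint would return,
# one per document.
payload = {
    "model": "accounts/fireworks/models/my-reranker",  # placeholder id
    "query": "how do I rotate an API key?",
    "documents": [
        "Rotating keys is done from the account settings page.",
        "Deployments can be scaled with firectl.",
        "API keys can be regenerated at any time.",
    ],
}

fake_scores = [0.92, 0.05, 0.71]  # illustrative scores, one per document
# Sort document indices by score, highest first.
ranked = sorted(range(len(fake_scores)), key=lambda i: -fake_scores[i])
top_doc = payload["documents"][ranked[0]]
print(top_doc)
```

The guide also covers running the `/embeddings` endpoint with `return_logits` for the same purpose, including batching the documents in parallel.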
Bug Fixes & Minor Improvements
2025-12-01
Fireworks AI adds audit logs and dataset downloads in the web app, expands Reinforcement Fine-Tuning with weighted training, and adds KAT Coder to the Model Library, alongside bug fixes and minor improvements.
Audit Logs, Dataset Download, Weighted Training for Reinforcement Fine-Tuning, and New Model
Audit Logs in Web App
You can now view and search audit logs directly from the Fireworks web app. The new Audit Logs page provides:
- Search and filter logs by status and timeframe
- Detailed view panel for individual log entries
- Easy navigation from the console sidebar under Account settings
See the Audit Logs documentation for more information.
Dataset Download
You can now download datasets directly from the Fireworks web app. The new download functionality allows you to:
- Download individual files from a dataset
- Download all files at once with “Download All”
- Access downloads from the Datasets table in the dashboard
Weighted Training for Reinforcement Fine-Tuning
Reinforcement Fine-Tuning now supports per-example weighting, giving you more control over which samples have greater influence during training. This feature mirrors the weighted training functionality already available in Supervised Fine-Tuning.
See the Weighted Training documentation for details on the weight field format.
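A hedged sketch of what per-example weighting looks like in a training dataset: the exact JSONL schema and placement of the `weight` field should be confirmed against the Weighted Training documentation; this mirrors the common pattern of a top-level weight next to the messages.

```python
# Hedged sketch: a JSONL training dataset where each example carries a
# "weight" controlling its influence during training. Field placement is an
# assumption; see the Weighted Training docs for the confirmed schema.
import json

examples = [
    {
        "messages": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"},
        ],
        "weight": 2.0,  # count this sample twice as heavily
    },
    {
        "messages": [
            {"role": "user", "content": "Name a prime number."},
            {"role": "assistant", "content": "7"},
        ],
        "weight": 0.5,  # down-weight this sample
    },
]

jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0])
```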
✨ New Models
- KAT Coder is now available in the Model Library
Bug Fixes & Minor Improvements
2025-11-24
Fireworks AI improves evaluator creation with GitHub template support and a new sortable table, adds MLOps and observability integrations for W&B and MLflow, expands the Model Library with Kimi K2 Thinking and KAT Dev models, and introduces new REST API endpoints for fine-tuning and deployment management.
Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints
Improved Evaluator Creation Experience
The evaluator creation workflow has been significantly enhanced with GitHub template integration. You can now:
- Fork evaluator templates directly from GitHub repositories
- Browse and preview templates before using them
- Create evaluators with a streamlined save dialog
- View evaluators in a new sortable and paginated table
MLOps & Observability Integrations
New documentation for integrating Fireworks with MLOps and observability tools:
- Weights & Biases (W&B) integration for experiment tracking during fine-tuning
- MLflow integration for model management and experiment logging
✨ New Models
- Kimi K2 Thinking is now available in the Model Library
- KAT Dev 32B is now available in the Model Library
- KAT Dev 72B Exp is now available in the Model Library
☁️ Serverless
- Kimi K2 Thinking is now available on serverless
📚 New REST API Endpoints
New REST API endpoints are now available for managing Reinforcement Fine-Tuning Steps and deployments:
- Create Reinforcement Fine-Tuning Step
- List Reinforcement Fine-Tuning Steps
- Get Reinforcement Fine-Tuning Step
- Delete Reinforcement Fine-Tuning Step
- Scale Deployment
- List Deployment Shape Versions
- Get Deployment Shape Version
- Get Dataset Download Endpoint
Bug Fixes & Minor Improvements
2025-11-12
Fireworks AI introduces a new Python SDK for the Build experience, replacing the deprecated Build SDK, and improves RFT with better reliability, multi-turn training, stronger observability, and a more developer-friendly workflow.
☀️ Sunsetting Build SDK
The Build SDK is being deprecated in favor of a new Python SDK generated directly from our REST API. The new SDK is more up-to-date, flexible, and continuously synchronized with our REST API. Please note that the last version of the Build SDK will be 0.19.20, and the new SDK will start at 1.0.0. Python package managers will not automatically update to the new SDK, so you will need to manually update your dependencies and refactor your code.
Existing codebases using the Build SDK will continue to function as before and will not be affected unless you choose to upgrade to the new SDK version.
The new SDK replaces the Build SDK’s LLM and Dataset classes with REST API-aligned methods. If you upgrade to version 1.0.0 or later, you will need to migrate your code.
🚀 Improved RFT Experience
We’ve drastically improved the RFT experience with better reliability, a developer-friendly SDK for hooking up your existing agents, support for multi-turn training, better observability in our Web App, and a better overall developer experience.
See Reinforcement Fine-Tuning for more details.