Replicate Release Notes
25 release notes curated from 1 source by the Releasebot Team. Last updated: Apr 22, 2026
- Apr 21, 2026
- Date parsed from source: Apr 21, 2026
- First seen by Releasebot: Apr 22, 2026
Agent skills for Replicate
Replicate now publishes agent skills for coding assistants, adding markdown guidance for model discovery, comparison, API execution, and better image and video prompting. The skills work with Claude Code, OpenCode, OpenAI Codex, and other compatible tools.
Replicate now publishes agent skills, a collection of markdown instruction files that give coding assistants expert knowledge about working with AI models on Replicate.
Skills cover model discovery, comparison, and execution via the API, along with detailed prompting techniques for image generation and video generation models. They follow the open Agent Skills spec and work with Claude Code, OpenCode, OpenAI Codex, and other compatible tools.
Install
npx skills add replicate/skills
This installs all of Replicate’s skills into your project and configures them for your coding assistant automatically.
Skills and MCP
Skills are complementary to Replicate’s MCP server. MCP gives your coding assistant API tools. Skills give it knowledge about how to use those tools well: which models to choose, how to write prompts, and what tradeoffs to consider.
For more details, see the agent skills reference or the GitHub repository.
Original source
- Mar 2, 2026
- Date parsed from source: Mar 2, 2026
- First seen by Releasebot: Mar 2, 2026
Fallback model for Nano Banana Pro
Nano Banana Pro can now fall back to Seedream 5.0 lite when Google's API is rate limited, instead of failing. The fallback is opt-in via allow_fallback_model; when it is used, the output's resolution field reads "fallback". Limitations: 1K requests are generated at 2K and downscaled, 4K requests and the 4:5 and 5:4 aspect ratios don't fall back, and you're charged the fallback model's price.
How it works
Set allow_fallback_model to true when calling the API. If Nano Banana Pro hits a rate limit, it tries to generate the image with Seedream 5.0 lite instead. For certain inputs, for example if the aspect ratio isn’t supported, the original rate limit error is returned.
The fallback is off by default. If you don’t set allow_fallback_model, nothing changes — you’ll get a rate limit error when Google’s API is at capacity.
When the fallback is triggered, your logs still show a prediction to Nano Banana Pro. You can tell the fallback was used by checking the resolution field in your output — it says "fallback" instead of the actual resolution. You’re charged the cost of the fallback model, not Nano Banana Pro.
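A minimal Python sketch of opting in over the HTTP API, using only the standard library. It assumes the model is addressed as google/nano-banana-pro through POST /v1/predictions, that resolution is an input name accepting values like "2K", and that the resolution field described above can be read as a key of a dict-shaped output. build_fallback_request, used_fallback, and create_prediction are hypothetical helpers, not part of any Replicate SDK.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_fallback_request(prompt: str, resolution: str = "2K") -> dict:
    """Build a prediction body that opts in to the fallback model."""
    return {
        # Assumption: the official model can be addressed by owner/name here.
        "version": "google/nano-banana-pro",
        "input": {
            "prompt": prompt,
            "resolution": resolution,  # assumed input name and value format
            "allow_fallback_model": True,  # the fallback is off by default
        },
    }


def used_fallback(output: dict) -> bool:
    """True if the output's resolution field reports "fallback".

    Assumption: the output is a mapping with a "resolution" key.
    """
    return output.get("resolution") == "fallback"


def create_prediction(body: dict) -> dict:
    """POST the body to the predictions endpoint (performs a network call)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Only create_prediction touches the network; the other two helpers are pure, which keeps the opt-in flag and the fallback check easy to test.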
Limitations
Our current fallback model, Seedream 5.0 lite, doesn’t support all the same options as Nano Banana Pro:
- Seedream 5.0 lite doesn’t support 1K resolution. If you request 1K, the fallback generates at 2K and downscales the result.
- Seedream 5.0 lite doesn’t support 4K resolution. If you request 4K, the fallback won’t be used and the original rate limit error is returned.
- Seedream 5.0 lite doesn’t support the 4:5 and 5:4 aspect ratios. Requests with these ratios won’t fall back and will return the original rate limit error.
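The three limitations above can be encoded as a pre-flight check. This hypothetical plan_fallback helper only knows about the cases listed here, and the "1K"/"4K" and "4:5" string formats are assumptions; other unsupported inputs may also return the original rate limit error, so treat a True result as "may fall back", not a guarantee.

```python
from typing import Optional, Tuple

# Limitations of Seedream 5.0 lite as listed in these release notes.
UNSUPPORTED_ASPECT_RATIOS = {"4:5", "5:4"}


def plan_fallback(resolution: str, aspect_ratio: str) -> Tuple[bool, Optional[str]]:
    """Predict whether a rate-limited request can fall back.

    Returns (falls_back, generation_resolution). When falls_back is False,
    the original rate limit error is returned instead.
    """
    if resolution == "4K" or aspect_ratio in UNSUPPORTED_ASPECT_RATIOS:
        return False, None
    if resolution == "1K":
        # Seedream 5.0 lite generates at 2K; the result is then downscaled to 1K.
        return True, "2K"
    return True, resolution
```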
- Feb 10, 2026
- Date parsed from source: Feb 10, 2026
- First seen by Releasebot: Feb 11, 2026
MCP server auto-discovery
Replicate’s MCP server now supports automatic discovery via the official MCP Registry with a new /.well-known/mcp/server.json endpoint. The Registry holds metadata to guide MCP clients to install servers, enabling built‑in discovery in select clients like VS Code, plus a --tools flag for standard or code mode.
MCP server discovery
Replicate’s MCP server can now be discovered automatically through the official MCP Registry.
We added a /.well-known/mcp/server.json endpoint that publishes metadata about the MCP server. This follows the server.json specification from the Model Context Protocol.
How discovery works
The MCP Registry is the official metadata repository for MCP servers, backed by Anthropic, GitHub, and Microsoft. It doesn’t host code—just metadata that describes where to find servers and how to install them.
When you publish a server.json file at /.well-known/mcp/server.json, the Registry can discover your server automatically. MCP clients then use the Registry to find and install servers.
Clients with built-in discovery
A few MCP clients have built-in marketplaces or directories:
- VS Code has the best Registry integration. Enable chat.mcp.gallery.enabled in your settings, then search @mcp in the Extensions view to browse and install MCP servers.
- Claude Desktop has a curated extensions directory at Settings > Extensions > Browse extensions.
Other clients like ChatGPT, Cursor, and LM Studio require manual configuration—you add the server URL or edit a config file yourself.
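For clients that need manual configuration, the published metadata can be read directly to find the server URL. A stdlib-only Python sketch follows; the host (mcp.replicate.com) and the assumption that endpoints are listed under a remotes array with url keys, as in the server.json schema, should be checked against the live file. fetch_server_json and remote_urls are hypothetical helpers.

```python
import json
import urllib.request
from typing import List

# Assumption: the metadata is served from the remote MCP server's host.
SERVER_JSON_URL = "https://mcp.replicate.com/.well-known/mcp/server.json"


def fetch_server_json(url: str = SERVER_JSON_URL) -> dict:
    """Download and parse the published server.json (network call)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def remote_urls(metadata: dict) -> List[str]:
    """Collect remote endpoint URLs from server.json metadata.

    Assumption: remotes are listed under "remotes", each with a "url" key.
    Entries without a URL are skipped.
    """
    return [r["url"] for r in metadata.get("remotes", []) if "url" in r]
```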
Code mode option
The metadata also exposes the --tools flag, which lets you choose between standard tools (all) or code mode (code) when installing.
Original source
- Jan 14, 2026
- Date parsed from source: Jan 14, 2026
- First seen by Releasebot: Jan 14, 2026
Filter predictions by source
Filter list predictions API by source
You can now filter the list predictions API endpoint to show only predictions created through the web interface.
Use the source query parameter with a value of web:
curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  "https://api.replicate.com/v1/predictions?source=web"
This is useful if you want to see predictions you created using the playground or other parts of the Replicate website, separate from predictions created programmatically via the API.
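The same filter from Python, following pagination. This sketch assumes the list response uses the results array and next URL returned by Replicate's list endpoints; iter_predictions and its injectable fetch parameter are hypothetical, added so the paging logic can run without network access.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

BASE_URL = "https://api.replicate.com/v1/predictions"


def first_page_url(source: str = "web") -> str:
    """Build the list URL with the source filter applied."""
    return f"{BASE_URL}?{urllib.parse.urlencode({'source': source})}"


def http_get_json(url: str) -> dict:
    """GET a URL with the API token and parse the JSON body (network call)."""
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_predictions(
    source: str = "web", fetch: Callable[[str], dict] = http_get_json
) -> Iterator[dict]:
    """Yield predictions page by page, following the "next" cursor URL."""
    url: Optional[str] = first_page_url(source)
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")
```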
Note: When filtering by source=web, results are limited to predictions from the last 14 days.
Original source
- Sep 26, 2025
- Date parsed from source: Sep 26, 2025
- First seen by Releasebot: Jan 14, 2026
The little things, week ending September 26, 2025
Web updates improve usability and speed with easier image inputs in the playground and faster homepage rendering, and a new search API launches in beta. Docs add an image editing model comparison and new optimization guides, while Cog gains PyTorch 2.8.0 compatibility and better Python 3.13 support.
Web
- Updated playground to make it easier to use output images as inputs
- Improved load time and rendering of the homepage
- Added a link to the model detail page to view all predictions you’ve made with that model
API
- Launched a new search API (in beta) that makes it easier to find models, collections, and docs in a single call
Docs
- Published a comprehensive comparison of image editing models to help you choose the right tool for your project
- Moved “Deploy a custom model” to the get-started section for better discoverability
- Added new guide for optimizing models with Pruna to help you make models faster and cheaper
- Added documentation for throttling when your credit balance is low
- Updated rate limits error message format and clarified burst behavior in the API reference
- Enhanced the 404 page
- Fixed some visual inconsistencies, especially when using dark mode
Cog
- Added PyTorch 2.8.0 compatibility to Cog in v0.16.7
- Improved cog init to download the latest agent instructions from docs
- Added better support for Python 3.13 base images
- Sep 16, 2025
- Date parsed from source: Sep 16, 2025
- First seen by Releasebot: Jan 14, 2026
New search API, now in beta
Replicate rolls out a new search API to find models, collections, and docs faster. Search is available in the TypeScript and Python SDKs and in the MCP servers. The existing models endpoint still works, but migrating is recommended for better results.
We’ve added a new search API that makes it easier to find models, collections, and docs.
curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  "https://api.replicate.com/v1/search?query=lip+sync"
SDK support
The search API is available in our SDKs:
- TypeScript: npm install replicate@alpha and use replicate.search()
- Python: pip install --pre replicate and use replicate.search()
- MCP: Available in both our remote and local MCP servers
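If you would rather not depend on the pre-release SDKs, the endpoint can also be called directly. A minimal stdlib-only sketch; the response format isn't described in these notes, so the hypothetical search helper returns the parsed JSON unchanged.

```python
import json
import os
import urllib.parse
import urllib.request


def search_url(query: str) -> str:
    """Build the search URL; urlencode turns spaces into '+'."""
    return "https://api.replicate.com/v1/search?" + urllib.parse.urlencode({"query": query})


def search(query: str) -> dict:
    """Call the beta search endpoint and return the parsed JSON (network call)."""
    req = urllib.request.Request(
        search_url(query),
        headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```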
Backwards compatibility
The existing QUERY /v1/models endpoint still works, but we recommend migrating to the new search endpoint for improved results.
Original source
Read our announcement blog post for more details and example code.
- Sep 12, 2025
- Date parsed from source: Sep 12, 2025
- First seen by Releasebot: Jan 14, 2026
The little things, week ending September 12, 2025
Platform and web updates include torch.compile caching, which makes models that use torch.compile start 2-3x faster, web URLs in prediction objects, invoices for prepaid credit, and related models on non-official model pages. Docs add a Security topic, a torch.compile guide, and a Pruna optimization guide; Cog adds torch 2.8.0 compatibility; and the remaining-credits display is fixed.
Platform
- Added invoices for purchases of prepaid credit
- Launched torch.compile caching: models that use torch.compile now start 2-3x faster thanks to cached compilation artifacts
- Added web URLs to prediction objects, so you can view predictions in your browser directly from API responses
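A small Python sketch of reading that URL from an API response. The location urls.web is an assumption; prediction_web_url is a hypothetical helper that returns None when the field is absent.

```python
from typing import Optional


def prediction_web_url(prediction: dict) -> Optional[str]:
    """Return the browser URL for a prediction, if the response includes one.

    Assumption: the web URL sits under prediction["urls"]["web"].
    """
    return (prediction.get("urls") or {}).get("web")
```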
Web
- Added related models to non-official model pages, to help you find similar models
- Fixed rendering issues with the display of remaining credits
- Added better support for models with video cover images
Docs
- Added a comprehensive Security topic with documentation on API token management, including automated token scanning and compromise detection
- Added a torch.compile guide with practical examples for improving model performance
- Added a new guide for optimizing models with Pruna
Cog
- Added torch 2.8.0 compatibility
- Sep 8, 2025
- Date parsed from source: Sep 8, 2025
- First seen by Releasebot: Jan 14, 2026
Torch compile caching
Replicate now caches torch.compile artifacts across model instances, so several models that use torch.compile start 2–3x faster. In Replicate's tests, the compiled version of black-forest-labs/flux-kontext-dev runs inference over 30% faster than the uncompiled one, making torch.compile a worthwhile performance upgrade.
torch.compile can speed up your inference time significantly, but at the cost of slower startup times. We’ve implemented caching of torch.compile artifacts across model instances to help your models boot faster.
Models using torch.compile like black-forest-labs/flux-kontext-dev, prunaai/flux-schnell, and prunaai/flux.1-dev-lora now start 2-3x faster.
In our tests of inference speed with black-forest-labs/flux-kontext-dev, the compiled version runs over 30% faster than the uncompiled one, making torch.compile an important feature to explore.
For more details, check out the blog post. If you’re building your own custom models, check out our guide to improving model performance with torch.compile.
To learn more about how to use torch.compile, check out the official PyTorch torch.compile tutorial.
Original source
- Aug 29, 2025
- Date parsed from source: Aug 29, 2025
- First seen by Releasebot: Jan 14, 2026
The little things, week ending August 29, 2025
Search results now show Artificial Analysis image and video arena rankings. Organization signups require email verification, and the dashboard, navigation, filtering, and accessibility get broad UI improvements. Bug fixes cover dropdowns on Microsoft Edge, readme fetching, and avatar menu visibility.
Release notes
- Added Artificial Analysis image and video arena rankings to search results
- Added email verification when signing up for an organization
- Improved rendering of the billing summary on the dashboard
- Continued improving the site navigation across Replicate
- Cleaned up filtering options on the prediction list to make it easier to navigate
- Fixed a bug that may have caused filenames to overflow on the playground
- Fixed a bug when fetching a model’s readme while using an Accept header
- Fixed a bug that may have caused dropdowns to appear incorrectly on Microsoft Edge when using dark mode
- Enhanced radio button visibility on model create with better contrast
- Standardized number formatting across the platform to use consistent en-US locale
- Fixed avatar menu username visibility across different screen sizes
- Improved link underlines in blog posts for better readability and visibility
- Aug 14, 2025
- Date parsed from source: Aug 14, 2025
- First seen by Releasebot: Jan 14, 2026
The little things, week ending August 14, 2025
Replicate rolls out a broad UI refresh and docs overhaul, with a sleeker model page header, refreshed homepage data, clearer prediction filtering, and search improvements including 404 fixes. The docs add new guides organized into categories, plus docs on secret inputs, community models, and using Replicate MCP in Gemini CLI and Codex CLI. The API gets a deployments.update fix, and mcp.replicate.com launches as a remote MCP server for the HTTP API.
Web
- Overhauled the model page header to make it easier to find what you’re looking for
- Updated the Replicate homepage with the freshest model data
- Tweaked the predictions interface to make filtering clearer
- Continued improving search results, including fixes for a bug that led to 404s on collections and a bug where video models were not displayed correctly
- Improved docs search results interface
Docs
- Updated the Node.js starter guide to use newer models.
- Added docs about secret inputs for model authors and model users.
- Added docs about community models.
- Added docs for using Replicate MCP in Google Gemini CLI
- Added docs for using Replicate MCP in OpenAI Codex CLI
- Guides are now organized into categories: Run models, Build models, Go deeper.
API
- Fixed the deployments.update API to return updated deployment config.
- Released mcp.replicate.com, a remote MCP server for Replicate’s HTTP API
- Aug 5, 2025
- Date parsed from source: Aug 5, 2025
- First seen by Releasebot: Jan 14, 2026
Run all models with the same API endpoint
Replicate now lets you run any model, official or community, through POST /v1/predictions, which now also accepts official models in owner/name format. The change is backward compatible, so existing endpoints and code keep working.
What changed?
You can now use the POST /v1/predictions HTTP API endpoint to run any model on Replicate, whether it’s an official model or a community model. This removes the confusion about which endpoint to use for different types of models.
The POST /v1/predictions endpoint now accepts official model identifiers in the owner/name format, in addition to the existing {owner}/{name}:{version_id} and {version_id} formats.
The existing POST /v1/models/{model_owner}/{model_name}/predictions endpoint will still be supported for running official models. If you’re already using that endpoint, you don’t need to change anything.
This change is backward compatible. Existing code will continue to work without any modifications.
Supported version identifiers
When using the POST /v1/predictions endpoint, you can specify models in these formats:
- {owner}/{name} - For official models (e.g., black-forest-labs/flux-schnell)
- {owner}/{name}:{version_id} - For community models with full version ID (e.g., replicate/hello-world:9dcd6d78e7c6560c340d916fe32e9f24aabfa331e5cce95fe31f77fb03121426)
- {version_id} - Just the 64-character version ID (e.g., 9dcd6d78e7c6560c340d916fe32e9f24aabfa331e5cce95fe31f77fb03121426)
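A Python sketch that checks which of the three formats an identifier uses before building a request body. The regular expressions are assumptions (these notes don't spell out Replicate's naming rules), and identifier_kind and prediction_body are hypothetical helpers. As in the curl example in this entry, every format goes in the version field.

```python
import re
from typing import Any, Dict

# Assumed patterns: a 64-character lowercase hex version ID, and owner/name slugs.
VERSION_ID = re.compile(r"[0-9a-f]{64}")
OWNER_NAME = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def identifier_kind(identifier: str) -> str:
    """Classify a model identifier into one of the three accepted formats."""
    if VERSION_ID.fullmatch(identifier):
        return "version_id"
    owner_name, sep, version = identifier.partition(":")
    if OWNER_NAME.fullmatch(owner_name):
        if not sep:
            return "owner/name"
        if VERSION_ID.fullmatch(version):
            return "owner/name:version_id"
    raise ValueError(f"unrecognized model identifier: {identifier!r}")


def prediction_body(identifier: str, model_input: Dict[str, Any]) -> Dict[str, Any]:
    """Build a POST /v1/predictions body; all three formats go in "version"."""
    identifier_kind(identifier)  # validate before sending
    return {"version": identifier, "input": model_input}
```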
Example
Here’s an example of how to run an official model (in this case, black-forest-labs/flux-schnell) using the POST /v1/predictions operation:
curl -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "black-forest-labs/flux-schnell",
    "input": {
      "prompt": "A photo of a cat"
    }
  }'
Original source
- Aug 1, 2025
- Date parsed from source: Aug 1, 2025
- First seen by Releasebot: Jan 14, 2026
The little things, week ending August 1, 2025
Web
- Rolled out a new search experience across Replicate
- Added a new enterprise page
- Added support for filtering models in the playground
- Fixed a bug where dark mode may not have been sticky in certain circumstances
- Fixed a bug where creating a model might silently fail
- Fixed a bug with using keyboard shortcuts in playground causing unexpected results
- Improved 404 pages
- Improved pricing display for models
Cog
- Added support for Python 3.13 base images
Models
- Veo-3 now supports 1080p
- Kontext LoRA trainer now supports up to 20k steps
- Jul 29, 2025
- Date parsed from source: Jul 29, 2025
- First seen by Releasebot: Jan 14, 2026
Purchase prepaid credit
Replicate adds prepaid credit billing to help you prepay and manage spend. New accounts since July 16, 2025 are billed via prepaid credit; existing users can stay on monthly billing or migrate later with guidance.
Prepaid credit
You can now purchase prepaid credit for your Replicate account. This is a helpful option if you want to manage your spending more proactively. It can also make paying for Replicate easier if your bank requires additional authentication for recurring charges.
Since July 16, 2025, all new accounts are billed through prepaid credit instead of monthly. If you signed up before July 16, nothing changes unless you want it to: you can continue to be billed monthly, and no action is required.
At some point in the future, we will be migrating most accounts from monthly billing to prepaid credit. We’ll work with you to make the transition as smooth as possible and will share more details as our plans develop. If you want to move from monthly billing to prepaid credit sooner, contact Replicate by email.
To purchase credit:
- Visit replicate.com/account/billing (or click your avatar → Account settings → Billing).
- Choose Add credit and follow the prompts.
- Optionally, set up auto reload to add to your credit balance when it dips below a preset threshold.
Once you purchase credit, any usage will be deducted from that credit balance. If you run out of credit, we’ll charge you for any overages at the beginning of the following month.
For more details, see our prepaid credit docs.
Original source
- Jul 21, 2025
- Date parsed from source: Jul 21, 2025
- First seen by Releasebot: Jan 14, 2026
Introducing a new Cog runtime
Cog unveils a new production runtime implemented in Go with a Python runner, promising cleaner dependency control and better performance. Update to Cog >= 0.16.0 and enable cog_runtime to try it; a few API changes apply, including removal of the deprecated File API. The old runtime will be deprecated in a future release.
Introduction
We are introducing a new implementation of Cog’s production runtime component. This is the part of Cog responsible for predictor schema validation, prediction execution and HTTP serving.
tl;dr
If you’re a model author and want to try out the new runtime, make sure you’re on Cog >= 0.16.0 and add build.cog_runtime: true to cog.yaml:
build:
  # Enable new Cog runtime implementation
  cog_runtime: true
Most existing models should work as is, apart from a few exceptions. If you hit one of them, follow the messages printed by cog to update your code. The reasons for these changes are explained below.
Note that:
- The experimental training interface is not supported yet.
- This new runtime will become the default in a future Cog release, after which the existing one will be deprecated.
Why build this?
The existing Cog runtime was written in Python and relies heavily on Pydantic and several other libraries when performing predictions. This leads to several problems:
- Dependency issues: many Python libraries pull in conflicting versions of common dependencies, e.g. Pydantic. This causes runtime errors, sometimes triggered just by rebuilding the image, which pulls in a newer version of the dependency. By removing all Python dependencies from the Cog runtime, you get total control of your model’s dependency graph.
- Ambiguous predictor interface: we relied on Pydantic for checking predictor input and output types, which can be ambiguous and error prone, e.g. allowing types that may be handled incorrectly by other parts of our ecosystem or user code. It’s also hard to support custom data types due to potentially incompatible Pydantic versions (v1 vs. v2).
- Error handling: since the Cog HTTP server and the predictor are both Python code running via multiprocessing, it’s hard to distinguish platform errors (in Cog) from application errors (in the predictor). A model crash may leave the server in a bad state with no useful logging.
- Performance: certain things are hard to implement correctly and efficiently in Python, e.g. async HTTP handling, file upload and download, concurrency, and serialization.
To tackle these problems, we re-implemented the runtime part of Cog with the following components:
- Schema validation in plain Python via the inspect module, with no Pydantic or any other dependency
- Decoupled HTTP server rewritten in Go
- Custom, pluggable data serialization
This allows us to minimize the runtime logic in Python and reduce the risk of it interfering with application code. The Go server is now responsible for most of the heavy lifting:
- HTTP server and webhooks
- Input file download and output file upload
- Logging
The Go server communicates with the bare minimum Python runner via JSON files for input/output and HTTP/signals for IPC. The Python runner is solely responsible for invoking the predictor’s setup() and predict() methods.
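To make the split concrete, here is a hypothetical Python sketch of the runner side only: it derives an input schema from predict()'s signature with inspect and typing (no Pydantic), reads one JSON input file, calls predict(), and writes one JSON output file. The file layout and result fields are assumptions, the HTTP/signal IPC with the Go server is omitted, and this is not Cog's actual implementation.

```python
import inspect
import json
import types
import typing
from pathlib import Path
from typing import Any, Callable, Dict

# typing.Union covers Optional[X]; types.UnionType covers `X | None` (Python 3.10+).
_UNION_ORIGINS = (typing.Union, getattr(types, "UnionType", typing.Union))


def is_optional(annotation: Any) -> bool:
    """True if the annotation is a union that includes None, e.g. Optional[str]."""
    return typing.get_origin(annotation) in _UNION_ORIGINS and type(None) in typing.get_args(annotation)


def input_schema(predict: Callable[..., Any]) -> Dict[str, Dict[str, bool]]:
    """Describe predict()'s inputs using only inspect and typing."""
    schema: Dict[str, Dict[str, bool]] = {}
    for name, param in inspect.signature(predict).parameters.items():
        optional = is_optional(param.annotation)
        has_default = param.default is not inspect.Parameter.empty
        schema[name] = {"optional": optional, "required": not (optional or has_default)}
    return schema


def run_once(predict: Callable[..., Any], input_path: Path, output_path: Path) -> None:
    """Read one JSON input file, call predict(), write one JSON output file."""
    payload = json.loads(input_path.read_text())
    schema = input_schema(predict)
    missing = [n for n, s in schema.items() if s["required"] and n not in payload]
    if missing:
        result = {"status": "failed", "error": f"missing required inputs: {missing}"}
    else:
        kwargs = dict(payload)
        for name, spec in schema.items():
            if spec["optional"] and name not in kwargs:
                kwargs[name] = None  # Optional[...] inputs default to None
        try:
            result = {"status": "succeeded", "output": predict(**kwargs)}
        except Exception as exc:  # application error, reported in the output file
            result = {"status": "failed", "error": repr(exc)}
    output_path.write_text(json.dumps(result))
```

Catching exceptions from predict() and writing them to the output file, rather than letting the process crash, reflects the motivation above of separating application errors from platform errors.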
What do I need to change?
Most of the Cog API (Predictor, Input, BaseModel, etc.) is source compatible. There are 3 changes that might require updating the model:
- Improved semantics of optional inputs
- Cleaner dependencies
- Removal of the deprecated File API
First, ambiguous optional inputs are no longer allowed. In the existing Cog, declaring prompt: str suggests that it cannot be None, yet it still allows default=None, which can confuse type checkers and lead to buggy code, e.g. code that never checks for None. For example, instead of:
from cog import Input
def predict(prompt: str = Input(description="prompt", default=None)): ...
We should use:
from typing import Optional
from cog import Input
def predict(prompt: Optional[str] = Input(description="prompt")): ...
Note that default=None is now redundant and removed, as Optional[str] implies that the input may be None, and a type checker can warn us if we don't check for it.
The second notable change is that the new Cog runtime no longer depends on any of the existing runtime’s Python dependencies. If your model relies on any of the following and they’re not pulled in by another third-party library, add them to requirements.txt:
- attrs
- fastapi
- pydantic
- PyYAML
- requests
- structlog
- typing_extensions
- uvicorn
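A hypothetical pre-build check for this change: it parses predictor source with ast and reports which of the packages listed above are imported directly but missing from requirements.txt. It only sees direct imports in one file and matches names only, so a package that another library already pulls in will still be flagged; treat the output as a list to review. The import-name mapping (for example, yaml for PyYAML) is the sketch's own.

```python
import ast
import re
from typing import Dict, Set

# Former Cog runtime dependencies (from the list above), keyed by import name.
FORMER_RUNTIME_DEPS: Dict[str, str] = {
    "attr": "attrs",
    "attrs": "attrs",
    "fastapi": "fastapi",
    "pydantic": "pydantic",
    "yaml": "PyYAML",
    "requests": "requests",
    "structlog": "structlog",
    "typing_extensions": "typing_extensions",
    "uvicorn": "uvicorn",
}


def imported_top_level_modules(source: str) -> Set[str]:
    """Top-level module names imported (absolutely) by a Python source file."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def _normalize(name: str) -> str:
    """PEP 503-style normalization: case-insensitive, with -, _ and . equivalent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declared_packages(requirements: str) -> Set[str]:
    """Normalized package names listed in requirements.txt content."""
    names: Set[str] = set()
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(_normalize(match.group(0)))
    return names


def missing_requirements(predictor_source: str, requirements: str) -> Set[str]:
    """Former runtime deps the predictor imports but requirements.txt doesn't list."""
    declared = declared_packages(requirements)
    return {
        FORMER_RUNTIME_DEPS[mod]
        for mod in imported_top_level_modules(predictor_source)
        if mod in FORMER_RUNTIME_DEPS and _normalize(FORMER_RUNTIME_DEPS[mod]) not in declared
    }
```

For a typical model, pass the contents of predict.py and requirements.txt as strings.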
The third change is the removal of the deprecated cog.File API. Use cog.Path instead.
Original source
- Dec 19, 2025
- Date parsed from source: Dec 19, 2025
- First seen by Releasebot: Dec 20, 2025
The little things, week ending December 19, 2025
Web
- Improved the reliability of google/nano-banana and google/nano-banana-pro
- Improved accessibility when using the search bar across Replicate
Docs
- Added automatic llms.txt generation for documentation, making it easier for language models to discover and understand Replicate’s docs
- Published a blog post on how to run Retro Diffusion’s pixel art models on Replicate, including rd-fast, rd-plus, rd-tile, and rd-animation for generating game assets and sprites