Replicate Release Notes

25 release notes curated from 1 source by the Releasebot Team. Last updated: Apr 22, 2026

  • Apr 21, 2026
    • Date parsed from source:
      Apr 21, 2026
    • First seen by Releasebot:
      Apr 22, 2026
    Replicate

    Agent skills for Replicate

    Replicate now publishes agent skills for coding assistants, adding markdown guidance for model discovery, comparison, API execution, and better image and video prompting. The skills work with Claude Code, OpenCode, OpenAI Codex, and other compatible tools.

    Replicate now publishes agent skills, a collection of markdown instruction files that give coding assistants expert knowledge about working with AI models on Replicate.

    Skills cover model discovery, comparison, and execution via the API, along with detailed prompting techniques for image generation and video generation models. They follow the open Agent Skills spec and work with Claude Code, OpenCode, OpenAI Codex, and other compatible tools.

    Install

    npx skills add replicate/skills
    

    This installs all of Replicate’s skills into your project and configures them for your coding assistant automatically.

    Skills and MCP

    Skills are complementary to Replicate’s MCP server. MCP gives your coding assistant API tools. Skills give it knowledge about how to use those tools well: which models to choose, how to write prompts, and what tradeoffs to consider.

    For more details, see the agent skills reference or the GitHub repository.

    Original source
  • Mar 2, 2026
    • Date parsed from source:
      Mar 2, 2026
    • First seen by Releasebot:
      Mar 2, 2026
    Replicate

    Fallback model for Nano Banana Pro

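    Nano Banana Pro can now fall back to Seedream 5.0 lite when Google's API is rate limited, instead of failing. The fallback is off by default; enable it by setting allow_fallback_model to true. When it triggers, the output's resolution field reads "fallback" and you are charged the fallback model's price. Limits: 1K requests are generated at 2K and downscaled, while 4K requests and the 4:5 and 5:4 aspect ratios do not fall back.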

    How it works

    Set allow_fallback_model to true when calling the API. If Nano Banana Pro hits a rate limit, it tries to generate the image with Seedream 5.0 lite instead. For certain inputs, for example if the aspect ratio isn’t supported, the original rate limit error is returned.

    The fallback is off by default. If you don’t set allow_fallback_model, nothing changes — you’ll get a rate limit error when Google’s API is at capacity.

    When the fallback is triggered, your logs still show a prediction to Nano Banana Pro. You can tell the fallback was used by checking the resolution field in your output — it says "fallback" instead of the actual resolution. You’re charged the cost of the fallback model, not Nano Banana Pro.

    Limitations

    Our current fallback model, Seedream 5.0 lite, doesn’t support all the same options as Nano Banana Pro:

    • Seedream 5.0 lite doesn’t support 1K resolution. If you request 1K, the fallback generates at 2K and downscales the result.
    • Seedream 5.0 lite doesn’t support 4K resolution. If you request 4K, the fallback won’t be used and the original rate limit error is returned.
    • Seedream 5.0 lite doesn’t support the 4:5 and 5:4 aspect ratios. Requests with these ratios won’t fall back and will return the original rate limit error.
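    The limitations above can be summarized as a small decision helper. This is a minimal sketch, not Replicate's implementation: the function name and the string forms of the resolution and aspect ratio values are assumptions for illustration, and it covers only the three limitations listed here.

```python
from typing import Optional

# Aspect ratios the fallback model (Seedream 5.0 lite) cannot serve, per the notes above.
UNSUPPORTED_FALLBACK_RATIOS = {"4:5", "5:4"}


def fallback_resolution(resolution: str, aspect_ratio: str) -> Optional[str]:
    """Return the resolution the fallback would generate at, or None when the
    request cannot fall back and the original rate limit error is returned.

    Hypothetical helper: argument names and value formats are assumptions.
    """
    if aspect_ratio in UNSUPPORTED_FALLBACK_RATIOS:
        return None  # 4:5 and 5:4 requests never fall back
    if resolution == "4K":
        return None  # 4K is not supported by the fallback model
    if resolution == "1K":
        return "2K"  # generated at 2K, then downscaled to 1K
    return resolution


# Example request input with the fallback enabled (it is off by default).
example_input = {
    "prompt": "A photo of a cat",
    "allow_fallback_model": True,
}
```

    A caller could use a check like this to predict, before sending a request, whether a rate limit would surface as an error or as a fallback image.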
    Original source
  • Feb 10, 2026
    • Date parsed from source:
      Feb 10, 2026
    • First seen by Releasebot:
      Feb 11, 2026
    Replicate

    MCP server auto-discovery

    Replicate’s MCP server now supports automatic discovery via the official MCP Registry through a new /.well-known/mcp/server.json endpoint. The Registry stores metadata that tells MCP clients where to find servers and how to install them, which enables built-in discovery in clients such as VS Code. The metadata also exposes a --tools flag for choosing between standard tools and code mode.

    MCP server discovery

    Replicate’s MCP server can now be discovered automatically through the official MCP Registry.
    We added a /.well-known/mcp/server.json endpoint that publishes metadata about the MCP server. This follows the server.json specification from the Model Context Protocol.

    How discovery works

    The MCP Registry is the official metadata repository for MCP servers, backed by Anthropic, GitHub, and Microsoft. It doesn’t host code—just metadata that describes where to find servers and how to install them.
    When you publish a server.json file at /.well-known/mcp/server.json, the Registry can discover your server automatically. MCP clients then use the Registry to find and install servers.
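
    As a rough illustration of the well-known path, the sketch below builds the metadata URL for a given host and, optionally, downloads the document. The helper names are hypothetical, and the host you pass in is your own assumption: the note does not say which hostname serves the endpoint.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Path named in the release note; the host is supplied by the caller.
WELL_KNOWN_PATH = "/.well-known/mcp/server.json"


def server_json_url(base_url: str) -> str:
    """Build the well-known metadata URL for an MCP server host."""
    # An absolute path replaces any path already present on base_url.
    return urljoin(base_url, WELL_KNOWN_PATH)


def fetch_server_metadata(base_url: str, timeout: float = 10.0) -> dict:
    """Download and parse the server.json document (performs a network call)."""
    with urllib.request.urlopen(server_json_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

    The returned dictionary is left unparsed here because its fields are defined by the server.json specification rather than by this note.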

    Clients with built-in discovery

    A few MCP clients have built-in marketplaces or directories:

    • VS Code has the best Registry integration. Enable chat.mcp.gallery.enabled in your settings, then search @mcp in the Extensions view to browse and install MCP servers.
    • Claude Desktop has a curated extensions directory at Settings > Extensions > Browse extensions.

    Other clients like ChatGPT, Cursor, and LM Studio require manual configuration: you add the server URL or edit a config file yourself.

    Code mode option

    The metadata also exposes the --tools flag, which lets you choose between standard tools (all) or code mode (code) when installing.

    Original source
  • Jan 14, 2026
    • Date parsed from source:
      Jan 14, 2026
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    Filter predictions by source

    Filter list predictions API by source

    You can now filter the list predictions API endpoint to show only predictions created through the web interface.

    Use the source query parameter with a value of web:

    curl -s \
    -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
    "https://api.replicate.com/v1/predictions?source=web"
    

    This is useful if you want to see predictions you created using the playground or other parts of the Replicate website, separate from predictions created programmatically via the API.
    Note: When filtering by source=web, results are limited to predictions from the last 14 days.
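
    For readers calling the API from Python rather than curl, a minimal standard-library sketch of the same request might look like this. The function name is hypothetical; the URL, query parameter, and header come from the curl example above. The request is only built here, not sent.

```python
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.replicate.com/v1"


def build_list_predictions_request(token: str, source: str = "web") -> urllib.request.Request:
    """Build (but do not send) a GET request for the list predictions endpoint."""
    url = f"{API_BASE}/predictions?{urlencode({'source': source})}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

    Passing the result to urllib.request.urlopen with a real API token would send it; remember that source=web results only cover the last 14 days.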

    Original source
  • Sep 26, 2025
    • Date parsed from source:
      Sep 26, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    The little things, week ending September 26, 2025

    Web updates make it easier to use output images as inputs in the playground and speed up homepage rendering, and a new search API launches in beta. Docs add an image editing model comparison and new optimization guides, while Cog gains PyTorch 2.8.0 compatibility and better Python 3.13 support.

    Web

    • Updated playground to make it easier to use output images as inputs
    • Improved load time and rendering of the homepage
    • Added a link to the model detail page to view all predictions you’ve made with that model

    API

    • Launched a new search API (in beta) that makes it easier to find models, collections, and docs in a single call

    Docs

    • Published a comprehensive comparison of image editing models to help you choose the right tool for your project
    • Moved “Deploy a custom model” to the get-started section for better discoverability
    • Added new guide for optimizing models with Pruna to help you make models faster and cheaper
    • Added documentation for throttling when you have low credit balance
    • Updated rate limits error message format and clarified burst behavior in the API reference
    • Enhanced the 404 page
    • Fixed some visual inconsistencies, especially when using dark mode

    Cog

    • Added PyTorch 2.8.0 compatibility to Cog in v0.16.7
    • Improved cog init to download the latest agent instructions from docs
    • Added better support for Python 3.13 base images
    Original source
  • Sep 16, 2025
    • Date parsed from source:
      Sep 16, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    New search API, now in beta

    Replicate rolls out a new search API to find models, collections, and docs faster. SDKs now support search in TypeScript, Python, and MCP with quick install and examples, while the old models endpoint remains usable but migration is advised for better results.

    We’ve added a new search API that makes it easier to find models, collections, and docs.

    curl -s \
    -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
    "https://api.replicate.com/v1/search?query=lip+sync"
    

    SDK support

    The search API is available in our SDKs:

    • TypeScript: npm install replicate@alpha and use replicate.search()
    • Python: pip install --pre replicate and use replicate.search()
    • MCP: Available in both our remote and local MCP servers

    Backwards compatibility

    The existing QUERY /v1/models endpoint still works, but we recommend migrating to the new search endpoint for improved results.
    Read our announcement blog post for more details and example code.
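
    The curl call above can also be sketched with the Python standard library, without the pre-release SDK. The function names below are hypothetical, and the response is returned as raw decoded JSON because its schema is not described in this note.

```python
import json
import urllib.request
from urllib.parse import urlencode

SEARCH_URL = "https://api.replicate.com/v1/search"


def build_search_request(token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the beta search endpoint."""
    # urlencode turns "lip sync" into "lip+sync", matching the curl example.
    url = f"{SEARCH_URL}?{urlencode({'query': query})}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def search(token: str, query: str):
    """Send the request and return the decoded JSON body (network call)."""
    with urllib.request.urlopen(build_search_request(token, query)) as response:
        return json.load(response)
```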

    Original source
  • Sep 12, 2025
    • Date parsed from source:
      Sep 12, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    The little things, week ending September 12, 2025

    Platform and Web updates improve speed and visibility: torch.compile caching lets models start 2-3x faster, prediction objects now include web URLs, and non-official model pages show related models. Docs add a Security topic, a torch.compile guide, and a Pruna optimization guide. Cog gains torch 2.8.0 compatibility, and the display of remaining credits is fixed.

    Platform

    • Added invoices for purchases of prepaid credit
    • Launched torch.compile caching: models using torch.compile now start 2-3x faster thanks to cached compilation artifacts
    • Added web URLs to prediction objects, so you can view predictions in your browser directly from API responses

    Web

    • Added related models to non-official model pages, to help you find similar models
    • Fixed rendering issues with the display of remaining credits
    • Added better support for models with video cover images

    Docs

    • Added a comprehensive Security topic with documentation on API token management, including automated token scanning and compromise detection
    • Added a torch.compile guide with practical examples for improving model performance
    • Added a new guide for optimizing models with Pruna

    Cog

    • Added torch 2.8.0 compatibility
    Original source
  • Sep 8, 2025
    • Date parsed from source:
      Sep 8, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    Torch compile caching

    Caching of torch.compile artifacts now lets several models start 2-3x faster. Separately, Replicate’s tests show the compiled black-forest-labs/flux-kontext-dev running over 30% faster at inference than the uncompiled version.

    torch.compile can speed up your inference time significantly, but at the cost of slower startup times. We’ve implemented caching of torch.compile artifacts across model instances to help your models boot faster.

    Models using torch.compile like black-forest-labs/flux-kontext-dev, prunaai/flux-schnell, and prunaai/flux.1-dev-lora now start 2-3x faster.

    In our tests of inference speed with black-forest-labs/flux-kontext-dev, the compiled version runs over 30% faster than the uncompiled one, making torch.compile an important feature to explore.

    For more details, check out the blog post. If you’re building your own custom models, check out our guide to improving model performance with torch.compile.

    To learn more about how to use torch.compile, check out the official PyTorch torch.compile tutorial.

    Original source
  • Aug 29, 2025
    • Date parsed from source:
      Aug 29, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    The little things, week ending August 29, 2025

    Search results now show Artificial Analysis image and video arena rankings. Organization signups require email verification, and the dashboard, navigation, prediction filtering, and accessibility get broad UI improvements. Bug fixes cover Microsoft Edge dropdowns in dark mode, readme fetching, and avatar menu username visibility.

    Release notes

    • Added Artificial Analysis image and video arena rankings to search results
    • Added email verification when signing up for an organization
    • Improved rendering of the billing summary on the dashboard
    • Continued improving the site navigation across Replicate
    • Cleaned up filtering options on the prediction list to make it easier to navigate
    • Fixed a bug that may have caused filenames to overflow on the playground
    • Fixed a bug when fetching a model’s readme while using an Accept header
    • Fixed a bug that may have caused dropdowns to appear incorrectly on Microsoft Edge when using dark mode
    • Enhanced radio button visibility on model create with better contrast
    • Standardized number formatting across the platform to use consistent en-US locale
    • Fixed avatar menu username visibility across different screen sizes
    • Improved link underlines in blog posts for better readability and visibility
    Original source
  • Aug 14, 2025
    • Date parsed from source:
      Aug 14, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    The little things, week ending August 14, 2025

    Replicate rolls out a broad UI refresh and docs overhaul: an easier-to-scan model page header, fresher homepage model data, clearer predictions filtering, and improved search, including a fix for 404s on collections. New docs cover secret inputs, community models, and using Replicate MCP in the Gemini and Codex CLIs, and guides are now organized into categories. The API gains a deployments.update fix and mcp.replicate.com, a remote MCP server for Replicate’s HTTP API.

    Web

    • Overhauled the model page header to make it easier to find what you’re looking for
    • Updated the Replicate homepage with the freshest model data
    • Tweaked the predictions interface to make filtering clearer
    • Continued improving search results, including fixes for a bug that led to 404s on collections and a bug where video models were not displaying correctly
    • Improved docs search results interface

    Docs

    • Updated the Node.js starter guide to use newer models.
    • Added docs about secret inputs for model authors and model users.
    • Added docs about community models.
    • Added docs for using Replicate MCP in Google Gemini CLI
    • Added docs for using Replicate MCP in OpenAI Codex CLI
    • Guides are now organized into categories: Run models, Build models, Go deeper.

    API

    • Fixed the deployments.update API to return updated deployment config.
    • Released mcp.replicate.com, a remote MCP server for Replicate’s HTTP API
    Original source
  • Aug 5, 2025
    • Date parsed from source:
      Aug 5, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    Run all models with the same API endpoint

    Replicate unifies model execution: POST /v1/predictions now runs both official and community models, accepting official models in the owner/name format. The change is backward compatible, so existing endpoints keep working.

    What changed?

    You can now use the POST /v1/predictions HTTP API endpoint to run any model on Replicate, whether it’s an official model or a community model. This removes the confusion about which endpoint to use for different types of models.

    The POST /v1/predictions endpoint now accepts official model identifiers in the owner/name format, in addition to the existing {owner}/{name}:{version_id} and {version_id} formats.

    The existing POST /v1/models/{model_owner}/{model_name}/predictions endpoint will still be supported for running official models. If you’re already using that endpoint, you don’t need to change anything.

    This change is backward compatible. Existing code will continue to work without any modifications.

    Supported version identifiers

    When using the POST /v1/predictions endpoint, you can specify models in these formats:

    • {owner}/{name} - For official models (e.g., black-forest-labs/flux-schnell)
    • {owner}/{name}:{version_id} - For community models with full version ID (e.g., replicate/hello-world:9dcd6d78e7c6560c340d916fe32e9f24aabfa331e5cce95fe31f77fb03121426)
    • {version_id} - Just the 64-character version ID (e.g., 9dcd6d78e7c6560c340d916fe32e9f24aabfa331e5cce95fe31f77fb03121426)
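
    To make the three formats concrete, here is a minimal sketch that classifies an identifier and builds the JSON body for POST /v1/predictions. The helper names and the exact character rules for owner and name are assumptions for illustration; the note does not say how the API itself validates identifiers.

```python
import json
import re

VERSION_ID = r"[0-9a-f]{64}"  # 64-character hexadecimal version ID
NAME = r"[^/:\s]+"  # assumed: any run of characters without "/", ":" or whitespace

# Ordered from most to least specific.
PATTERNS = {
    "owner/name:version_id": re.compile(rf"{NAME}/{NAME}:{VERSION_ID}"),
    "owner/name": re.compile(rf"{NAME}/{NAME}"),
    "version_id": re.compile(VERSION_ID),
}


def identifier_format(identifier: str) -> str:
    """Return which of the three documented formats the identifier uses."""
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(identifier):
            return label
    raise ValueError(f"unrecognized model identifier: {identifier!r}")


def prediction_body(identifier: str, model_input: dict) -> str:
    """Build the JSON request body, passing the identifier as "version"."""
    identifier_format(identifier)  # raises ValueError on an unknown format
    return json.dumps({"version": identifier, "input": model_input})
```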

    Example

    Here’s an example of how to run an official model (in this case, black-forest-labs/flux-schnell) using the POST /v1/predictions operation:

    curl -X POST https://api.replicate.com/v1/predictions \
    -H "Authorization: Token $REPLICATE_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "version": "black-forest-labs/flux-schnell",
      "input": {
        "prompt": "A photo of a cat"
      }
    }'
    
    Original source
  • Aug 1, 2025
    • Date parsed from source:
      Aug 1, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    The little things, week ending August 1, 2025

    Web

    • Rolled out a new search experience across Replicate
    • Added a new enterprise page
    • Added support for filtering models in the playground
    • Fixed a bug where dark mode may not have been sticky in certain circumstances
    • Fixed a bug where creating a model might silently fail
    • Fixed a bug with using keyboard shortcuts in playground causing unexpected results
    • Improved 404 pages
    • Improved pricing display for models

    Cog

    • Added support for Python 3.13 base images

    Models

    • Veo-3 now supports 1080p
    • Kontext LoRA trainer now supports up to 20k steps
    Original source
  • Jul 29, 2025
    • Date parsed from source:
      Jul 29, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    Purchase prepaid credit

    Replicate adds prepaid credit billing to help you prepay and manage spend. New accounts since July 16, 2025 are billed via prepaid credit; existing users can stay on monthly or migrate later with guidance.

    Prepaid credit

    You can now purchase prepaid credit for your Replicate account. This is a helpful option if you want to manage your spending more proactively. It can also make paying for Replicate easier if your bank requires additional authentication for recurring charges.

    Since July 16, 2025, all new accounts are being billed through prepaid credit instead of being billed monthly. For existing users who signed up before July 16, nothing changes if you don’t want it to. You can continue to get billed monthly - no action required.

    At some point in the future, we will migrate most accounts from monthly billing to prepaid credit. We’ll work with you to make the transition as smooth as possible and will share more details as our plans develop. If you want to move from monthly billing to prepaid credit sooner, email [email protected].

    To purchase credit:

    • Visit replicate.com/account/billing (or click your avatar → Account settings → Billing).
    • Choose Add credit and follow the prompts.
    • Optionally, set up auto reload to add to your credit balance when it dips below a preset threshold.

    Once you purchase credit, any usage will be deducted from that credit balance. If you run out of credit, we’ll charge you for any overages at the beginning of the following month.

    For more details, see our prepaid credit docs.

    Original source
  • Jul 21, 2025
    • Date parsed from source:
      Jul 21, 2025
    • First seen by Releasebot:
      Jan 14, 2026
    Replicate

    Introducing a new Cog runtime

    Cog unveils a new production runtime implemented in Go with a Python runner, promising cleaner dependency control and better performance. Update to Cog >= 0.16.0 and enable cog_runtime to try the new path, with API tweaks and deprecated File usage phased out. The old runtime will be deprecated in a future release.

    Introduction

    We are introducing a new implementation of Cog’s production runtime component. This is the part of Cog responsible for predictor schema validation, prediction execution and HTTP serving.

    tl;dr

    If you’re a model author and want to try out the new runtime, make sure you’re on Cog >= 0.16.0 and add build.cog_runtime: true to cog.yaml:

    build:
      # Enable new Cog runtime implementation
      cog_runtime: true
    

    Most existing models should work as is, apart from a few exceptions. If you hit one of the exceptions, please follow the messages printed by cog to update your code. Read below for why these are necessary.

    Note that:

    • The experimental training interface is not supported yet.
    • This new runtime will become the default in a future Cog release, after which the existing one will be deprecated.

    Why build this?

    The existing Cog runtime was written in Python and relies heavily on Pydantic and several other libraries when performing predictions. This leads to several problems:

    • Dependency issues: many Python libraries pull in conflicting versions of common dependencies, e.g. Pydantic. This causes runtime errors, sometimes even by just rebuilding the image which pulls a newer version of the dependency. By removing all Python dependencies from Cog runtime, you have total control of your model’s dependency graph.
    • Ambiguous predictor interface: we relied on Pydantic for checking predictor input and output types, which can be ambiguous and error prone, e.g. allowing types that may be handled incorrectly by other parts of our ecosystem or user code. It’s also hard to support custom data types due to potentially incompatible Pydantic versions, i.e. v1 vs v2.
    • Error handling: since the Cog HTTP server and the predictor are both Python code running via multiprocessing, it’s hard to differentiate platform errors (from Cog) from application errors (from the predictor). A model crash may cause the server to end up in a bad state with no useful logging.
    • Performance: certain things are hard to implement correctly and efficiently in Python, e.g. async HTTP handling, file upload and download, concurrency, and serialization.

    To tackle these problems, we re-implemented the runtime part of Cog with the following components:

    • Schema validation in pure vanilla Python via inspect and no Pydantic or any other dependency
    • Decoupled HTTP server rewritten in Go
    • Custom, pluggable data serialization

    This allows us to minimize the runtime logic in Python and reduce the risk of it interfering with application code. The Go server is now responsible for most of the heavy lifting:

    • HTTP server and webhooks
    • Input file download and output file upload
    • Logging

    The Go server communicates with the bare minimum Python runner via JSON files for input/output and HTTP/signals for IPC. The Python runner is solely responsible for invoking the predictor’s setup() and predict() methods.

    What do I need to change?

    Most of the Cog API (Predictor, Input, BaseModel, etc.) is source compatible. There are 3 changes that might require updating the model:

    • Improved semantics of optional inputs
    • Cleaner dependencies
    • Removal of deprecated File API.

    First, ambiguous optional inputs are no longer allowed. In the existing Cog, declaring prompt: str suggests that it cannot be None, yet default=None is still allowed, which can confuse type checkers and lead to buggy code, e.g. if the code doesn’t check for None. For example, instead of:

    def predict(prompt: str=Input(description="prompt", default=None))
    

    We should use:

    def predict(prompt: Optional[str]=Input(description="prompt"))
    

    Note that default=None is now redundant and has been removed, as Optional[str] implies that the input may be None, and a type checker can warn us if we forget to check for it.
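
    To illustrate why an explicit Optional annotation is easier to validate, the sketch below inspects a function's signature with only the standard library and reports which inputs admit None. It is an illustration of the general idea, not Cog's actual validation code: optional_inputs is a hypothetical helper, and predict uses plain defaults instead of cog.Input so the example runs without Cog installed.

```python
import inspect
from typing import Optional, Union, get_args, get_origin, get_type_hints


def optional_inputs(func) -> dict:
    """Map each parameter name to True when its annotation admits None."""
    hints = get_type_hints(func)
    result = {}
    for name in inspect.signature(func).parameters:
        if name == "self":
            continue  # skip the bound instance on predictor methods
        annotation = hints.get(name)
        # Optional[X] is Union[X, None], so look for NoneType among the union members.
        result[name] = get_origin(annotation) is Union and type(None) in get_args(annotation)
    return result


def predict(prompt: Optional[str] = None, steps: int = 4) -> str:
    # With Optional[str], a type checker can flag a missing None check here.
    if prompt is None:
        return f"no prompt, {steps} steps"
    return f"{prompt}, {steps} steps"
```

    Here optional_inputs(predict) marks prompt as optional and steps as required, with no dependency on Pydantic or any other third-party library.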

    The second notable change is that the new Cog runtime no longer depends on any of the Python dependencies of the existing runtime. If your model relies on any of the following packages and they’re not pulled in by another third-party library, add them to requirements.txt:

    • attrs
    • fastapi
    • pydantic
    • PyYAML
    • requests
    • structlog
    • typing_extensions
    • uvicorn

    The third change is the removal of the deprecated cog.File API. Use cog.Path instead.

    Original source
  • Dec 19, 2025
    • Date parsed from source:
      Dec 19, 2025
    • First seen by Releasebot:
      Dec 20, 2025
    Replicate

    The little things, week ending December 19, 2025

    Web

    • Improved the reliability of google/nano-banana and google/nano-banana-pro

    • Improved accessibility when using the search bar across Replicate

    Docs

    • Added automatic llms.txt generation for documentation, making it easier for language models to discover and understand Replicate’s docs

    • Published blog post on how to run Retro Diffusion’s pixel art models on Replicate, including rd-fast, rd-plus, rd-tile, and rd-animation for generating game assets and sprites

    Original source
Releasebot

Curated by the Releasebot team

Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.

Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.
