Runway AI Products
All Runway AI Release Notes
- Oct 24, 2025
- Parsed from source: Oct 24, 2025
- Detected by Releasebot: Oct 30, 2025
Paid Plans
Create your own custom node-based workflows, chaining together multiple models, modalities and intermediary steps for even more control over your generations. Available now.
- Oct 16, 2025
- Parsed from source: Oct 16, 2025
- Detected by Releasebot: Oct 30, 2025
ElevenLabs Voice Dubbing
ElevenLabs Dubbing is available directly via the Runway API. Translate your content into 29 languages with AI-generated speech that maintains the speaker’s original voice characteristics and emotional tone.
- Oct 16, 2025
- Parsed from source: Oct 16, 2025
- Detected by Releasebot: Oct 30, 2025
ElevenLabs Clean Audio
ElevenLabs Voice Isolation is available directly via the Runway API. Strip background noise from any recording and isolate crisp, clear speech—built for film, podcast, and interview workflows.
- Oct 15, 2025
- Parsed from source: Oct 15, 2025
- Detected by Releasebot: Oct 30, 2025
Google Veo 3.1
Google Veo 3.1 text to video and image to video are now available in the Runway API. Generate with even greater fidelity and control, with first and last keyframe support, the new Reference to Video feature and full 1080p outputs.
- Oct 8, 2025
- Parsed from source: Oct 8, 2025
- Detected by Releasebot: Oct 30, 2025
Flexible Generation Length
Flexible generation times for Runway video models are now available via the Runway API. Choose any duration from 2-10 seconds using Gen-4 Turbo. Pay only for what you generate.
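As an illustration only, here is a minimal sketch of requesting a custom clip length over HTTP. The endpoint, header names and body fields follow Runway's published image-to-video API, but the version string, field names and parameter values shown are assumptions and may differ from the current API reference.

```python
# Sketch: request a 7-second Gen-4 Turbo clip via the Runway API.
# Endpoint, headers and body fields are assumed from the public API docs.
import os
import requests

API_KEY = os.environ["RUNWAYML_API_SECRET"]  # your Runway API secret

response = requests.post(
    "https://api.dev.runwayml.com/v1/image_to_video",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Runway-Version": "2024-11-06",  # assumed version string
        "Content-Type": "application/json",
    },
    json={
        "model": "gen4_turbo",
        "promptImage": "https://example.com/first-frame.jpg",  # hypothetical input image
        "promptText": "A slow dolly shot through a neon-lit alley",
        "ratio": "1280:720",
        "duration": 7,  # any value from 2-10 seconds; billed per generated second
    },
)
response.raise_for_status()
task_id = response.json()["id"]  # poll the tasks endpoint with this id until the clip is ready
print("Started task:", task_id)
```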
- Oct 8, 2025
- Parsed from source: Oct 8, 2025
- Detected by Releasebot: Oct 30, 2025
ElevenLabs Text to Sound Effects
ElevenLabs Text to Sound Effects is now available directly via the Runway API.
- October 2025
- No date parsed from source.
- Detected by Releasebot: Oct 30, 2025
Introducing Runway Gen-4
Runway Gen-4 is a new family of media-generation models that preserves consistent characters, objects and environments across shots from a single reference. It adds production-ready video, physics-aware world modeling and fast GVFX, enabling seamless multi-scene storytelling without fine-tuning or additional training.
Introducing Runway Gen-4
Our next-generation series of AI models for media generation and world consistency.
A new generation of consistent and controllable media is here.
With Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame. Then, regenerate those elements from multiple perspectives and positions within your scenes.
Gen-4 can use visual references, combined with instructions, to create new images and videos with consistent styles, subjects, locations and more, giving you unprecedented creative freedom to tell your story.
All without the need for fine-tuning or additional training.
RUNWAY GEN-4
Narrative Capabilities
A collection of short films and music videos made entirely with Gen-4 to test the model's narrative capabilities.
One simple interface, endless workflows and capabilities
WORKFLOW – CONSISTENT CHARACTERS
Infinite character consistency with a single reference image
Runway Gen-4 allows you to generate consistent characters across endless lighting conditions, locations and treatments. All with just a single reference image of your characters.
WORKFLOW – CONSISTENT OBJECTS
Whatever you want, everywhere you need it
Place any object or subject in any location or condition you need. Whether you’re crafting scenes for long form narrative content or generating product photography, Runway Gen-4 makes it simple to generate consistently across environments.
WORKFLOW – COVERAGE
Get every angle of any scene
To craft a scene, simply provide reference images of your subjects and describe the composition of your shot. Runway Gen-4 will do the rest.
CAPABILITIES – PRODUCTION-READY VIDEO
A new standard for quality and language understanding in video generation
Gen-4 excels at generating highly dynamic videos with realistic motion and consistent subjects, objects and styles, with superior prompt adherence and best-in-class world understanding.
CAPABILITIES – PHYSICS
A step towards Universal Generative Models that understand the world
Runway Gen-4 represents a significant milestone in the ability of visual generative models to simulate real-world physics.
WORKFLOW – GVFX
A new kind of visual effects
Fast, controllable and flexible video generation that can seamlessly sit beside live action, animated and VFX content.
- Sep 25, 2025
- Parsed from source: Sep 25, 2025
- Detected by Releasebot: Oct 30, 2025
Text to Speech
Starting today, ElevenLabs’ Multilingual v2 Text to Speech is available directly via the Runway API. Generate natural, emotionally aware speech in 29 languages while maintaining consistent voice quality and personality.
- Sep 24, 2025
- Parsed from source: Sep 24, 2025
- Detected by Releasebot: Oct 30, 2025
API Playground
Today we’re launching the Runway API Playground, a new interactive environment that lets developers test and refine their integrations before going to production. The Playground provides a full sandbox environment with all of our latest models, so you can build with confidence and ship faster.
- Sep 24, 2025
- Parsed from source: Sep 24, 2025
- Detected by Releasebot: Oct 30, 2025
Autoregressive-to-Diffusion Vision Language Models
Introducing A2D-VL 7B, a diffusion-based vision language model that enables fast parallel generation by adapting an autoregressive VLM to diffusion decoding. It delivers faster generation with preserved quality, much lower training compute, and KV caching support, outperforming prior diffusion VLMs on VQA benchmarks.
We adapt powerful pretrained VLMs for parallel diffusion decoding
We present a novel diffusion VLM, A2D-VL 7B (Autoregressive-to-Diffusion) for parallel generation by finetuning an existing autoregressive VLM, Qwen2.5-VL, on the diffusion language modeling task. In particular, we adopt the masked diffusion framework which "noises" tokens by masking them and "denoises" tokens by predicting the original tokens. We propose novel adaptation techniques (Fig. 2) that gradually increase the task difficulty during finetuning to smoothly transition from sequential to parallel decoding while preserving the base model's capabilities.
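As a rough illustration of the masked-diffusion objective described above (not the authors' training code), the sketch below masks a random fraction of response tokens and trains the model to predict the originals at the masked positions. The function name, the `model` interface and the simple 1/t loss weighting are simplified assumptions.

```python
# Simplified sketch of one masked-diffusion training step:
# "noise" = replace a random fraction of response tokens with [MASK],
# "denoise" = predict the original tokens at the masked positions.
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, input_ids, target_mask, mask_token_id):
    """input_ids: (B, L) token ids; target_mask: (B, L) bool, True where
    response tokens live (prompt/image tokens are never noised)."""
    B, L = input_ids.shape
    # Sample a noise level t ~ U(0, 1) per sequence and mask each response
    # token independently with probability t.
    t = torch.rand(B, 1, device=input_ids.device).clamp(min=1e-3)
    noised = torch.rand(B, L, device=input_ids.device) < t
    noised &= target_mask
    corrupted = input_ids.masked_fill(noised, mask_token_id)

    logits = model(corrupted)  # assumed to return per-token logits of shape (B, L, V)
    loss = F.cross_entropy(logits[noised], input_ids[noised], reduction="none")
    # Masked-diffusion losses typically reweight per-token terms by 1/t; simplified here.
    loss = (loss / t.expand(B, L)[noised]).mean()
    return loss
```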
Further, we present novel adaptation techniques for finetuning autoregressive models into diffusion models while retaining the base model's core capabilities:
Block size annealing. The block diffusion framework enables interpolation between autoregressive and diffusion modeling: when blocks contain single tokens, we recover sequential decoding. We leverage this by gradually increasing the diffusion prediction window throughout finetuning, starting from smaller blocks and progressing to our target size of 8 tokens. This gradual progression prevents the aggressive parameter updates that would otherwise erase the base model's capabilities.
Noise level annealing. Within each token block, we apply position-dependent masking to gradually transition from easier to harder prediction tasks. Early in training, we mask the left-most tokens closest to the context more frequently (since they're easier to predict) and right-most tokens less frequently (since they're harder to predict). As training progresses, masking becomes uniform across positions, enabling any-order parallel generation within each block.
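A toy sketch of what these two curricula could look like in code; the linear schedule shapes, the left-bias profile and the helper names (`block_size_at`, `position_mask_probs`) are illustrative assumptions, not the paper's exact recipe.

```python
# Toy curricula for the two annealing strategies described above.
import torch

def block_size_at(step, total_steps, target_block=8):
    """Block size annealing: start near sequential decoding (block = 1)
    and grow linearly to the target block size over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return max(1, round(1 + frac * (target_block - 1)))

def position_mask_probs(block_size, t, anneal_frac):
    """Noise level annealing within a block: early in training, positions
    closer to the context (left) are masked more often than far positions;
    late in training, the masking probability is uniform at level t."""
    pos = torch.arange(block_size, dtype=torch.float32)
    biased = t * (1.0 - 0.5 * pos / max(block_size - 1, 1))  # highest at the left-most position
    uniform = torch.full((block_size,), t)
    return (1 - anneal_frac) * biased + anneal_frac * uniform

# Example: midway through training, a ~4-token block and left-biased masking at t = 0.6.
print(block_size_at(step=5_000, total_steps=10_000))
print(position_mask_probs(block_size=8, t=0.6, anneal_frac=0.5))
```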
We ablate these strategies in Fig. 3, which shows that both are important for preserving the base model's benchmark performance. Concurrent work also adapts Qwen2.5 for block diffusion decoding on NLP-only tasks. In contrast, we explore vision language models and propose novel adaptation techniques that are critical for retaining model capabilities.
Our design overcomes the limitations of prior diffusion VLMs:
Efficient training. By adapting existing VLMs, our approach requires significantly less training than training diffusion VLMs from scratch. While LLaDA-V 8B trains on ≥12M visual QA pairs, our A2D-VL 7B is trained on 400K pairs.
Modern architecture. By adapting Qwen2.5-VL, we adopt their modern architectural components, such as support for native visual resolutions and multimodal positional encodings.
Improved quality in long-form responses. We use diffusion decoding in blocks of 8 tokens, which enhances both response quality and the model's ability to generate arbitrary-length outputs. Further, our training data contains 100K high-quality reasoning traces distilled from the larger Qwen2.5-VL 72B, whereas prior diffusion VLMs rely on standard instruction-tuning data (with some also distilling from a 7B math/science reasoning model). To enhance response flexibility, we also include 50K samples from MAmmoTH-VL in our data mixture, similar to prior work.
KV caching support. Under the block diffusion training and inference framework, A2D-VL sequentially generates a block of tokens at a time using block-causal attention (attending only to previous blocks and tokens within the current block) rather than fully bidirectional attention. As a result, A2D-VL supports exact KV caching of previously generated blocks instead of relying on approximate methods.
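A minimal sketch of the block-causal attention pattern that makes exact KV caching possible: a position may attend to every earlier block plus its own block, so once a block is finalized its keys and values never change. The helper below is illustrative and assumes a sequence already split into fixed-size blocks.

```python
# Block-causal attention: position i may attend to position j
# iff j's block index <= i's block index. Completed blocks are frozen,
# so their keys/values can be cached exactly during generation.
import torch

def block_causal_mask(seq_len, block_size):
    blocks = torch.arange(seq_len) // block_size          # block index per position
    allowed = blocks.unsqueeze(0) <= blocks.unsqueeze(1)  # (L, L) boolean attention mask
    return allowed

mask = block_causal_mask(seq_len=12, block_size=4)
# Within block 0 (positions 0-3): full bidirectional attention.
# Block 1 (positions 4-7): attends to blocks 0 and 1, never to block 2.
assert mask[5, 2] and mask[5, 6] and not mask[5, 9]
```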
We strike a new balance between speed and performance
By adapting pretrained autoregressive VLMs for diffusion, A2D-VL strikes an improved balance between inference speed and downstream performance. We compare the speed-quality trade-off between A2D-VL 7B, Qwen2.5-VL 7B, and the diffusion VLM LLaDA-V 8B. For LLaDA-V, we follow the recommended settings: approximate KV caching with recomputation every 32 steps and "factor"-based confidence thresholding.
Detailed image captioning
We generate detailed image captions (≤ 512 tokens) and, similarly to prior work, score them against captions generated by GPT-4o, GPT-4V, and Gemini-1.5-Pro using BERTScore to measure semantic similarity. Captions generated by A2D-VL achieve greater consistency with the reference captions than those from the prior diffusion VLM LLaDA-V.
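A hedged sketch of this kind of scoring using the open-source bert-score package; the example captions are made up, and the authors' exact reference models and settings are not reproduced here.

```python
# Sketch: score generated captions against reference captions with BERTScore.
# pip install bert-score
from bert_score import score

generated = ["A golden retriever catches a frisbee in a sunlit park."]
references = ["A dog leaps to catch a frisbee on a grassy field in the sun."]

# Returns precision, recall and F1 tensors; F1 is the usual headline number.
P, R, F1 = score(generated, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```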
Chain-of-thought reasoning
A2D-VL consistently achieves better MMMU-Pro accuracy with chain-of-thought prompting compared to LLaDA-V. For Qwen2.5 and A2D-VL, we generate up to 16k tokens. For LLaDA-V, we limit the response to 512 tokens as the accuracy degrades at longer output lengths.
General visual understanding
A2D-VL outperforms prior diffusion VLMs on 3 out of 5 visual question-answering benchmarks with minimal performance degradation relative to the base Qwen model.
Conclusion
We introduce Autoregressive-to-Diffusion (A2D) vision language models for faster, parallel generation by adapting existing autoregressive VLMs to diffusion decoding. A2D-VL outperforms prior diffusion VLMs in visual question-answering while requiring significantly less training compute. Our novel adaptation techniques are critical for retaining model capabilities, finally enabling the conversion of state-of-the-art autoregressive VLMs to diffusion with minimal impact on quality.