Runway AI Products
All Runway AI Release Notes
- Oct 24, 2025
- Parsed from source: Oct 24, 2025
- Detected by Releasebot: Oct 30, 2025
Paid Plans
Create your own custom node-based workflows, chaining together multiple models, modalities and intermediary steps for even more control over your generations. Available now.
- Oct 16, 2025
- Parsed from source: Oct 16, 2025
- Detected by Releasebot: Oct 30, 2025
ElevenLabs Voice Dubbing
ElevenLabs Dubbing is available directly via the Runway API. Translate your content into 29 languages with AI-generated speech that maintains the speaker’s original voice characteristics and emotional tone.
- Oct 16, 2025
- Parsed from source: Oct 16, 2025
- Detected by Releasebot: Oct 30, 2025
ElevenLabs Clean Audio
ElevenLabs Voice Isolation is available directly via the Runway API. Strip background noise from any recording and isolate crisp, clear speech—built for film, podcast, and interview workflows.
- Oct 15, 2025
- Parsed from source: Oct 15, 2025
- Detected by Releasebot: Oct 30, 2025
Google Veo 3.1
Google Veo 3.1 text to video and image to video are now available in the Runway API. Generate with even greater fidelity and control, with first and last keyframe support, the new Reference to Video feature and full 1080p outputs.
- Oct 8, 2025
- Parsed from source: Oct 8, 2025
- Detected by Releasebot: Oct 30, 2025
Flexible Generation Length
Flexible generation times for Runway video models are now available via the Runway API. Choose any duration from 2-10 seconds using Gen-4 Turbo. Pay only for what you generate.
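As an illustration only, here is a minimal sketch of requesting a custom clip length over HTTP. The endpoint, header names and body fields follow Runway's published image-to-video API, but the version string, field names and parameter values shown are assumptions and may differ from the current API reference.

```python
# Sketch: request a 7-second Gen-4 Turbo clip via the Runway API.
# Endpoint, headers and body fields are assumed from the public API docs.
import os
import requests

API_KEY = os.environ["RUNWAYML_API_SECRET"]  # your Runway API secret

response = requests.post(
    "https://api.dev.runwayml.com/v1/image_to_video",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Runway-Version": "2024-11-06",  # assumed version string
        "Content-Type": "application/json",
    },
    json={
        "model": "gen4_turbo",
        "promptImage": "https://example.com/first-frame.jpg",  # hypothetical input image
        "promptText": "A slow dolly shot through a neon-lit alley",
        "ratio": "1280:720",
        "duration": 7,  # any value from 2-10 seconds; billed per generated second
    },
)
response.raise_for_status()
task_id = response.json()["id"]  # poll the tasks endpoint with this id until the clip is ready
print("Started task:", task_id)
```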
- Oct 8, 2025
- Parsed from source: Oct 8, 2025
- Detected by Releasebot: Oct 30, 2025
ElevenLabs Text to Sound Effects
ElevenLabs Text to Sound Effects is now available directly via the Runway API.
- October 2025
- No date parsed from source.
- Detected by Releasebot: Oct 30, 2025
Introducing Runway Gen-4
Runway Gen-4 is a new family of media-generation models that preserves consistent characters, objects and environments across shots from a single reference. It adds production-ready video, physics-aware world modeling and fast GVFX, enabling seamless multi-scene storytelling without fine-tuning or additional training.
Introducing Runway Gen-4
Our next-generation series of AI models for media generation and world consistency.
A new generation of consistent and controllable media is here.
With Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame. Then, regenerate those elements from multiple perspectives and positions within your scenes.
Gen-4 can use visual references, combined with instructions, to create new images and videos with consistent styles, subjects, locations and more, giving you unprecedented creative freedom to tell your story.
All without the need for fine-tuning or additional training.
RUNWAY GEN-4
Narrative Capabilities
A collection of short films and music videos made entirely with Gen-4 to test the model's narrative capabilities.
One simple interface, endless workflows and capabilities
WORKFLOW – CONSISTENT CHARACTERS
Infinite character consistency with a single reference image
Runway Gen-4 allows you to generate consistent characters across endless lighting conditions, locations and treatments. All with just a single reference image of your characters.
WORKFLOW – CONSISTENT OBJECTS
Whatever you want, everywhere you need it
Place any object or subject in any location or condition you need. Whether you’re crafting scenes for long form narrative content or generating product photography, Runway Gen-4 makes it simple to generate consistently across environments.
WORKFLOW – COVERAGE
Get every angle of any scene
To craft a scene, simply provide reference images of your subjects and describe the composition of your shot. Runway Gen-4 will do the rest.
CAPABILITIES – PRODUCTION-READY VIDEO
A new standard for quality and language understanding in video generation
Gen-4 excels at generating highly dynamic videos with realistic motion and consistent subjects, objects and styles, with superior prompt adherence and best-in-class world understanding.
CAPABILITIES – PHYSICS
A step towards Universal Generative Models that understand the world
Runway Gen-4 represents a significant milestone in the ability of visual generative models to simulate real-world physics.
WORKFLOW – GVFX
A new kind of visual effects
Fast, controllable and flexible video generation that can seamlessly sit beside live action, animated and VFX content.
- Sep 25, 2025
- Parsed from source: Sep 25, 2025
- Detected by Releasebot: Oct 30, 2025
Text to Speech
Starting today, ElevenLabs’ Multilingual v2 Text to Speech is available directly via the Runway API. Generate natural, emotionally aware speech in 29 languages while maintaining consistent voice quality and personality.
- Sep 24, 2025
- Parsed from source: Sep 24, 2025
- Detected by Releasebot: Oct 30, 2025
API Playground
Today we’re launching the Runway API Playground, a new interactive environment that lets developers test and refine their integrations before going to production. The Playground provides a full sandbox environment with all of our latest models, so you can build with confidence and ship faster.
- Sep 24, 2025
- Parsed from source: Sep 24, 2025
- Detected by Releasebot: Oct 30, 2025
Autoregressive-to-Diffusion Vision Language Models
Introducing A2D-VL 7B, a diffusion-based vision language model that enables fast parallel generation by adapting an autoregressive VLM to diffusion decoding. It delivers faster generation with preserved quality, much lower training compute, and KV caching support, outperforming prior diffusion VLMs on VQA benchmarks.
We adapt powerful pretrained VLMs for parallel diffusion decoding
We present a novel diffusion VLM, A2D-VL 7B (Autoregressive-to-Diffusion) for parallel generation by finetuning an existing autoregressive VLM, Qwen2.5-VL, on the diffusion language modeling task. In particular, we adopt the masked diffusion framework which "noises" tokens by masking them and "denoises" tokens by predicting the original tokens. We propose novel adaptation techniques (Fig. 2) that gradually increase the task difficulty during finetuning to smoothly transition from sequential to parallel decoding while preserving the base model's capabilities.
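As a rough illustration of the masked-diffusion objective described above (not the authors' training code), the sketch below masks a random fraction of response tokens and trains the model to predict the originals at the masked positions. The function name, the `model` interface and the simple 1/t loss weighting are simplified assumptions.

```python
# Simplified sketch of one masked-diffusion training step:
# "noise" = replace a random fraction of response tokens with [MASK],
# "denoise" = predict the original tokens at the masked positions.
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, input_ids, target_mask, mask_token_id):
    """input_ids: (B, L) token ids; target_mask: (B, L) bool, True where
    response tokens live (prompt/image tokens are never noised)."""
    B, L = input_ids.shape
    # Sample a noise level t ~ U(0, 1) per sequence and mask each response
    # token independently with probability t.
    t = torch.rand(B, 1, device=input_ids.device).clamp(min=1e-3)
    noised = torch.rand(B, L, device=input_ids.device) < t
    noised &= target_mask
    corrupted = input_ids.masked_fill(noised, mask_token_id)

    logits = model(corrupted)  # assumed to return per-token logits of shape (B, L, V)
    loss = F.cross_entropy(logits[noised], input_ids[noised], reduction="none")
    # Masked-diffusion losses typically reweight per-token terms by 1/t; simplified here.
    loss = (loss / t.expand(B, L)[noised]).mean()
    return loss
```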
Further, we present novel adaptation techniques for finetuning autoregressive models into diffusion models while retaining the base model's core capabilities:
Block size annealing. The block diffusion framework enables interpolation between autoregressive and diffusion modeling: when blocks contain single tokens, we recover sequential decoding. We leverage this by gradually increasing the diffusion prediction window throughout finetuning, starting from smaller blocks and progressing to our target size of 8 tokens. This gradual progression prevents the aggressive parameter updates that would otherwise erase the base model's capabilities.
Noise level annealing. Within each token block, we apply position-dependent masking to gradually transition from easier to harder prediction tasks. Early in training, we mask the left-most tokens closest to the context more frequently (since they're easier to predict) and right-most tokens less frequently (since they're harder to predict). As training progresses, masking becomes uniform across positions, enabling any-order parallel generation within each block.
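A toy sketch of what these two curricula could look like in code; the linear schedule shapes, the left-bias profile and the helper names (`block_size_at`, `position_mask_probs`) are illustrative assumptions, not the paper's exact recipe.

```python
# Toy curricula for the two annealing strategies described above.
import torch

def block_size_at(step, total_steps, target_block=8):
    """Block size annealing: start near sequential decoding (block = 1)
    and grow linearly to the target block size over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return max(1, round(1 + frac * (target_block - 1)))

def position_mask_probs(block_size, t, anneal_frac):
    """Noise level annealing within a block: early in training, positions
    closer to the context (left) are masked more often than far positions;
    late in training, the masking probability is uniform at level t."""
    pos = torch.arange(block_size, dtype=torch.float32)
    biased = t * (1.0 - 0.5 * pos / max(block_size - 1, 1))  # highest at the left-most position
    uniform = torch.full((block_size,), t)
    return (1 - anneal_frac) * biased + anneal_frac * uniform

# Example: midway through training, a ~4-token block and left-biased masking at t = 0.6.
print(block_size_at(step=5_000, total_steps=10_000))
print(position_mask_probs(block_size=8, t=0.6, anneal_frac=0.5))
```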
We ablate these strategies in Fig. 3, which shows that both are important for preserving the base model's benchmark performance. Concurrent work also adapts Qwen2.5 for block diffusion decoding on NLP-only tasks. In contrast, we explore vision language models and propose novel adaptation techniques that are critical for retaining model capabilities.
Our design overcomes the limitations of prior diffusion VLMs:
Efficient training. By adapting existing VLMs, our approach requires significantly less training than training diffusion VLMs from scratch. While LLaDA-V 8B trains on ≥12M visual QA pairs, our A2D-VL 7B is trained on 400K pairs.
Modern architecture. By adapting Qwen2.5-VL, we adopt their modern architectural components, such as support for native visual resolutions and multimodal positional encodings.
Improved quality in long-form responses. We use diffusion decoding in blocks of 8 tokens, which enhances both response quality and the model's ability to generate arbitrary-length outputs. Further, our training data contains 100K high-quality reasoning traces distilled from the larger Qwen2.5-VL 72B, whereas prior diffusion VLMs rely on standard instruction-tuning data (with some also distilling from a 7B math/science reasoning model). To enhance response flexibility, we also include 50K samples from MAmmoTH-VL in our data mixture, similar to prior work.
KV caching support. Under the block diffusion training and inference framework, A2D-VL sequentially generates a block of tokens at a time using block-causal attention (attending only to previous blocks and tokens within the current block) rather than fully bidirectional attention. As a result, A2D-VL supports exact KV caching of previously generated blocks instead of relying on approximate methods.
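A minimal sketch of the block-causal attention pattern that makes exact KV caching possible: a position may attend to every earlier block plus its own block, so once a block is finalized its keys and values never change. The helper below is illustrative and assumes a sequence already split into fixed-size blocks.

```python
# Block-causal attention: position i may attend to position j
# iff j's block index <= i's block index. Completed blocks are frozen,
# so their keys/values can be cached exactly during generation.
import torch

def block_causal_mask(seq_len, block_size):
    blocks = torch.arange(seq_len) // block_size          # block index per position
    allowed = blocks.unsqueeze(0) <= blocks.unsqueeze(1)  # (L, L) boolean attention mask
    return allowed

mask = block_causal_mask(seq_len=12, block_size=4)
# Within block 0 (positions 0-3): full bidirectional attention.
# Block 1 (positions 4-7): attends to blocks 0 and 1, never to block 2.
assert mask[5, 2] and mask[5, 6] and not mask[5, 9]
```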
We strike a new balance between speed and performance
By adapting pretrained autoregressive VLMs for diffusion, A2D-VL strikes an improved balance between inference speed and downstream performance. We compare the speed-quality trade-off between A2D-VL 7B, Qwen2.5-VL 7B, and the diffusion VLM LLaDA-V 8B. For LLaDA-V, we follow the recommended settings: approximate KV caching with recomputation every 32 steps and "factor"-based confidence thresholding.
Detailed image captioning
We generate detailed image captions (≤ 512 tokens) and, similarly to prior work, score them against captions generated by GPT-4o, GPT-4V, and Gemini-1.5-Pro using BERTScore to measure semantic similarity. Captions generated by A2D-VL achieve greater consistency with the reference captions than those from the prior diffusion VLM LLaDA-V.
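A hedged sketch of this kind of scoring using the open-source bert-score package; the example captions are made up, and the authors' exact reference models and settings are not reproduced here.

```python
# Sketch: score generated captions against reference captions with BERTScore.
# pip install bert-score
from bert_score import score

generated = ["A golden retriever catches a frisbee in a sunlit park."]
references = ["A dog leaps to catch a frisbee on a grassy field in the sun."]

# Returns precision, recall and F1 tensors; F1 is the usual headline number.
P, R, F1 = score(generated, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```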
Chain-of-thought reasoning
A2D-VL consistently achieves better MMMU-Pro accuracy with chain-of-thought prompting compared to LLaDA-V. For Qwen2.5 and A2D-VL, we generate up to 16k tokens. For LLaDA-V, we limit the response to 512 tokens as the accuracy degrades at longer output lengths.
General visual understanding
A2D-VL outperforms prior diffusion VLMs on 3 out of 5 visual question-answering benchmarks with minimal performance degradation relative to the base Qwen model.
Conclusion
We introduce Autoregressive-to-Diffusion (A2D) vision language models for faster, parallel generation by adapting existing autoregressive VLMs to diffusion decoding. A2D-VL outperforms prior diffusion VLMs in visual question-answering while requiring significantly less training compute. Our novel adaptation techniques are critical for retaining model capabilities, finally enabling the conversion of state-of-the-art autoregressive VLMs to diffusion with minimal impact on quality.