- Oct 24, 2025
- Parsed from source: Oct 24, 2025
- Detected by Releasebot: Oct 30, 2025
Paid Plans
Create your own custom node-based workflows chaining together multiple models, modalities and intermediary steps for even more control of your generations. Available now.
Original source Report a problem - October 2025
- No date parsed from source.
- Detected by Releasebot: Oct 30, 2025
Introducing Runway Gen-4
Runway Gen-4 launches a groundbreaking multi-scene AI that preserves consistent characters, objects, and environments across shots from one reference. It adds production-ready video, physics-aware world modeling, and fast GVFX, enabling seamless storytelling without extra training.
Introducing Runway Gen-4
Our next-generation series of AI models for media generation and world consistency.
A new generation of consistent and controllable media is here.
With Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel, and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame. Then, regenerate those elements from multiple perspectives and positions within your scenes.
Gen-4 can use visual references, combined with instructions, to create new images and videos with consistent styles, subjects, locations and more, giving you unprecedented creative freedom to tell your story.
All without the need for fine-tuning or additional training.
RUNWAY GEN-4
Narrative Capabilities
A collection of short films and music videos made entirely with Gen-4 to test the model's narrative capabilities.
One simple interface, endless workflows and capabilities
WORKFLOW – CONSISTENT CHARACTERS
Infinite character consistency with a single reference image
Runway Gen-4 allows you to generate consistent characters across endless lighting conditions, locations and treatments. All with just a single reference image of your characters.
WORKFLOW – CONSISTENT OBJECTS
Whatever you want, everywhere you need it
Place any object or subject in any location or condition you need. Whether you’re crafting scenes for long form narrative content or generating product photography, Runway Gen-4 makes it simple to generate consistently across environments.
WORKFLOW – COVERAGE
Get every angle of any scene
To craft a scene, simply provide reference images of your subjects and describe the composition of your shot. Runway Gen-4 will do the rest.
CAPABILITIES – PRODUCTION-READY VIDEO
A new standard for quality and language understanding for video generation
Gen-4 excels at generating highly dynamic videos with realistic motion, as well as subject, object and style consistency, superior prompt adherence and best-in-class world understanding.
CAPABILITIES – PHYSICS
A step towards Universal Generative Models that understand the world
Runway Gen-4 represents a significant milestone in the ability of visual generative models to simulate real-world physics.
WORKFLOW – GVFX
A new kind of visual effects
Fast, controllable and flexible video generation that can seamlessly sit beside live action, animated and VFX content.
Original source Report a problem - Sep 24, 2025
- Parsed from source: Sep 24, 2025
- Detected by Releasebot: Oct 30, 2025
Autoregressive-to-Diffusion Vision Language Models
Introducing A2D-VL 7B, a diffusion-based vision language model that enables fast parallel generation by adapting an autoregressive VLM to diffusion decoding. It delivers faster generation with preserved quality, much lower training compute, and KV caching support, outperforming prior diffusion VLMs on VQA benchmarks.
We adapt powerful pretrained VLMs for parallel diffusion decoding
We present A2D-VL 7B (Autoregressive-to-Diffusion), a novel diffusion VLM for parallel generation, built by finetuning an existing autoregressive VLM, Qwen2.5-VL, on the diffusion language modeling task. In particular, we adopt the masked diffusion framework, which "noises" tokens by masking them and "denoises" them by predicting the original tokens. We propose novel adaptation techniques (Fig. 2) that gradually increase the task difficulty during finetuning to smoothly transition from sequential to parallel decoding while preserving the base model's capabilities.
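To make the masked diffusion objective concrete, here is a minimal PyTorch-style sketch of a single training step (the block structure is omitted for brevity). The mask-token id, noise schedule, loss weighting and model interface are illustrative assumptions, not the exact A2D-VL implementation.

```python
import torch
import torch.nn.functional as F

MASK_ID = 151666  # hypothetical id of a [MASK] token added to the vocabulary


def masked_diffusion_loss(model, input_ids, answer_mask):
    """Simplified masked diffusion training step (block structure omitted).

    input_ids   -- (B, L) token ids for prompt + answer
    answer_mask -- (B, L) bool, True at answer positions (prompt is never noised)
    """
    # "Noising": sample a noise level t per sequence and mask each answer
    # token independently with probability t.
    t = torch.rand(input_ids.size(0), 1, device=input_ids.device)
    masked = (torch.rand(input_ids.shape, device=input_ids.device) < t) & answer_mask
    noised = torch.where(masked, torch.full_like(input_ids, MASK_ID), input_ids)

    # "Denoising": predict the original tokens at the masked positions.
    logits = model(noised).logits  # (B, L, V), assumed model interface
    loss = F.cross_entropy(logits[masked], input_ids[masked], reduction="none")

    # Standard masked-diffusion weighting of each sequence's loss by 1 / t.
    weights = (1.0 / t).expand_as(input_ids)[masked]
    return (weights * loss).mean()
```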
Further, we present novel adaptation techniques for finetuning autoregressive models into diffusion models while retaining the base model's core capabilities:
Block size annealing. The block diffusion framework enables interpolation between autoregressive and diffusion modeling: when blocks contain single tokens, we recover sequential decoding. We leverage this by gradually increasing the diffusion prediction window throughout finetuning, starting from smaller blocks and progressing to our target size of 8 tokens. This gradual progression prevents the aggressive parameter updates that would otherwise erase the base model's capabilities.
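As a rough illustration of how such a schedule might look, the sketch below grows the block size from 1 token (pure sequential decoding) to the 8-token target in stages; the doubling milestones and equal-length phases are assumptions, not the exact schedule used for A2D-VL.

```python
def block_size_at(step: int, total_steps: int, target: int = 8) -> int:
    """Anneal the diffusion block size from 1 token (pure sequential
    decoding) up to the target size over the course of finetuning."""
    sizes = [1, 2, 4, target]
    phase = min(len(sizes) * step // total_steps, len(sizes) - 1)
    return sizes[phase]


# Example: four equal phases over 100k finetuning steps.
assert block_size_at(0, 100_000) == 1        # start: autoregressive
assert block_size_at(60_000, 100_000) == 4   # mid-training
assert block_size_at(99_999, 100_000) == 8   # end: full 8-token blocks
```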
Noise level annealing. Within each token block, we apply position-dependent masking to gradually transition from easier to harder prediction tasks. Early in training, we mask the left-most tokens closest to the context more frequently (since they're easier to predict) and right-most tokens less frequently (since they're harder to predict). As training progresses, masking becomes uniform across positions, enabling any-order parallel generation within each block.
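The position-dependent masking rates can be sketched in a similar way; the linear interpolation toward uniform rates below is an illustrative assumption rather than the paper's exact formulation.

```python
import torch


def per_position_mask_prob(block_size: int, t: float, progress: float) -> torch.Tensor:
    """Per-position masking rates within one block.

    t        -- base noise level sampled for this block, in (0, 1]
    progress -- finetuning progress in [0, 1]

    Early in training (progress ~ 0), positions nearest the preceding
    context are masked more often and far positions less often; by the
    end (progress ~ 1) the rates are uniform, so any subset of the block
    can be predicted in parallel.
    """
    pos = torch.arange(block_size, dtype=torch.float32)
    skewed = t * (1.0 - pos / block_size)   # high at the left edge, low at the right
    uniform = torch.full((block_size,), t)
    return (1.0 - progress) * skewed + progress * uniform
```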
We ablate these strategies in Fig. 3, which shows that both are important for preserving the base model's benchmark performance. Concurrent work also adapts Qwen2.5 for block diffusion decoding, but only on NLP tasks. In contrast, we explore vision language models and propose novel adaptation techniques that are critical for retaining model capabilities.
Our design overcomes the limitations of prior diffusion VLMs:
Efficient training. By adapting existing VLMs, our approach requires significantly less training than training diffusion VLMs from scratch. While LLaDA-V 8B trains on ≥12M visual QA pairs, our A2D-VL 7B is trained on 400K pairs.
Modern architecture. By adapting Qwen2.5-VL, we adopt their modern architectural components, such as support for native visual resolutions and multimodal positional encodings.
Improved quality in long-form responses. We use diffusion decoding in blocks of 8 tokens, which enhances both response quality and the model's ability to generate arbitrary-length outputs. Further, our training data contains 100K high-quality reasoning traces from the larger Qwen2.5-VL 72B, whereas prior diffusion VLMs rely on standard instruction-tuning data, with some also distilling from a 7B math/science reasoning model. To enhance response flexibility, we also include 50K samples from MAmmoTH-VL in our data mixture, similar to prior work.
KV caching support. Under the block diffusion training and inference framework, A2D-VL sequentially generates a block of tokens at a time using block-causal attention (attending only to previous blocks and tokens within the current block) rather than fully bidirectional attention. As a result, A2D-VL supports exact KV caching of previously generated blocks instead of relying on approximate methods.
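A minimal sketch of the block-causal attention pattern helps show why exact KV caching is possible; the mask construction below is a generic illustration, not A2D-VL's actual code.

```python
import torch


def block_causal_mask(seq_len: int, block_size: int = 8) -> torch.Tensor:
    """Boolean attention mask (True = query may attend to key).

    Tokens attend to everything in earlier blocks plus every token in
    their own block (bidirectional within the block), but never to later
    blocks. Because finished blocks are never revisited, their keys and
    values can be cached exactly.
    """
    blocks = torch.arange(seq_len) // block_size
    return blocks.unsqueeze(1) >= blocks.unsqueeze(0)


# 4 tokens, 2-token blocks: tokens 0-1 see only block 0,
# tokens 2-3 see blocks 0 and 1.
print(block_causal_mask(4, block_size=2).int())
```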
We strike a new balance between speed and performance
By adapting pretrained autoregressive VLMs for diffusion, A2D-VL strikes an improved balance between inference speed and downstream performance. We compare the speed-quality trade-off between A2D-VL 7B, Qwen2.5-VL 7B, and the diffusion VLM LLaDA-V 8B. For LLaDA-V, we follow the recommended settings: approximate KV caching with recomputation every 32 steps and "factor"-based confidence thresholding.
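For intuition, the following hedged sketch shows one generic form of confidence-thresholded parallel decoding within a single block; the model interface, mask-token id and threshold rule are illustrative assumptions and not the exact settings of either model.

```python
import torch


@torch.no_grad()
def decode_block(model, context_ids, block_size=8, mask_id=151666, threshold=0.9):
    """Sketch: decode one block in parallel with confidence thresholding.

    Start from an all-[MASK] block appended to the context; each step
    commits every masked position whose top-1 probability clears the
    threshold (at least one per step), until no masks remain.
    """
    block = torch.full((1, block_size), mask_id, device=context_ids.device)
    for _ in range(block_size):
        logits = model(torch.cat([context_ids, block], dim=1)).logits[:, -block_size:]
        conf, pred = logits.softmax(-1).max(-1)
        masked = block.eq(mask_id)
        if not masked.any():
            break
        commit = masked & (conf >= threshold)
        if not commit.any():  # always commit the single most confident token
            flat = torch.where(masked, conf, torch.zeros_like(conf)).argmax()
            commit.view(-1)[flat] = True
        block = torch.where(commit, pred, block)
    return block
```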
Detailed image captioning
We generate detailed image captions (≤ 512 tokens) and, following prior work, score them against captions generated by GPT-4o, GPT-4V, and Gemini-1.5-Pro using BERTScore to measure semantic similarity. Captions generated by A2D-VL achieve greater consistency with the reference captions than those of the prior diffusion VLM LLaDA-V.
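As a rough sketch of this evaluation, the snippet below scores a hypothetical candidate caption against a reference caption with the open-source bert_score package; the example captions are made up, and the real evaluation uses full caption sets.

```python
# pip install bert-score
from bert_score import score

# Hypothetical candidate caption from the model and a reference caption
# (e.g. one produced by GPT-4o).
candidates = ["A red bicycle leans against a brick wall at sunset."]
references = ["A red bike rests against a brick wall in the evening light."]

# BERTScore compares contextual embeddings to measure semantic similarity.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```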
Chain-of-thought reasoning
A2D-VL consistently achieves better MMMU-Pro accuracy with chain-of-thought prompting compared to LLaDA-V. For Qwen2.5-VL and A2D-VL, we generate up to 16k tokens. For LLaDA-V, we limit the response to 512 tokens, as accuracy degrades at longer output lengths.
General visual understanding
A2D-VL outperforms prior diffusion VLMs on 3 out of 5 visual question-answering benchmarks with minimal performance degradation relative to the base Qwen model.
Conclusion
We introduce Autoregressive-to-Diffusion (A2D) vision language models for faster, parallel generation by adapting existing autoregressive VLMs to diffusion decoding. A2D-VL outperforms prior diffusion VLMs in visual question-answering while requiring significantly less training compute. Our novel adaptation techniques are critical for retaining model capabilities, finally enabling the conversion of state-of-the-art autoregressive VLMs to diffusion with minimal impact on quality.
Original source Report a problem - Sep 11, 2025
- Parsed from source: Sep 11, 2025
- Detected by Releasebot: Oct 30, 2025
All Plans
Light Mode now available in your settings. Update your theme from system settings to Light or Dark Mode for a personalized app experience.
Original source Report a problem - Aug 21, 2025
- Parsed from source: Aug 21, 2025
- Detected by Releasebot: Oct 30, 2025
Introducing Runway Game Worlds
Runway launches Game Worlds Beta, an AI-driven, real-time non-linear narrative platform with preset worlds and a Create Your Own option. Play in Comic or Chat mode as it evolves toward richer visuals and world models.
Introducing Runway Game Worlds
An early look at the next frontier of non-linear narrative experiences.
Over the last few months, we have been working on research and products that are moving us closer toward a future where you will be able to explore any character, story or world in real time.
While generating the pixels of these experiences is one aspect of this new frontier, another is the need for novel mechanics and interfaces. From how stories unfold to how your choices affect the worlds you’re simulating.
Today’s beta release marks a first step in this direction; learn more below.
The Runway Game Worlds Beta is an early exploration of this research, employing novel uses of AI for non-linear narrative experiences, all generated in real time with personalized stories, characters and multi-modal media generation.
At launch, you’ll be able to play from a series of preset Game Worlds or create a Game World of your own.
Preset Game Worlds
Each preset game is designed to highlight different genres and gameplay mechanics you can explore with the Game Worlds Beta.
- The Last Score is a heist game that requires you to pull off a job within seemingly impossible constraints.
- Athena Springs is a tense mystery that requires you to search the world for answers.
- And The Gallic Storm is an interactive history game that challenges you to learn real history via a gamified narrative.
Create Your Own Game World
You can also create your own custom games exploring any idea, story, character or world you can imagine.
From standard genres and gameplay mechanics to experimental narrative experiences, custom games offer you a unique creative sandbox. Custom games are private by default, but can be made public for anyone to play or share.
As you play through a Game World, images will be generated to complement your experience as it unfolds. You can either play in traditional Chat Mode or in the more visually driven Comic Mode.
Left: Game Worlds Comic Mode | Right: Game Worlds Chat Mode
Game Worlds represents a first step towards the next era of gaming. Where worlds, characters and storylines aren’t rendered, modeled or scripted, but are imagined, formed and generated as you play.
We believe that interactive worlds are the next frontier. For entertainment, for education, for any experience you can imagine.
As we continue to build upon and improve our General World Models in the weeks and months to come, Game Worlds will continue to evolve alongside them, beginning to integrate more visually rich experiences and real-time video generation.
Original source Report a problem - Aug 20, 2025
- Parsed from source: Aug 20, 2025
- Detected by Releasebot: Oct 30, 2025
Paid Plans
Change the voice of your performances directly from within the Act-Two interface, giving you more creative flexibility when generating character performances.
Original source Report a problem - Aug 19, 2025
- Parsed from source: Aug 19, 2025
- Detected by Releasebot: Oct 30, 2025
All Plans
Added support for select third-party models within Chat Mode. Gen-4 Image Turbo is now available to all users.
Original source Report a problem - Jul 31, 2025
- Parsed from source: Jul 31, 2025
- Detected by Releasebot: Oct 30, 2025
Paid Plans
Aleph is now available for all paid plans. A new way to edit, transform and generate video.
Original source Report a problem - Jul 25, 2025
- Parsed from source: Jul 25, 2025
- Detected by Releasebot: Oct 30, 2025
Introducing Runway Aleph
A new way to edit, transform and generate video.
Runway Aleph is a state-of-the-art in-context video model, setting a new frontier for multi-task visual generation, with the ability to perform a wide range of edits on an input video such as adding, removing, and transforming objects, generating any angle of a scene, and modifying style and lighting, among many other tasks.
How to Use Runway Aleph
Runway Aleph represents a novel step forward for multi-task video generation and manipulation.
Aleph is now available for all paid users.
Original source Report a problem - Jun 25, 2025
- Parsed from source: Jun 25, 2025
- Detected by Releasebot: Oct 30, 2025
All Plans
An updated version of Gen-4 References that significantly improves object consistency and prompt adherence. Available for all users.
Original source Report a problem