diffusers Release Notes
Last updated: Mar 20, 2026
- Mar 5, 2026
Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥
diffusers 0.37.0 introduces Modular Diffusers for building pipelines from reusable blocks, and expands image, video, and audio generation with new models such as Z-Image, Flux2 Klein, Qwen Image Layered, LTX-2, and Helios. It also adds new caching methods, context parallelism backends, and broad bug fixes.
Modular Diffusers
Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing DiffusionPipeline class, providing a more flexible way to create custom diffusion pipelines.
Find more details on how to get started with Modular Diffusers here, and also check out the announcement post.
New Pipelines and Models
Image 🌆
- Z Image Omni Base: Z-Image is the foundation model of the Z-Image family, engineered for high quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
- Flux2 Klein: FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13GB of VRAM.
- Qwen Image Layered: Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
- FIBO Edit: Fibo Edit is an 8B-parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized "Edit" models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in #12930.
- Cosmos Predict2.5: Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
- Cosmos Transfer2.5: Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can span different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
- GLM-Image: GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in #12973.
- RAE: Representation Autoencoders (RAEs) are an exciting alternative to the traditional VAEs typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction.
Video + audio 🎥 🎼
- LTX-2: LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference is supported, as well as two-stage inference with spatial sampling. We also support a conditioning pipeline that allows passing different conditions, such as single images or series of images. Check out the docs to learn more!
- Helios: Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in #13208.
Improvements to Core Library
New caching methods
- MagCache — thanks to @AlanPonnachan!
- TaylorSeer — thanks to @toilaluan!
New context-parallelism (CP) backends
- Unified Sequence Parallel attention — thanks to @Bissmella!
- Ulysses Anything Attention — thanks to @DefTruth!
Misc
- Mambo-G Guidance: New guider implementation (#12862)
- Laplace Scheduler for DDPM (#11320)
- Custom Sigmas in UniPCMultistepScheduler (#12109)
- MultiControlNet support for SD3 Inpainting (#11251)
- Context parallel in native flash attention (#12829)
- NPU Ulysses Attention Support (#12919)
- Fix Wan 2.1 I2V Context Parallel Inference (#12909)
- Fix Qwen-Image Context Parallel Inference (#12970)
- Introduction of the apply_lora_scale decorator for simplifying model definitions (#12994)
- Introduction of pipeline-level “cpu” device_map (#12811)
- Enable CP for kernels-based attention backends (#12812)
- Diffusers is fully functional with Transformers V5 (#12976)
- Many of the above features and improvements came out of the MVP program we have been running. Immense thanks to the contributors!
Bug Fixes
- Fix QwenImageEditPlus on NPU (#13017)
- Fix MT5Tokenizer → use T5Tokenizer for Transformers v5.0+ compatibility (#12877)
- Fix Wan/WanI2V patchification (#13038)
- Fix LTX-2 inference with num_videos_per_prompt > 1 and CFG (#13121)
- Fix Flux2 img2img prediction (#12855)
- Fix QwenImage txt_seq_lens handling (#12702)
- Fix prefix_token_len bug (#12845)
- Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
- Fix is_fsdp determination (#12960)
- Fix GLM-Image get_image_features API (#13052)
- Fix Wan 2.2 when either transformer isn't present (#13055)
- Fix guider issue (#13147)
- Fix torchao quantizer for new versions (#12901)
- Fix GGUF for unquantized types with unquantize kernels (#12498)
- Make Qwen hidden states contiguous for torchao (#13081)
- Make Flux hidden states contiguous (#13068)
- Fix Kandinsky 5 hardcoded CUDA autocast (#12814)
- Fix aiter availability check (#13059)
- Fix attention mask check for unsupported backends (#12892)
- Allow prompt and prior_token_ids simultaneously in GlmImagePipeline (#13092)
- GLM-Image batch support (#13007)
- Cosmos 2.5 Video2World frame extraction fix (#13018)
- ResNet: only use contiguous in training mode (#12977)
All commits
- [PRX] Improve model compilation by @WaterKnight1998 in #12787
- Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py by @delmalih in #12798
- [Modular]z-image by @yiyixuxu in #12808
- Fix Qwen Edit Plus modular for multi-image input by @sayakpaul in #12601
- [WIP] Add Flux2 modular by @DN6 in #12763
- [docs] improve distributed inference cp docs. by @sayakpaul in #12810
- post release 0.36.0 by @sayakpaul in #12804
- Update distributed_inference.md to correct syntax by @sayakpaul in #12827
- [lora] Remove lora docs unneeded and add " # Copied from ..." by @sayakpaul in #12824
- support CP in native flash attention by @sywangyi in #12829
- [qwen-image] edit 2511 support by @naykun in #12839
- fix pytest tests/pipelines/pixart_sigma/test_pixart.py::PixArtSigmaPi… by @sywangyi in #12842
- Support for control-lora by @lavinal712 in #10686
- Add support for LongCat-Image by @junqiangwu in #12828
- fix the prefix_token_len bug by @junqiangwu in #12845
- extend TorchAoTest::test_model_memory_usage to other platform by @sywangyi in #12768
- Qwen Image Layered Support by @naykun in #12853
- Z-Image-Turbo ControlNet by @hlky in #12792
- Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion by @miguelmartin75 in #12852
- more update in modular by @yiyixuxu in #12560
- Feature: Add Mambo-G Guidance as Guider by @MatrixTeam-AI in #12862
- Add OvisImagePipeline in AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @alvarobartt in #12876
- Cosmos Predict2.5 14b Conversion by @miguelmartin75 in #12863
- Use T5Tokenizer instead of MT5Tokenizer (removed in Transformers v5.0+) by @alvarobartt in #12877
- Add z-image-omni-base implementation by @RuoyiDu in #12857
- fix torchao quantizer for new torchao versions by @vkuzo in #12901
- fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs by @mbalabanski in #12894
- Z-Image-Turbo from_single_file fix by @hlky in #12888
- chore: fix dev version in setup.py by @DefTruth in #12904
- Community Pipeline: Add z-image differential img2img by @r4inm4ker in #12882
- Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py by @miguelmartin75 in #12914
- Fix wan 2.1 i2v context parallel by @DefTruth in #12909
- fix the use of device_map in CP docs by @sayakpaul in #12902
- [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin by @sayakpaul in #12873
- Detect 2.0 vs 2.1 ZImageControlNetModel by @hlky in #12861
- Refactor environment variable assignments in workflow by @paulinebm in #12916
- Add codeQL workflow by @paulinebm in #12917
- Delete .github/workflows/codeql.yml by @paulinebm (direct commit on v0.37.0-release)
- CodeQL workflow for security analysis by @paulinebm (direct commit on v0.37.0-release)
- Check for attention mask in backends that don't support it by @dxqb in #12892
- [Flux.1] improve pos embed for ascend npu by computing on npu by @zhangtao0408 in #12897
- LTX Video 0.9.8 long multi prompt by @yaoqih in #12614
- Add FSDP option for Flux2 by @leisuzz in #12860
- Add transformer cache context for SkyReels-V2 pipelines & Update docs by @tolgacangoz in #12837
- [docs] fix torchao typo. by @sayakpaul in #12883
- Update wan.md to remove unneeded hfoptions by @sayakpaul in #12890
- Improve docstrings and type hints in scheduling_edm_euler.py by @delmalih in #12871
- [Modular] Video for Mellon by @asomoza in #12924
- Add LTX 2.0 Video Pipelines by @dg845 in #12915
- Add environment variables to checkout step by @paulinebm in #12927
- Improve docstrings and type hints in scheduling_consistency_decoder.py by @delmalih in #12928
- Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning by @adi776borate in #12814
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #12865
- fix the warning torch_dtype is deprecated by @msdsm in #12841
- [NPU] npu attention enable ulysses by @TmacAaron in #12919
- Torchao floatx version guard by @howardzhang-cv in #12923
- Bugfix for dreambooth flux2 img2img2 by @leisuzz in #12825
- [Modular] qwen refactor by @yiyixuxu in #12872
- [modular] Tests for custom blocks in modular diffusers by @sayakpaul in #12557
- [chore] remove controlnet implementations outside controlnet module. by @sayakpaul in #12152
- [core] Handle progress bar and logging in distributed environments by @sayakpaul in #12806
- Improve docstrings and type hints in scheduling_consistency_models.py by @delmalih in #12931
- [Feature] MultiControlNet support for SD3Impainting by @ishan-modi in #11251
- Laplace Scheduler for DDPM by @gapatron in #11320
- Store vae.config.scaling_factor to prevent missing attr reference (sdxl advanced dreambooth training script) by @Teriks in #12346
- Add thread-safe wrappers for components in pipeline (examples/server-async/utils/requestscopedpipeline.py) by @FredyRivera-dev in #12515
- [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL by @kashif in #11573
- Change timestep device to cpu for xla by @bhavya01 in #11501
- [LoRA] add lora_alpha to sana README by @linoytsaban in #11780
- Fix wrong param types, docs, and handles noise=None in scale_noise of FlowMatching schedulers by @Promisery in #11669
- [docs] Remote inference by @stevhliu in #12372
- Align HunyuanVideoConditionEmbedding with CombinedTimestepGuidanceTextProjEmbeddings by @samutamm in #12316
- [Fix] syntax in QwenImageEditPlusPipeline by @SahilCarterr in #12371
- Fix ftfy name error in Wan pipeline by @dsocek in #12314
- [modular] error early in enable_auto_cpu_offload by @sayakpaul in #12578
- [ChronoEdit] support multiple loras by @zhangjiewu in #12679
- fix how is_fsdp is determined by @sayakpaul in #12960
- [LoRA] add LoRA support to LTX-2 by @sayakpaul in #12933
- Fix: typo in autoencoder_dc.py by @tvelovraf in #12687
- [Modular] better docstring by @yiyixuxu in #12932
- [docs] polish caching docs. by @sayakpaul in #12684
- Fix typos by @omahs in #12705
- Fix link to diffedit implementation reference by @JuanFKurucz in #12708
- Fix QwenImage txt_seq_lens handling by @kashif in #12702
- Bugfix for flux2 img2img2 prediction by @leisuzz in #12855
- Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests by @dg845 in #12962
- Add Unified Sequence Parallel attention by @Bissmella in #12693
- [Modular] Changes for using WAN I2V by @asomoza in #12959
- Z rz rz rz rz rz rz r cogview by @sayakpaul in #12973
- Update distributed_inference.md to reposition sections by @sayakpaul in #12971
- [chore] make transformers version check stricter for glm image. by @sayakpaul in #12974
- Remove 8bit device restriction by @SunMarc in #12972
- disable_mmap in pipeline from_pretrained by @hlky in #12854
- [Modular] mellon utils by @yiyixuxu in #12978
- LongCat Image pipeline: Allow offloading/quantization of text_encoder component by @Yahweasel in #12963
- Add ChromaInpaintPipeline by @hameerabbasi in #12848
- fix Qwen-Image series context parallel by @DefTruth in #12970
- Flux2 klein by @yiyixuxu in #12982
- [modular] fix a bug in mellon param & improve docstrings by @yiyixuxu in #12980
- add klein docs. by @sayakpaul in #12984
- LTX 2 Single File Support by @dg845 in #12983
- [core] gracefully error out when attn-backend x cp combo isn't supported. by @sayakpaul in #12832
- Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py by @delmalih in #12936
- [Docs] Replace root CONTRIBUTING.md with symlink to source docs by @delmalih in #12986
- make style && make quality by @sayakpaul (direct commit on v0.37.0-release)
- Revert "make style && make quality" by @sayakpaul (direct commit on v0.37.0-release)
- [chore] make style to push new changes. by @sayakpaul in #12998
- Fibo edit pipeline by @galbria in #12930
- Fix variable name in docstring for PeftAdapterMixin.set_adapters by @geekuillaume in #13003
- Improve docstrings and type hints in scheduling_ddim_cogvideox.py by @delmalih in #12992
- [scheduler] Support custom sigmas in UniPCMultistepScheduler by @a-r-r-o-w in #12109
- feat: accelerate longcat-image with regional compile by @lgyStoic in #13019
- Improve docstrings and type hints in scheduling_ddim_flax.py by @delmalih in #13010
- Improve docstrings and type hints in scheduling_ddim_inverse.py by @delmalih in #13020
- fix Dockerfiles for cuda and xformers. by @sayakpaul in #13022
- Resnet only use contiguous in training mode. by @jiqing-feng in #12977
- feat: add qkv projection fuse for longcat transformers by @lgyStoic in #13021
- Improve docstrings and type hints in scheduling_ddim_parallel.py by @delmalih in #13023
- Improve docstrings and type hints in scheduling_ddpm_flax.py by @delmalih in #13024
- Improve docstrings and type hints in scheduling_ddpm_parallel.py by @delmalih in #13027
- Remove pooled_ mentions from Chroma inpaint by @hameerabbasi in #13026
- Flag Flax schedulers as deprecated by @delmalih in #13031
- [modular] add auto_docstring & more doc related refactors by @yiyixuxu in #12958
- Upgrade GitHub Actions to latest versions by @salmanmkc in #12866
- [From Single File] support from_single_file method for WanAnimateTransformer3DModel by @samadwar in #12691
- Fix: Cosmos2.5 Video2World frame extraction and add default negative prompt by @adi776borate in #13018
- [GLM-Image] Add batch support for GlmImagePipeline by @JaredforReal in #13007
- [Qwen] avoid creating attention masks when there is no padding by @kashif in #12987
- [modular]support klein by @yiyixuxu in #13002
- [QwenImage] fix prompt isolation tests by @sayakpaul in #13042
- fast tok update by @itazap in #13036
- change to CUDA 12.9. by @sayakpaul in #13045
- remove torchao autoquant from diffusers docs by @vkuzo in #13048
- docs: improve docstring scheduling_dpm_cogvideox.py by @delmalih in #13044
- Fix Wan/WanI2V patchification by @Jayce-Ping in #13038
- LTX2 distilled checkpoint support by @rootonchair in #12934
- [wan] fix layerwise upcasting tests on CPU by @sayakpaul in #13039
- [ci] uniform run times and wheels for pytorch cuda. by @sayakpaul in #13047
- docs: fix grammar in fp16_safetensors CLI warning by @Olexandr88 in #13040
- [wan] fix wan 2.2 when either of the transformers isn't present. by @sayakpaul in #13055
- [bug fix] GLM-Image fit new get_image_features API by @JaredforReal in #13052
- Fix aiter availability check by @lauri9 in #13059
- [Modular]add a real quick start guide by @yiyixuxu in #13029
- feat: support Ulysses Anything Attention by @DefTruth in #12996
- Refactor Model Tests by @DN6 in #12822
- [Flux2] Fix LoRA loading for Flux2 Klein by adaptively enumerating transformer blocks by @songkey in #13030
- [Modular] loader related by @yiyixuxu in #13025
- [Modular] mellon doc etc by @yiyixuxu in #13051
- [modular] change the template modular pipeline card by @sayakpaul in #13072
- Add support for Magcache by @AlanPonnachan in #12744
- [docs] Fix syntax error in quantization configuration by @sayakpaul in #13076
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13083
- [core] make flux hidden states contiguous by @sayakpaul in #13068
- [core] make qwen hidden states contiguous to make torchao happy. by @sayakpaul in #13081
- Feature/zimage inpaint pipeline by @CalamitousFelicitousness in #13006
- GGUF fix for unquantized types when using unquantize kernels by @dxqb in #12498
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13085
- [modular]simplify components manager doc by @yiyixuxu in #13088
- ZImageControlNet cfg by @hlky in #13080
- [Modular] refactor Wan: modular pipelines by task etc by @yiyixuxu in #13063
- [Modular] guard ModularPipeline.blocks attribute by @yiyixuxu in #13014
- LTX 2 Improve encode_video by Accepting More Input Types by @dg845 in #13057
- Z image lora training by @linoytsaban in #13056
- [modular] add modular tests for Z-Image and Wan by @sayakpaul in #13078
- [Docs] Add guide for AutoModel with custom code by @DN6 in #13099
- [SkyReelsV2] Fix ftfy import by @asomoza in #13113
- [lora] fix non-diffusers lora key handling for flux2 by @sayakpaul in #13119
- [CI] Refactor Wan Model Tests by @DN6 in #13082
- docs: improve docstring scheduling_edm_dpmsolver_multistep.py by @delmalih in #13122
- [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline by @JaredforReal in #13092
- docs: improve docstring scheduling_flow_match_euler_discrete.py by @delmalih in #13127
- Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} by @miguelmartin75 in #13066
- [modular] add tests for robust model loading. by @sayakpaul in #13120
- Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled by @dg845 in #13121
- [CI] Fix setuptools pkg_resources Errors by @dg845 in #13129
- docs: improve docstring scheduling_flow_match_heun_discrete.py by @delmalih in #13130
- [CI] Fix setuptools pkg_resources Bug for PR GPU Tests by @dg845 in #13132
- fix cosmos transformer typing. by @sayakpaul in #13134
- Sunset Python 3.8 & get rid of explicit typing exports where possible by @sayakpaul in #12524
- feat: implement apply_lora_scale to remove boilerplate. by @sayakpaul in #12994
- [docs] fix ltx2 i2v docstring. by @sayakpaul in #13135
- [Modular] add different pipeine blocks to init by @yiyixuxu in #13145
- fix MT5Tokenizer by @yiyixuxu in #13146
- fix guider by @yiyixuxu in #13147
- [Modular] update doc for ModularPipeline by @yiyixuxu in #13100
- [Modular] add explicit workflow support by @yiyixuxu in #13028
- [LTX2] Fix wrong lora mixin by @asomoza in #13144
- [Pipelines] Remove k-diffusion by @DN6 in #13152
- [tests] accept recompile_limit from the user in tests by @sayakpaul in #13150
- [core] support device type device_maps to work with offloading. by @sayakpaul in #12811
- [Bug] Fix QwenImageEditPlus Series on NPU by @zhangtao0408 in #13017
- [CI] Add ftfy as a test dependency by @DN6 in #13155
- docs: improve docstring scheduling_flow_match_lcm.py by @delmalih in #13160
- [docs] add docs for qwenimagelayered by @stevhliu in #13158
- Flux2: Tensor tuples can cause issues for checkpointing by @dxqb in #12777
- [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated by @dg845 in #13149
- Fix ftfy import for PRX Pipeline by @dg845 in #13154
- [core] Enable CP for kernels-based attention backends by @sayakpaul in #12812
- remove deps related to test from ci by @sayakpaul in #13164
- [CI] Fix new LoRAHotswap tests by @DN6 in #13163
- [gguf][torch.compile time] Convert to plain tensor earlier in dequantize_gguf_tensor by @anijain2305 in #13166
- Support Flux Klein peft (fal) lora format by @asomoza in #13169
- Fix T5GemmaEncoder loading for transformers 5.x composite T5GemmaConfig by @DavidBert in #13143
- Allow Automodel to use from_config with custom code. by @DN6 in #13123
- Fix AutoModel typing Import Error by @dg845 in #13178
- migrate to transformers v5 by @sayakpaul in #12976
- fix: graceful fallback when attention backends fail to import by @sym-bot in #13060
- [docs] Fix torchrun command argument order in docs by @sayakpaul in #13181
- [attention backends] use dedicated wrappers from fa3 for cp. by @sayakpaul in #13165
- Cosmos Transfer2.5 Auto-Regressive Inference Pipeline by @miguelmartin75 in #13114
- Fix wrong do_classifier_free_guidance threshold in ZImagePipeline by @kirillsst in #13183
- Fix Flash Attention 3 interface for new FA3 return format by @veeceey in #13173
- Fix LTX-2 image-to-video generation failure in two stages generation by @Songrui625 in #13187
- Fixing Kohya loras loading: Flux.1-dev loras with TE ("lora_te1_" prefix) by @christopher5106 in #13188
- [Modular] update the auto pipeline blocks doc by @yiyixuxu in #13148
- [tests] consistency tests for modular index by @sayakpaul in #13192
- [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline by @yiyixuxu in #13193
- [chore] updates in the pypi publication workflow. by @sayakpaul in #12805
- [tests] enable cpu offload test in torchao without compilation. by @sayakpaul in #12704
- remove db utils from benchmarking by @sayakpaul in #13199
- [AutoModel] Fix bug with subfolders and local model paths when loading custom code by @DN6 in #13197
- [AutoModel] Allow registering auto_map to model config by @DN6 in #13186
- [Modular] Save Modular Pipeline weights to Hub by @DN6 in #13168
- docs: improve docstring scheduling_ipndm.py by @delmalih in #13198
- Clean up accidental files by @DN6 in #13202
- [modular]Update model card to include workflow by @yiyixuxu in #13195
- [modular] not pass trust_remote_code to external repos by @yiyixuxu in #13204
- [Modular] implement requirements validation for custom blocks by @sayakpaul in #12196
- cogvideo example: Distribute VAE video encoding across processes in CogVideoX LoRA training by @jiqing-feng in #13207
- Fix group-offloading bug by @SHYuanBest in #13211
- Add Helios-14B Video Generation Pipelines by @dg845 in #13208
- [Z-Image] Fix more do_classifier_free_guidance thresholds by @asomoza in #13212
- [lora] fix zimage lora conversion to support for more lora. by @sayakpaul in #13209
- adding lora support to z-image controlnet pipelines by @christopher5106 in #13200
- Add LTX2 Condition Pipeline by @dg845 in #13058
- Fix Helios paper link in documentation by @SHYuanBest in #13213
- [attention backends] change to updated repo and version. by @sayakpaul in #13161
- feat: implement rae autoencoder. by @Ando233 in #13046
- Release: v0.37.0-release by @sayakpaul (direct commit on v0.37.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@delmalih
- Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
- Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
- Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
- Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
- Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
- [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
- Improve docstrings and type hints in scheduling_ddim_cogvideox.py (#12992)
- Improve docstrings and type hints in scheduling_ddim_flax.py (#13010)
- Improve docstrings and type hints in scheduling_ddim_inverse.py (#13020)
- Improve docstrings and type hints in scheduling_ddim_parallel.py (#13023)
- Improve docstrings and type hints in scheduling_ddpm_flax.py (#13024)
- Improve docstrings and type hints in scheduling_ddpm_parallel.py (#13027)
- Flag Flax schedulers as deprecated (#13031)
- docs: improve docstring scheduling_dpm_cogvideox.py (#13044)
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13083)
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13085)
- docs: improve docstring scheduling_edm_dpmsolver_multistep.py (#13122)
- docs: improve docstring scheduling_flow_match_euler_discrete.py (#13127)
- docs: improve docstring scheduling_flow_match_heun_discrete.py (#13130)
- docs: improve docstring scheduling_flow_match_lcm.py (#13160)
- docs: improve docstring scheduling_ipndm.py (#13198)
@yiyixuxu
- [Modular]z-image (#12808)
- more update in modular (#12560)
- [Modular] qwen refactor (#12872)
- [Modular] better docstring (#12932)
- [Modular] mellon utils (#12978)
- Flux2 klein (#12982)
- [modular] fix a bug in mellon param & improve docstrings (#12980)
- [modular] add auto_docstring & more doc related refactors (#12958)
- [modular]support klein (#13002)
- [Modular]add a real quick start guide (#13029)
- [Modular] loader related (#13025)
- [Modular] mellon doc etc (#13051)
- [modular]simplify components manager doc (#13088)
- [Modular] refactor Wan: modular pipelines by task etc (#13063)
- [Modular] guard ModularPipeline.blocks attribute (#13014)
- [Modular] add different pipeine blocks to init (#13145)
- fix MT5Tokenizer (#13146)
- fix guider (#13147)
- [Modular] update doc for ModularPipeline (#13100)
- [Modular] add explicit workflow support (#13028)
- [Modular] update the auto pipeline blocks doc (#13148)
- [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline (#13193)
- [modular]Update model card to include workflow (#13195)
- [modular] not pass trust_remote_code to external repos (#13204)
@sayakpaul
- Fix Qwen Edit Plus modular for multi-image input (#12601)
- [docs] improve distributed inference cp docs. (#12810)
- post release 0.36.0 (#12804)
- Update distributed_inference.md to correct syntax (#12827)
- [lora] Remove lora docs unneeded and add " # Copied from ..." (#12824)
- fix the use of device_map in CP docs (#12902)
- [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin (#12873)
- [docs] fix torchao typo. (#12883)
- Update wan.md to remove unneeded hfoptions (#12890)
- [modular] Tests for custom blocks in modular diffusers (#12557)
- [chore] remove controlnet implementations outside controlnet module. (#12152)
- [core] Handle progress bar and logging in distributed environments (#12806)
- [modular] error early in enable_auto_cpu_offload (#12578)
- fix how is_fsdp is determined (#12960)
- [LoRA] add LoRA support to LTX-2 (#12933)
- [docs] polish caching docs. (#12684)
- Z rz rz rz rz rz rz r cogview (#12973)
- Update distributed_inference.md to reposition sections (#12971)
- [chore] make transformers version check stricter for glm image. (#12974)
- add klein docs. (#12984)
- [core] gracefully error out when attn-backend x cp combo isn't supported. (#12832)
- make style && make quality
- Revert "make style && make quality"
- [chore] make style to push new changes. (#12998)
- fix Dockerfiles for cuda and xformers. (#13022)
- [QwenImage] fix prompt isolation tests (#13042)
- change to CUDA 12.9. (#13045)
- [wan] fix layerwise upcasting tests on CPU (#13039)
- [ci] uniform run times and wheels for pytorch cuda. (#13047)
- [wan] fix wan 2.2 when either of the transformers isn't present. (#13055)
- [modular] change the template modular pipeline card (#13072)
- [docs] Fix syntax error in quantization configuration (#13076)
- [core] make flux hidden states contiguous (#13068)
- [core] make qwen hidden states contiguous to make torchao happy. (#13081)
- [modular] add modular tests for Z-Image and Wan (#13078)
- [lora] fix non-diffusers lora key handling for flux2 (#13119)
- [modular] add tests for robust model loading. (#13120)
- fix cosmos transformer typing. (#13134)
- Sunset Python 3.8 & get rid of explicit typing exports where possible (#12524)
- feat: implement apply_lora_scale to remove boilerplate. (#12994)
- [docs] fix ltx2 i2v docstring. (#13135)
- [tests] accept recompile_limit from the user in tests (#13150)
- [core] support device type device_maps to work with offloading. (#12811)
- [core] Enable CP for kernels-based attention backends (#12812)
- remove deps related to test from ci (#13164)
- migrate to transformers v5 (#12976)
- [docs] Fix torchrun command argument order in docs (#13181)
- [attention backends] use dedicated wrappers from fa3 for cp. (#13165)
- [tests] consistency tests for modular index (#13192)
- [chore] updates in the pypi publication workflow. (#12805)
- [tests] enable cpu offload test in torchao without compilation. (#12704)
- remove db utils from benchmarking (#13199)
- [Modular] implement requirements validation for custom blocks (#12196)
- [lora] fix zimage lora conversion to support for more lora. (#13209)
- [attention backends] change to updated repo and version. (#13161)
Release: v0.37.0-release
@DN6
- [WIP] Add Flux2 modular (#12763)
- Refactor Model Tests (#12822)
- [Docs] Add guide for AutoModel with custom code (#13099)
- [CI] Refactor Wan Model Tests (#13082)
- [Pipelines] Remove k-diffusion (#13152)
- [CI] Add ftfy as a test dependency (#13155)
- [CI] Fix new LoRAHotswap tests (#13163)
- Allow Automodel to use from_config with custom code. (#13123)
- [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)
- [AutoModel] Allow registering auto_map to model config (#13186)
- [Modular] Save Modular Pipeline weights to Hub (#13168)
- Clean up accidental files (#13202)
@naykun
- [qwen-image] edit 2511 support (#12839)
- Qwen Image Layered Support (#12853)
@junqiangwu
- Add support for LongCat-Image (#12828)
- fix the prefix_token_len bug (#12845)
@hlky
- Z-Image-Turbo ControlNet (#12792)
- Z-Image-Turbo from_single_file fix (#12888)
- Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)
- disable_mmap in pipeline from_pretrained (#12854)
- ZImageControlNet cfg (#13080)
@miguelmartin75
- Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)
- Cosmos Predict2.5 14b Conversion (#12863)
- Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914)
- Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)
- Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)
@RuoyiDu
- Add z-image-omni-base implementation (#12857)
@r4inm4ker
- Community Pipeline: Add z-image differential img2img (#12882)
@yaoqih
- LTX Video 0.9.8 long multi prompt (#12614)
@dg845
- Add LTX 2.0 Video Pipelines (#12915)
- Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests (#12962)
- LTX 2 Single File Support (#12983)
- LTX 2 Improve encode_video by Accepting More Input Types (#13057)
- Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled (#13121)
- [CI] Fix setuptools pkg_resources Errors (#13129)
- [CI] Fix setuptools pkg_resources Bug for PR GPU Tests (#13132)
- [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated (#13149)
- Fix ftfy import for PRX Pipeline (#13154)
- Fix AutoModel typing Import Error (#13178)
- Add Helios-14B Video Generation Pipelines (#13208)
- Add LTX2 Condition Pipeline (#13058)
@kashif
- [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)
- Fix QwenImage txt_seq_lens handling (#12702)
- [Qwen] avoid creating attention masks when there is no padding (#12987)
@bhavya01
- Change timestep device to cpu for xla (#11501)
@linoytsaban
- [LoRA] add lora_alpha to sana README (#11780)
- Z image lora training (#13056)
@stevhliu
- [docs] Remote inference (#12372)
- [docs] add docs for qwenimagelayered (#13158)
@hameerabbasi
- Add ChromaInpaintPipeline (#12848)
- Remove pooled_ mentions from Chroma inpaint (#13026)
@galbria
- Fibo edit pipeline (#12930)
@JaredforReal
- [GLM-Image] Add batch support for GlmImagePipeline (#13007)
- [bug fix] GLM-Image fit new get_image_features API (#13052)
- [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline (#13092)
@rootonchair
- LTX2 distilled checkpoint support (#12934)
@AlanPonnachan
- Add support for Magcache (#12744)
@CalamitousFelicitousness
- Feature/zimage inpaint pipeline (#13006)
@Ando233
- feat: implement rae autoencoder. (#13046)
- Dec 9, 2025
- Date parsed from source:Dec 9, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄
diffusers releases a packed update with new image and video pipelines, TaylorSeer cache support, kernels-powered attention backends, and a new Flux.2 LoRA training script. It also expands model coverage with Z-Image, Kandinsky 5, HunyuanVideo 1.5, Wan Animate, ChronoEdit, and more.
The release features a number of new image and video pipelines, a new caching method, a new training script, new kernels-powered attention backends, and more. It is quite packed with a lot of new stuff, so make sure you read the release notes fully 🚀
New image pipelines
Flux2: Flux2 is the latest generation of image generation and editing model from Black Forest Labs. It’s capable of taking multiple input images as reference, making it versatile for different use cases.
Z-Image: Z-Image is a best-of-its-kind image generation model in the 6B param regime. Thanks to @JerryWu-code in #12703.
QwenImage Edit Plus: It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in #12357.
Bria FIBO: FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in #12545.
Kandinsky Image Lite: Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in #12664.
ChronoEdit: ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in #12593.
New video pipelines
Sana-Video: Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in #12634.
Kandinsky 5: Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in #12478.
Hunyuan 1.5: HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
Wan Animate: Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with the motion from the driving video, or replace the character in that video with the reference character.
New kernels-powered attention backends
The kernels library helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new kernels-powered attention backends:
- Flash Attention 3 (+ its varlen variant)
- Flash Attention 2 (+ its varlen variant)
- SAGE
This means that if any of the above backends is supported in your development environment, you can skip the manual process of building the corresponding kernels and just use:
```python
# Make sure you have `kernels` installed: `pip install kernels`.
# You can choose `flash_hub` or `sage_hub`, too.
pipe.transformer.set_attention_backend("_flash_3_hub")
```

For more details, check out the documentation.
TaylorSeer cache
TaylorSeer is now supported in Diffusers, delivering up to 3x speedups with little to no quality compromise. Thanks to @toilaluan for contributing this in #12648. Check out the documentation here.
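To give an intuition for how TaylorSeer-style caching achieves these speedups, here is a minimal pure-Python sketch of the underlying idea — this is illustrative only and is not the Diffusers API; the names `taylorseer_step` and `refresh_every` are made up for this example. Instead of recomputing an expensive module at every denoising step, recent outputs are cached and skipped steps are predicted with a first-order Taylor expansion (a finite-difference derivative):

```python
def taylorseer_step(expensive_fn, t, cache, refresh_every=3):
    """Return expensive_fn(t), recomputing only every `refresh_every` steps.

    `cache` holds up to two (t, output) pairs; in-between steps are
    extrapolated from them instead of recomputed.
    """
    if t % refresh_every == 0 or len(cache) < 2:
        y = expensive_fn(t)  # the expensive call (e.g. a transformer block)
        cache.append((t, y))
        if len(cache) > 2:
            cache.pop(0)  # keep only the two most recent samples
        return y
    # First-order Taylor extrapolation from the two cached samples.
    (t0, y0), (t1, y1) = cache
    dy_dt = (y1 - y0) / (t1 - t0)
    return y1 + dy_dt * (t - t1)

cache = []
# Stand-in for an expensive model call: f(t) = t^2.
outputs = [taylorseer_step(lambda t: t * t, t, cache) for t in range(7)]
```

Steps 0, 3, and 6 are computed exactly, while the steps in between are cheap extrapolations — the real implementation applies the same trick to transformer activations, with higher-order terms and tuned refresh schedules.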
New training script
Our Flux.2 integration features a LoRA fine-tuning script that you can check out here. We provide a number of optimizations to help make it run on consumer GPUs.
Misc
Reusing AttentionMixin: Making certain compatible models subclass from the AttentionMixin class helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out #12463 for more details.
Diffusers backend in SGLang: sgl-project/sglang#14112.
We started the Diffusers MVP program to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.
All commits
- remove unneeded checkpoint imports. by @sayakpaul in #12488
- [tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
- ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
- [docs] Attention checks by @stevhliu in #12486
- [CI] Check links by @stevhliu in #12491
- [ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
- [tests] introduce VAETesterMixin to consolidate tests for slicing and tiling by @sayakpaul in #12374
- docs: cleanup of runway model by @EazyAl in #12503
- Kandinsky 5 is finally in Diffusers! by @leffff in #12478
- Remove Qwen Image Redundant RoPE Cache by @dg845 in #12452
- Raise warning instead of error when imports are missing for custom code by @DN6 in #12513
- Fix: Use incorrect temporary variable key when replacing adapter name… by @FeiXie8 in #12502
- [docs] Organize toctree by modality by @stevhliu in #12514
- styling issues. by @sayakpaul in #12522
- Add Photon model and pipeline support by @DavidBert in #12456
- purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet by @Vaibhavs10 in #12497
- Prx by @DavidBert in #12525
- [core] AutoencoderMixin to abstract common methods by @sayakpaul in #12473
- Kandinsky5 No cfg fix by @asomoza in #12527
- Fix: Add _skip_keys for AutoencoderKLWan by @yiyixuxu in #12523
- [CI] xfail the test_wuerstchen_prior test by @sayakpaul in #12530
- [tests] Test attention backends by @sayakpaul in #12388
- fix CI bug for kandinsky3_img2img case by @kaixuanliu in #12474
- Fix MPS compatibility in get_1d_sincos_pos_embed_from_grid #12432 by @Aishwarya0811 in #12449
- Handle deprecated transformer classes by @DN6 in #12517
- fix constants.py to user upper() by @sayakpaul in #12479
- HunyuanImage21 by @yiyixuxu in #12333
- Loose the criteria tolerance appropriately for Intel XPU devices by @kaixuanliu in #12460
- Deprecate Stable Cascade by @DN6 in #12537
- [chore] Move guiders experimental warning by @sayakpaul in #12543
- Fix Chroma attention padding order and update docs to use lodestones/Chroma1-HD by @josephrocca in #12508
- Add AITER attention backend by @lauri9 in #12549
- Fix small inconsistency in output dimension of "_get_t5_prompt_embeds" function in sd3 pipeline by @alirezafarashah in #12531
- Kandinsky 5 10 sec (NABLA suport) by @leffff in #12520
- Improve pos embed for Flux.1 inference on Ascend NPU by @gameofdimension in #12534
- support latest few-step wan LoRA. by @sayakpaul in #12541
- [Pipelines] Enable Wan VACE to run since single transformer by @DN6 in #12428
- fix crash if tiling mode is enabled by @sywangyi in #12521
- Fix typos in kandinsky5 docs by @Meatfucker in #12552
- [ci] don't run sana layerwise casting tests in CI. by @sayakpaul in #12551
- Bria fibo by @galbria in #12545
- Avoiding graph break by changing the way we infer dtype in vae.decoder by @ppadjinTT in #12512
- [Modular] Fix for custom block kwargs by @DN6 in #12561
- [Modular] Allow custom blocks to be saved to local_dir by @DN6 in #12381
- Fix Stable Diffusion 3.x pooled prompt embedding with multiple images by @friedrich in #12306
- Fix custom code loading in Automodel by @DN6 in #12571
- [modular] better warn message by @yiyixuxu in #12573
- [tests] add tests for flux modular (t2i, i2i, kontext) by @sayakpaul in #12566
- [modular]pass hub_kwargs to load_config by @yiyixuxu in #12577
- ulysses enabling in native attention path by @sywangyi in #12563
- Kandinsky 5.0 Docs fixes by @leffff in #12582
- [docs] sort doc by @sayakpaul in #12586
- [LoRA] add support for more Qwen LoRAs by @linoytsaban in #12581
- [Modular] Allow ModularPipeline to load from revisions by @DN6 in #12592
- Add optional precision-preserving preprocessing for examples/unconditional_image_generation/train_unconditional.py by @turian in #12596
- [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference by @lawrence-cj in #12584
- Fix overflow and dtype handling in rgblike_to_depthmap (NumPy + PyTorch) by @MohammadSadeghSalehi in #12546
- [Modular] Some clean up for Modular tests by @DN6 in #12579
- feat: enable attention dispatch for huanyuan video by @DefTruth in #12591
- fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled by @sywangyi in #12562
- [CI] Push test fix by @DN6 in #12617
- add ChronoEdit by @zhangjiewu in #12593
- [modular] wan! by @yiyixuxu in #12611
- [CI] Fix typo in uv install by @DN6 in #12618
- fix: correct import path for load_model_dict_into_meta in conversion scripts by @yashwantbezawada in #12616
- Fix Context Parallel validation checks by @DN6 in #12446
- [Modular] Clean up docs by @DN6 in #12604
- Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples by @cesaryuan in #12544
- [CI] Remove unittest dependency from testing_utils.py by @DN6 in #12621
- Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers by @charchit7 in #12594
- fix copies by @yiyixuxu in #12637
- Add MLU Support. by @a120092009 in #12629
- fix dispatch_attention_fn check by @yiyixuxu in #12636
- [modular] add tests for qwen modular by @sayakpaul in #12585
- ArXiv -> HF Papers by @qgallouedec in #12583
- [docs] Update install instructions by @stevhliu in #12626
- [modular] add a check by @yiyixuxu in #12628
- Improve docstrings and type hints in scheduling_amused.py by @delmalih in #12623
- [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) by @dg845 in #12526
- adjust unit tests for test_save_load_float16 by @kaixuanliu in #12500
- skip autoencoderdl layerwise casting memory by @sayakpaul in #12647
- [utils] Update check_doc_toc by @stevhliu in #12642
- [docs] AutoModel by @stevhliu in #12644
- Improve docstrings and type hints in scheduling_ddim.py by @delmalih in #12622
- Improve docstrings and type hints in scheduling_ddpm.py by @delmalih in #12651
- [Modular] Add Custom Blocks guide to doc by @DN6 in #12339
- Improve docstrings and type hints in scheduling_euler_discrete.py by @delmalih in #12654
- Update Wan Animate Docs by @dg845 in #12658
- Rope in float32 for mps or npu compatibility by @DavidBert in #12665
- [PRX pipeline]: add 1024 resolution ratio bins by @DavidBert in #12670
- SANA-Video Image to Video pipeline SanaImageToVideoPipeline support by @lawrence-cj in #12634
- [CI] Make CI logs less verbose by @DN6 in #12674
- Revert AutoencoderKLWan's dim_mult default value back to list by @dg845 in #12640
- [CI] Temporarily pin transformers by @DN6 in #12677
- [core] Refactor hub attn kernels by @sayakpaul in #12475
- [CI] Fix indentation issue in workflow files by @DN6 in #12685
- [CI] Fix failing Pipeline CPU tests by @DN6 in #12681
- Improve docstrings and type hints in scheduling_pndm.py by @delmalih in #12676
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet by @pratim4dasude in #12649
- Improve docstrings and type hints in scheduling_lms_discrete.py by @delmalih in #12678
- Add FluxLoraLoaderMixin to Fibo pipeline by @SwayStar123 in #12688
- bugfix: fix chrono-edit context parallel by @DefTruth in #12660
- [core] support sage attention + FA2 through kernels by @sayakpaul in #12439
- [i8n-pt] Fix grammar and expand Portuguese documentation by @cdutr in #12598
- Fix variable naming typos in community FluxControlNetFillInpaintPipeline by @sqhuang in #12701
- fix typo in docs by @lawrence-cj in #12675
- Add Support for Z-Image Series by @JerryWu-code in #12703
- let's go Flux2 🚀 by @sayakpaul in #12711
- Update script names in README for Flux2 training by @anvilarth in #12713
- [lora]: Fix Flux2 LoRA NaN test by @sayakpaul in #12714
- [docs] Correct flux2 links by @sayakpaul in #12716
- [docs] put autopipeline after overview and hunyuanimage in images by @sayakpaul in #12548
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py by @delmalih in #12710
- Support unittest for Z-image ⚡️ by @JerryWu-code in #12715
- [chore] remove torch.save from remnant code. by @sayakpaul in #12717
- Enable regional compilation on z-image transformer model by @sayakpaul in #12736
- Fix examples not loading LoRA adapter weights from checkpoint by @SurAyush in #12690
- [Modular] Add single file support to Modular by @DN6 in #12383
- fix type-check for z-image transformer by @DefTruth in #12739
- Hunyuanvideo15 by @yiyixuxu in #12696
- [Docs] Update Imagen Video paper link in schedulers by @delmalih in #12724
- Improve docstrings and type hints in scheduling_heun_discrete.py by @delmalih in #12726
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py by @delmalih in #12766
- fix FLUX.2 context parallel by @DefTruth in #12737
- Rename BriaPipeline to BriaFiboPipeline in documentation by @galbria in #12758
- Update bria_fibo.md with minor fixes by @sayakpaul in #12731
- [feat]: implement "local" caption upsampling for Flux.2 by @sayakpaul in #12718
- Add ZImage LoRA support and integrate into ZImagePipeline by @CalamitousFelicitousness in #12750
- Add support for Ovis-Image by @DoctorKey in #12740
- Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. by @JerryWu-code in #12770
- Fixes #12673. record_stream in group offloading is not working properly by @KimbingNg in #12721
- [core] start varlen variants for attn backend kernels. by @sayakpaul in #12765
- [core] reuse AttentionMixin for compatible classes by @sayakpaul in #12463
- Deprecate upcast_vae in SDXL based pipelines by @DN6 in #12619
- Kandinsky 5.0 Video Pro and Image Lite by @leffff in #12664
- Fix: leaf_level offloading breaks after delete_adapters by @adi776borate in #12639
- [tests] fix hunuyanvideo 1.5 offloading tests. by @sayakpaul in #12782
- [Z-Image] various small changes, Z-Image transformer tests, etc. by @sayakpaul in #12741
- Z-Image-Turbo from_single_file by @hlky in #12756
- Update attention_backends.md to format kernels by @sayakpaul in #12757
- Improve docstrings and type hints in scheduling_unipc_multistep.py by @delmalih in #12767
- fix spatial compression ratio error for AutoEncoderKLWan doing tiled encode by @jerry2102 in #12753
- [lora] support more ZImage LoRAs by @sayakpaul in #12790
- PRX Set downscale_freq_shift to 0 for consistency with internal implementation by @DavidBert in #12791
- Fix broken group offloading with block_level for models with standalone layers by @rycerzes in #12692
- [Docs] Add Z-Image docs by @asomoza in #12775
- move kandisnky docs. by @sayakpaul (direct commit on v0.36.0-release)
- [docs] minor fixes to kandinsky docs by @sayakpaul in #12797
- Improve docstrings and type hints in scheduling_deis_multistep.py by @delmalih in #12796
- [Feat] TaylorSeer Cache by @toilaluan in #12648
- Update the TensorRT-ModelOPT to Nvidia-ModelOPT by @jingyu-ml in #12793
- add post init for safty checker by @jiqing-feng in #12794
- [HunyuanVideo1.5] support step-distilled by @yiyixuxu in #12802
- Add ZImageImg2ImgPipeline by @CalamitousFelicitousness in #12751
- Release: v0.36.0-release by @sayakpaul (direct commit on v0.36.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@yiyixuxu
- ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
- Fix: Add _skip_keys for AutoencoderKLWan (#12523)
- HunyuanImage21 (#12333)
- [modular] better warn message (#12573)
- [modular]pass hub_kwargs to load_config (#12577)
- [modular] wan! (#12611)
- fix copies (#12637)
- fix dispatch_attention_fn check (#12636)
- [modular] add a check (#12628)
- Hunyuanvideo15 (#12696)
- [HunyuanVideo1.5] support step-distilled (#12802)
@leffff
- Kandinsky 5 is finally in Diffusers! (#12478)
- Kandinsky 5 10 sec (NABLA suport) (#12520)
- Kandinsky 5.0 Docs fixes (#12582)
- Kandinsky 5.0 Video Pro and Image Lite (#12664)
@dg845
- Remove Qwen Image Redundant RoPE Cache (#12452)
- [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526)
- Update Wan Animate Docs (#12658)
- Revert AutoencoderKLWan's dim_mult default value back to list (#12640)
@DN6
- Raise warning instead of error when imports are missing for custom code (#12513)
- Handle deprecated transformer classes (#12517)
- Deprecate Stable Cascade (#12537)
- [Pipelines] Enable Wan VACE to run since single transformer (#12428)
- [Modular] Fix for custom block kwargs (#12561)
- [Modular] Allow custom blocks to be saved to local_dir (#12381)
- Fix custom code loading in Automodel (#12571)
- [Modular] Allow ModularPipeline to load from revisions (#12592)
- [Modular] Some clean up for Modular tests (#12579)
- [CI] Push test fix (#12617)
- [CI] Fix typo in uv install (#12618)
- Fix Context Parallel validation checks (#12446)
- [Modular] Clean up docs (#12604)
- [CI] Remove unittest dependency from testing_utils.py (#12621)
- [Modular] Add Custom Blocks guide to doc (#12339)
- [CI] Make CI logs less verbose (#12674)
- [CI] Temporarily pin transformers (#12677)
- [CI] Fix indentation issue in workflow files (#12685)
- [CI] Fix failing Pipeline CPU tests (#12681)
- [Modular] Add single file support to Modular (#12383)
- Deprecate upcast_vae in SDXL based pipelines (#12619)
@DavidBert
- Add Photon model and pipeline support (#12456)
- Prx (#12525)
- Rope in float32 for mps or npu compatibility (#12665)
- [PRX pipeline]: add 1024 resolution ratio bins (#12670)
- PRX Set downscale_freq_shift to 0 for consistency with internal implementation (#12791)
@galbria
- Bria fibo (#12545)
- Rename BriaPipeline to BriaFiboPipeline in documentation (#12758)
@lawrence-cj
- [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584)
- SANA-Video Image to Video pipeline SanaImageToVideoPipeline support (#12634)
- fix typo in docs (#12675)
@zhangjiewu
- add ChronoEdit (#12593)
@delmalih
- Improve docstrings and type hints in scheduling_amused.py (#12623)
- Improve docstrings and type hints in scheduling_ddim.py (#12622)
- Improve docstrings and type hints in scheduling_ddpm.py (#12651)
- Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)
- Improve docstrings and type hints in scheduling_pndm.py (#12676)
- Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)
- [Docs] Update Imagen Video paper link in schedulers (#12724)
- Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)
- Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)
- Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)
@pratim4dasude
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)
@JerryWu-code
- Add Support for Z-Image Series (#12703)
- Support unittest for Z-image ⚡️ (#12715)
- Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. (#12770)
@CalamitousFelicitousness
- Add ZImage LoRA support and integrate into ZImagePipeline (#12750)
- Add ZImageImg2ImgPipeline (#12751)
@DoctorKey
- Add support for Ovis-Image (#12740)
- Oct 15, 2025
- Date parsed from source:Oct 15, 2025
- First seen by Releasebot:Mar 20, 2026
🐞 fixes for `transformers` models, imports,
diffusers ships v0.35.2-patch with transformers offload fixes and PyTorch compatibility updates.
All commits
Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)
- handle offload_state_dict when initing transformers models by @sayakpaul in #12438
- [CI] Fix TRANSFORMERS_FLAX_WEIGHTS_NAME import issue by @DN6 in #12354
- Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.… by @Aishwarya0811 in #12206
- fix scale_shift_factor being on cpu for wan and ltx by @vladmandic in #12347
Release: v0.35.2-patch by @sayakpaul (direct commit on v0.35.2-patch)
- Aug 20, 2025
- Date parsed from source:Aug 20, 2025
- First seen by Releasebot:Mar 20, 2026
v0.35.1 for improvements in Qwen-Image Edit
diffusers improves Qwen-Image Edit with two contributor PRs.
Thanks to @naykun for the following PRs that improve Qwen-Image Edit:
- #12188
- #12190
- Aug 19, 2025
- Date parsed from source:Aug 19, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
diffusers adds major new image, video, and editing pipelines, including Wan 2.2, Flux-Kontext, Qwen-Image, and Qwen-Image-Edit. It also brings new training scripts, faster pipeline loading, better GGUF support, and experimental Modular Diffusers for more flexible workflows.
This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the rest of the release notes fully to not miss out on the fun stuff.
New pipelines 🧨
We welcomed new pipelines in this release:
- Wan 2.2
- Flux-Kontext
- Qwen-Image
- Qwen-Image-Edit
Wan 2.2 📹
This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.
Flux-Kontext 🎇
Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.
Qwen-Image 🌅
After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.
Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.
New training scripts 🎛️
Make these newly added models your own with our training scripts:
- Kontext trainer
- Qwen-Image trainer
Single-file modeling implementations
Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.
Attention refactor
We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.
Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.
Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
Regional compilation
Regional compilation trims cold-start latency by only compiling the small and frequently-repeated block(s) of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x. Refer to this doc to learn more.
Thanks to @anijain2305 for contributing this feature in this PR.
We have also authored a number of posts that center around the use of torch.compile. You can check them out at the links below:
- Presenting Flux Fast: Making Flux go brrr on H100s
- torch.compile and Diffusers: A Hands-On Guide to Peak Performance
- Fast LoRA inference for Flux with Diffusers and PEFT
Faster pipeline loading ⚡️
Users can now load pipelines directly on an accelerator device, leading to significantly faster load times. This becomes particularly evident when loading large pipelines like Wan and Qwen-Image.
```python
from diffusers import DiffusionPipeline
import torch

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
```

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you're working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.
```python
import os

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"
# rest of the loading code ....
```

Better GGUF integration
@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.
We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.
We now support loading of Diffusers format GGUF checkpoints.
You can learn more about all of this in our GGUF official docs.
Modular Diffusers (Experimental)
Modular Diffusers is a system for building diffusion pipelines with individual pipeline blocks. It is highly customisable, with blocks that can be mixed and matched to adapt or create a pipeline for a specific workflow or multiple workflows.
The API is currently in active development and is being released as an experimental feature. Learn more in our docs.
All commits
- [tests] skip instead of returning. by @sayakpaul in #11793
- adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
- fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
- [chore] post release v0.34.0 by @sayakpaul in #11800
- Follow up for Group Offload to Disk by @DN6 in #11760
- [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
- [tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
- adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in #11809
- Flux Kontext by @a-r-r-o-w in #11812
- Kontext training by @sayakpaul in #11813
- Kontext fixes by @a-r-r-o-w in #11815
- remove syncs before denoising in Kontext by @sayakpaul in #11818
- [CI] disable onnx, mps, flax from the CI by @sayakpaul in #11803
- TorchAO compile + offloading tests by @a-r-r-o-w in #11697
- Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in #11804
- [lora] fix: lora unloading behvaiour by @sayakpaul in #11822
- [lora]feat: use exclude modules to loraconfig. by @sayakpaul in #11806
- ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in #11834
- Remove print statement in SCM Scheduler by @a-r-r-o-w in #11836
- [tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in #11825
- reset deterministic in tearDownClass by @jiqing-feng in #11785
- [tests] Fix failing float16 cuda tests by @a-r-r-o-w in #11835
- [single file] Cosmos by @a-r-r-o-w in #11801
- [docs] fix single_file example. by @sayakpaul in #11847
- Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in #11649
- [docs] Batch generation by @stevhliu in #11841
- [docs] Deprecated pipelines by @stevhliu in #11838
- fix norm not training in train_control_lora_flux.py by @Luo-Yihang in #11832
- [From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in #11807
- [lora] tests for exclude_modules with Wan VACE by @sayakpaul in #11843
- update: FluxKontextInpaintPipeline support by @vuongminh1907 in #11820
- [Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in #11823
- [docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in #11851
- [benchmarks] overhaul benchmarks by @sayakpaul in #11565
- FIX set_lora_device when target layers differ by @BenjaminBossan in #11844
- Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in #11856
- [chore] deprecate blip controlnet pipeline. by @sayakpaul in #11877
- [docs] fix references in flux pipelines. by @sayakpaul in #11857
- [tests] remove tests for deprecated pipelines. by @sayakpaul in #11879
- [docs] LoRA metadata by @stevhliu in #11848
- [training ] add Kontext i2i training by @sayakpaul in #11858
- [CI] Fix big GPU test marker by @DN6 in #11786
- First Block Cache by @a-r-r-o-w in #11180
- [tests] annotate compilation test classes with bnb by @sayakpaul in #11715
- Update chroma.md by @shm4r7 in #11891
- [CI] Speed up GPU PR Tests by @DN6 in #11887
- Pin k-diffusion for CI by @sayakpaul in #11894
- [Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in #11728
- [tests] Remove more deprecated tests by @sayakpaul in #11895
- [tests] mark the wanvace lora tester flaky by @sayakpaul in #11883
- [tests] add compile + offload tests for GGUF. by @sayakpaul in #11740
- feat: add multiple input image support in Flux Kontext by @Net-Mist in #11880
- Fix unique memory address when doing group-offloading with disk by @sayakpaul in #11767
- [SD3] CFG Cutoff fix and official callback by @asomoza in #11890
- The Modular Diffusers by @yiyixuxu in #9672
- [quant] QoL improvements for pipeline-level quant config by @sayakpaul in #11876
- Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in #11429
- [LoRA] fix: disabling hooks when loading loras. by @sayakpaul in #11896
- [utils] account for MPS when available in get_device(). by @sayakpaul in #11905
- [ControlnetUnion] Multiple Fixes by @asomoza in #11888
- Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in #11761
- [tests] Unify compilation + offloading tests in quantization by @sayakpaul in #11910
- Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in #11904
- [docs] torch.compile blog post by @stevhliu in #11837
- Flux: pass joint_attention_kwargs when using gradient_checkpointing by @piercus in #11814
- Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in #11909
- Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in #11924
- [tests] Improve Flux tests by @a-r-r-o-w in #11919
- Remove device synchronization when loading weights by @a-r-r-o-w in #11927
- Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in #11054
- Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in #11923
- [ControlnetUnion] Propagate #11888 to img2img by @asomoza in #11929
- enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in #11908
- [training] add an offload utility that can be used as a context manager. by @sayakpaul in #11775
- Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in #11518
- [refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in #11916
- [docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in #11947
- [Modular] Updates for Custom Pipeline Blocks by @DN6 in #11940
- [docs] Update toctree by @stevhliu in #11936
- [docs] include bp link. by @sayakpaul in #11952
- Fix kontext finetune issue when batch size >1 by @mymusise in #11921
- [tests] Add test slices for Hunyuan Video by @a-r-r-o-w in #11954
- [tests] Add test slices for Cosmos by @a-r-r-o-w in #11955
- [tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in #11953
- [Modular] update the collection behavior by @yiyixuxu in #11963
- fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in #11690
- Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in #11967
- [Examples] Uniform notations in train_flux_lora by @tomguluson92 in #10011
- fix style by @yiyixuxu in #11975
- [tests] Add test slices for Wan by @a-r-r-o-w in #11920
- [docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in #11935
- [tests] enforce torch version in the compilation tests. by @sayakpaul in #11979
- [modular diffusers] Wan by @a-r-r-o-w in #11913
- [compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in #11987
- enable quantcompile test on xpu by @yao-matrix in #11988
- [WIP] Wan2.2 by @yiyixuxu in #12004
- [refactor] some shared parts between hooks + docs by @a-r-r-o-w in #11968
- [refactor] Wan single file implementation by @a-r-r-o-w in #11918
- Fix huggingface-hub failing tests by @asomoza in #11994
- feat: add flux kontext by @jlonge4 in #11985
- [modular] add Modular flux for text-to-image by @sayakpaul in #11995
- [docs] include lora fast post. by @sayakpaul in #11993
- [docs] quant_kwargs by @stevhliu in #11712
- [docs] Fix link by @stevhliu in #12018
- [wan2.2] add 5b i2v by @yiyixuxu in #12006
- wan2.2 i2v FirstBlockCache fix by @okaris in #12013
- [core] support attention backends for LTX by @sayakpaul in #12021
- [docs] Update index by @stevhliu in #12020
- [Fix] huggingface-cli to hf missed files by @asomoza in #12008
- [training-scripts] Make pytorch examples UV-compatible by @sayakpaul in #12000
- [wan2.2] fix vae patches by @yiyixuxu in #12041
- Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in #12015
- [LoRA] support lightx2v lora in wan by @sayakpaul in #12040
- Fix type of force_upcast to bool by @BerndDoser in #12046
- Update autoencoder_kl_cosmos.py by @tanuj-rai in #12045
- Qwen-Image by @naykun in #12055
- [wan2.2] follow-up by @yiyixuxu in #12024
- tests + minor refactor for QwenImage by @a-r-r-o-w in #12057
- Cross attention module to Wan Attention by @samuelt0 in #12058
- fix(qwen-image): update vae license by @naykun in #12063
- CI fixing by @paulinebm in #12059
- enable all gpus when running ci. by @sayakpaul in #12062
- fix the rest for all GPUs in CI by @sayakpaul in #12064
- [docs] Install by @stevhliu in #12026
- [wip] feat: support lora in qwen image and training script by @sayakpaul in #12056
- [docs] small corrections to the example in the Qwen docs by @sayakpaul in #12068
- [tests] Fix Qwen test_inference slices by @a-r-r-o-w in #12070
- [tests] deal with the failing AudioLDM2 tests by @sayakpaul in #12069
- optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in #12072
- Add cuda kernel support for GGUF inference by @Isotr0py in #11869
- fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in #12081
- [refactor] condense group offloading by @a-r-r-o-w in #11990
- Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in #12077
- Helper functions to return skip-layer compatible layers by @a-r-r-o-w in #12048
- Make prompt_2 optional in Flux Pipelines by @DN6 in #12073
- [tests] tighten compilation tests for quantization by @sayakpaul in #12002
- Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in #11976
- fix flux type hint by @DefTruth in #12089
- [qwen] device typo by @yiyixuxu in #12099
- [lora] adapt new LoRA config injection method by @sayakpaul in #11999
- lora_conversion_utils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in #12101
- [tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in #12103
- [Modular] Fast Tests by @yiyixuxu in #11937
- [GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in #11684
- [docs] diffusers gguf checkpoints by @sayakpaul in #12092
- [core] add modular support for Flux I2I by @sayakpaul in #12086
- [lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in #12119
- [Modular] More Updates for Custom Code Loading by @DN6 in #11969
- enable compilation in qwen image. by @sayakpaul in #12061
- [tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in #12106
- [chore] complete the licensing statement. by @sayakpaul in #12001
- [docs] Cache link by @stevhliu in #12105
- [Modular] Add experimental feature warning for Modular Diffusers by @DN6 in #12127
- Add low_cpu_mem_usage option to from_single_file to align with from_pretrained by @IrisRainbowNeko in #12114
- [docs] Modular diffusers by @stevhliu in #11931
- [Bugfix] typo fix in NPU FA by @leisuzz in #12129
- Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in #12117
- [core] parallel loading of shards by @sayakpaul in #12028
- try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in #12032
- [docs] Refresh effective and efficient doc by @stevhliu in #12134
- Fix bf15/fp16 for pipeline_wan_vace.py by @SlimRG in #12143
- make parallel loading flag a part of constants. by @sayakpaul in #12137
- [docs] Parallel loading of shards by @stevhliu in #12135
- feat: cuda device_map for pipelines. by @sayakpaul in #12122
- [core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in #12005
- support hf_quantizer in cache warmup. by @sayakpaul in #12043
- make test_gguf all pass on xpu by @yao-matrix in #12158
- [docs] Quickstart by @stevhliu in #12128
- Qwen Image Edit Support by @naykun in #12164
- remove silu for CogView4 by @lambertwjh in #12150
- [qwen] Qwen image edit followups by @sayakpaul in #12166
- Minor modification to support DC-AE-turbo by @chenjy2003 in #12169
- [Docs] typo error in qwen image by @leisuzz in #12144
- fix: caching allocator behaviour for quantization. by @sayakpaul in #12172
- fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in #12178
- [docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in #12181
- [LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in #12170
- Update README.md by @Taechai in #12182
- [chore] add lora button to qwenimage docs by @sayakpaul in #12183
- [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in #12074
- Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@vuongminh1907
- update: FluxKontextInpaintPipeline support (#11820)
@Net-Mist
- feat: add multiple input image support in Flux Kontext (#11880)
@tolgacangoz
- Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
@naykun
- Qwen-Image (#12055)
- fix(qwen-image): update vae license (#12063)
- Qwen Image Edit Support (#12164)
@Trgtuan10
- Add QwenImage Inpainting and Img2Img pipeline (#12117)
@SamYuan1990
- try to use deepseek with an agent to auto i18n to zh (#12032)
- Jul 1, 2025
- Date parsed from source:Jul 1, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more
diffusers adds major new video and image generation pipelines, including Wan VACE, Cosmos Predict2, LTX 0.9.7, Hunyuan Video Framepack, Chroma, and VisualCloze. It also improves torch.compile support, adds pipeline quantization and disk offloading, and expands LoRA and training support.
📹 New video generation pipelines
Wan VACE
Wan VACE supports various generation techniques for controllable video generation. It comes in two variants: a 1.3B model for fast iteration & prototyping, and a 14B model for high-quality generation. Some of the capabilities include:
- Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
- Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
- Inpainting and Outpainting
- Subject to Video (faces, object, characters, etc.)
- Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)
The code snippets available in this pull request demonstrate some examples of how videos can be generated with controllability signals.
Check out the docs to learn more.
Cosmos Predict2 Video2World
Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.
The Video2World model comes in a 2B and 14B variant. Check out the docs to learn more.
LTX 0.9.7 and Distilled
LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.
Check out the docs to learn more.
Hunyuan Video Framepack and F1
Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.
FusionX
The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use from_single_file():

```python
import torch
from diffusers import WanTransformer3DModel

transformer = WanTransformer3DModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16,
)
```

To load the LoRAs, use load_lora_weights():

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX",
    weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors",
)
```

AccVideo and CausVid (only LoRAs)
AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.
🌠 New image generation pipelines
Cosmos Predict2 Text2Image
Text-to-image models from the Cosmos-Predict2 release. The models come in 2B and 14B variants. Check out the docs to learn more.
Chroma
Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.
Thanks to @Ednaordinary for contributing it in this PR!
VisualCloze
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is a universal image generation framework built on visual in-context learning. It offers these key capabilities:
- Support for various in-domain tasks
- Generalization to unseen tasks through in-context learning
- Unify multiple tasks into one step and generate both target image and intermediate results
- Support reverse-engineering conditions from target images
Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!
Better torch.compile support
We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test widely used models like Flux for recompilation and graph-break issues, which can get in the way of fully realizing the benefits of torch.compile(). Refer to the following links to learn more:
- #11085
- #11430
Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:
```python
import torch
from diffusers import DiffusionPipeline

torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.0,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```

This is compatible with group offloading, too. Interested readers can check out the concerned PRs below:
- #11605
- #11670
You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:
```python
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

torch._dynamo.config.recompile_limit = 1000
torch_dtype = torch.bfloat16

quant_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_compute_dtype": torch_dtype,
    "bnb_4bit_quant_type": "nf4",
}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]
```

Starting from bitsandbytes==0.46.0 onwards, bnb-quantized models should be fully compatible with torch.compile() without graph breaks. This means that when compiling a bnb-quantized model, users can do model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.
Note that for 4-bit bnb models, installing a PyTorch nightly is currently required if fullgraph=True is specified during compilation.
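To make the fullgraph=True behavior concrete, here is a minimal, GPU-free sketch (the Toy module is hypothetical, and the lightweight "eager" debugging backend is used only so the example runs anywhere): with fullgraph=True, dynamo raises an error on any graph break instead of silently splitting the model into subgraphs.

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

# fullgraph=True asks dynamo to error out on any graph break instead of
# silently falling back; backend="eager" skips codegen so this runs on CPU.
compiled = torch.compile(Toy(), backend="eager", fullgraph=True)
out = compiled(torch.ones(3))
print(out)  # tensor([2., 2., 2.])
```

If the forward contained a graph break (say, a data-dependent Python branch), the same call would raise instead of returning a result.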
Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.
PipelineQuantizationConfig
Users can now provide a quantization config while initializing a pipeline:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
```

This lowers the barrier to entry for users who want to use quantization without writing much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.
Group offloading with disk
In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.
However, you still need a considerable amount of system RAM to make offloading work effectively, so environments with both low VRAM and low RAM were still left out.
Starting this release, users will additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set the offload_to_disk_path to enable this feature.
```python
pipeline.transformer.enable_group_offload(
    onload_device="cuda",
    offload_device="cpu",
    offload_type="leaf_level",
    offload_to_disk_path="path/to/disk",
)
```

Refer to these two tables to compare the speed and memory trade-offs.
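The idea behind leaf-level disk offloading can be sketched with a conceptual toy (this is not the real diffusers implementation; the DiskOffloadedLinear class and its mechanics are hypothetical): a layer's weights live on disk and are streamed in just in time for that layer's forward pass.

```python
import os
import tempfile

import torch
import torch.nn as nn

class DiskOffloadedLinear(nn.Module):
    """Conceptual toy of per-layer disk offloading: the weight is kept in a
    file and loaded only for the duration of this layer's forward pass."""

    def __init__(self, inner: nn.Linear, path: str):
        super().__init__()
        self.path = path
        torch.save(inner.state_dict(), path)  # offload once at setup

    def forward(self, x):
        # Onload just-in-time, compute, and let the weights go out of scope
        # so they do not stay resident in RAM between calls.
        state = torch.load(self.path)
        return nn.functional.linear(x, state["weight"], state["bias"])

layer = nn.Linear(4, 2)
path = os.path.join(tempfile.gettempdir(), "toy_layer.pt")
offloaded = DiskOffloadedLinear(layer, path)
x = torch.randn(1, 4)
print(torch.allclose(layer(x), offloaded(x)))  # True: same math, lower residency
```

The real feature additionally overlaps onloading with computation via streams, which this toy does not attempt.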
LoRA metadata parsing
It is beneficial to include the LoraConfig used to train a LoRA in its state dict. In its absence, users were restricted to using the same LoRA alpha as the LoRA rank. We have modified the most popular training scripts to allow passing a custom lora_alpha through the CLI. Refer to this thread for more updates, and to this comment for some extended clarifications.
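To show where such metadata lives, here is a dependency-free sketch of the safetensors container (the helper names, weight key, and metadata fields are hypothetical): the file starts with an 8-byte little-endian header length followed by a JSON header, and training metadata such as a serialized LoraConfig rides along as string key/value pairs under the special "__metadata__" key.

```python
import json
import os
import struct
import tempfile

def write_safetensors(path, tensors, metadata=None):
    # Header maps tensor names to {dtype, shape, data_offsets}; metadata
    # values must be strings per the safetensors format.
    header = {}
    if metadata:
        header["__metadata__"] = {k: str(v) for k, v in metadata.items()}
    offset, payload = 0, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        payload += raw
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))  # 8-byte header length
        f.write(blob + payload)

def read_metadata(path):
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n)).get("__metadata__", {})

path = os.path.join(tempfile.gettempdir(), "toy_lora.safetensors")
write_safetensors(
    path,
    {"lora.up.weight": ("F32", [2, 2], bytes(16))},  # hypothetical weight
    metadata={"r": 8, "lora_alpha": 16},             # hypothetical LoraConfig fields
)
print(read_metadata(path))  # {'r': '8', 'lora_alpha': '16'}
```

With the alpha recoverable from the header, a loader no longer has to assume alpha equals rank.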
New training scripts
We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.
Updates on educational materials on quantization
We have worked on a two-part series discussing the support of quantization in Diffusers. Check them out:
- Exploring Quantization Backends in Diffusers
- (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware
All commits
- [LoRA] support musubi wan loras. by @sayakpaul in #11243
- fix test_vanilla_funetuning failure on XPU and A100 by @yao-matrix in #11263
- make test_stable_diffusion_inpaint_fp16 pass on XPU by @yao-matrix in #11264
- make test_dict_tuple_outputs_equivalent pass on XPU by @yao-matrix in #11265
- add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
- make test_instant_style_multiple_masks pass on XPU by @yao-matrix in #11266
- [BUG] Fix convert_vae_pt_to_diffusers bug by @lavinal712 in #11078
- Fix LTX 0.9.5 single file by @hlky in #11271
- [Tests] Cleanup lora tests utils by @sayakpaul in #11276
- [CI] relax tolerance for unclip further by @sayakpaul in #11268
- do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot by @sayakpaul in #11273
- Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi by @kuantuna in #11294
- HiDream Image by @hlky in #11231
- flow matching lcm scheduler by @quickjkee in #11170
- Update autoencoderkl_allegro.md by @Forbu in #11303
- Hidream refactoring follow ups by @a-r-r-o-w in #11299
- Fix incorrect tile_latent_min_width calculations by @kuantuna in #11305
- [ControlNet] Adds controlnet for SanaTransformer by @ishan-modi in #11040
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU by @yao-matrix in #11308
- make test_stable_diffusion_karras_sigmas pass on XPU by @yao-matrix in #11310
- make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU by @faaany in #11313
- [LoRA] support more SDXL loras. by @sayakpaul in #11292
- [HiDream] code example by @linoytsaban in #11317
- import for FlowMatchLCMScheduler by @asomoza in #11318
- Use float32 on mps or npu in transformer_hidream_image's rope by @hlky in #11316
- Add skrample section to community_projects.md by @Beinsezii in #11319
- [docs] Promote AutoModel usage by @sayakpaul in #11300
- [LoRA] Add LoRA support to AuraFlow by @hameerabbasi in #10216
- Fix vae.Decoder prev_output_channel by @hlky in #11280
- fix CPU offloading related fail cases on XPU by @yao-matrix in #11288
- [docs] fix hidream docstrings. by @sayakpaul in #11325
- Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible by @AstraliteHeart in #11297
- post release 0.33.0 by @sayakpaul in #11255
- another fix for FlowMatchLCMScheduler forgotten import by @asomoza in #11330
- Fix Hunyuan I2V for transformers>4.47.1 by @DN6 in #11293
- unpin torch versions for onnx Dockerfile by @sayakpaul in #11290
- [single file] enable telemetry for single file loading when using GGUF. by @sayakpaul in #11284
- [docs] add a snippet for compilation in the auraflow docs. by @sayakpaul in #11327
- Hunyuan I2V fast tests fix by @DN6 in #11341
- [BUG] fixed _toctree.yml alphabetical ordering by @ishan-modi in #11277
- Fix wrong dtype argument name as torch_dtype by @nPeppon in #11346
- [chore] fix lora docs utils by @sayakpaul in #11338
- [docs] add note about use_duck_shape in auraflow docs. by @sayakpaul in #11348
- [LoRA] Propagate hotswap better by @sayakpaul in #11333
- [Hi Dream] follow-up by @yiyixuxu in #11296
- [bitsandbytes] improve dtype mismatch handling for bnb + lora. by @sayakpaul in #11270
- Update controlnet_flux.py by @haofanwang in #11350
- enable 2 test cases on XPU by @yao-matrix in #11332
- [BNB] Fix test_moving_to_cpu_throws_warning by @SunMarc in #11356
- support Wan-FLF2V by @yiyixuxu in #11353
- Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin by @Kazuki-Yoda in #11357
- update output for Hidream transformer by @yiyixuxu in #11366
- [Wan2.1-FLF2V] update conversion script by @yiyixuxu in #11365
- [Flux LoRAs] fix lr scheduler bug in distributed scenarios by @linoytsaban in #11242
- [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11240
- fix issue that training flux controlnet was unstable and validation r… by @PromeAIpro in #11373
- Fix Wan I2V prepare_latents dtype by @a-r-r-o-w in #11371
- [BUG] fixes in kadinsky pipeline by @ishan-modi in #11080
- Add Serialized Type Name kwarg in Model Output by @anzr299 in #10502
- [cogview4][feat] Support attention mechanism with variable-length support and batch packing by @OleehyO in #11349
- Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma by @josephrocca in #11120
- [Refactor] Minor Improvement for import utils by @ishan-modi in #11161
- Add stochastic sampling to FlowMatchEulerDiscreteScheduler by @apolinario in #11369
- [LoRA] add LoRA support to HiDream and fine-tuning script by @linoytsaban in #11281
- Update modeling imports by @a-r-r-o-w in #11129
- [HiDream] move deprecation to 0.35.0 by @yiyixuxu in #11384
- Update README_hidream.md by @AMEERAZAM08 in #11386
- Fix group offloading with block_level and use_stream=True by @a-r-r-o-w in #11375
- [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing by @ishandutta0098 in #11395
- [Feature] Added Xlab Controlnet support by @ishan-modi in #11249
- Kolors additional pipelines, community contrib by @Teriks in #11372
- [HiDream LoRA] optimizations + small updates by @linoytsaban in #11381
- Fix Flux IP adapter argument in the pipeline example by @AeroDEmi in #11402
- [BUG] fixed WAN docstring by @ishan-modi in #11226
- Fix typos in strings and comments by @co63oc in #11407
- [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing by @merterbak in #11421
- [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() by @sayakpaul in #11085
- enable group_offload cases and quanto cases on XPU by @yao-matrix in #11405
- enable test_layerwise_casting_memory cases on XPU by @yao-matrix in #11406
- [tests] fix import. by @sayakpaul in #11434
- [train_text_to_image] Better image interpolation in training scripts follow up by @tongyu0924 in #11426
- [train_text_to_image_lora] Better image interpolation in training scripts follow up by @tongyu0924 in #11427
- enable 28 GGUF test cases on XPU by @yao-matrix in #11404
- [Hi-Dream LoRA] fix bug in validation by @linoytsaban in #11439
- Fixing missing provider options argument by @urpetkov-amd in #11397
- Set LANCZOS as the default interpolation for image resizing in ControlNet training by @YoulunPeng in #11449
- Raise warning instead of error for block offloading with streams by @a-r-r-o-w in #11425
- enable marigold_intrinsics cases on XPU by @yao-matrix in #11445
- torch.compile fullgraph compatibility for Hunyuan Video by @a-r-r-o-w in #11457
- enable consistency test cases on XPU, all passed by @yao-matrix in #11446
- enable unidiffuser test cases on xpu by @yao-matrix in #11444
- Add generic support for Intel Gaudi accelerator (hpu device) by @dsocek in #11328
- Add StableDiffusion3InstructPix2PixPipeline by @xduzhangjiayu in #11378
- make safe diffusion test cases pass on XPU and A100 by @yao-matrix in #11458
- [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models by @tongyu0924 in #11431
- Add LANCZOS as default interplotation mode. by @Va16hav07 in #11463
- make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu by @yao-matrix in #11461
- [WAN] fix recompilation issues by @sayakpaul in #11475
- Fix typos in docs and comments by @co63oc in #11416
- [tests] xfail recent pipeline tests for specific methods. by @sayakpaul in #11469
- cache packages_distributions by @vladmandic in #11453
- [docs] Memory optims by @stevhliu in #11385
- [docs] Adapters by @stevhliu in #11331
- [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing by @yuanjua in #11471
- [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing by @ysurs in #11472
- enable semantic diffusion and stable diffusion panorama cases on XPU by @yao-matrix in #11459
- [Feature] Implement tiled VAE encoding/decoding for Wan model. by @c8ef in #11414
- [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode for image resizing by @ParagEkbote in #11455
- [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) by @MinJu-Ha in #11490
- [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing by @cjfghk5697 in #11491
- [training] feat: enable quantization for hidream lora training. by @sayakpaul in #11494
- Set LANCZOS as the default interpolation method for image resizing. by @yijun-lee in #11492
- Update training script for txt to img sdxl with lora supp with new interpolation. by @RogerSinghChugh in #11496
- Fix torchao docs typo for fp8 granular quantization by @a-r-r-o-w in #11473
- Update setup.py to pin min version of peft by @sayakpaul in #11502
- update dep table. by @sayakpaul in #11504
- [LoRA] use removeprefix to preserve sanity. by @sayakpaul in #11493
- Hunyuan Video Framepack by @a-r-r-o-w in #11428
- enable lora cases on XPU by @yao-matrix in #11506
- [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility by @iamwavecut in #11441
- [docs] minor updates to bitsandbytes docs. by @sayakpaul in #11509
- Cosmos by @a-r-r-o-w in #10660
- clean up the Init for stable_diffusion by @yiyixuxu in #11500
- fix audioldm by @sayakpaul (direct commit on v0.34.0-release)
- Revert "fix audioldm" by @sayakpaul (direct commit on v0.34.0-release)
- [LoRA] make lora alpha and dropout configurable by @linoytsaban in #11467
- Add cross attention type for Sana-Sprint training in diffusers. by @scxue in #11514
- Conditionally import torchvision in Cosmos transformer by @a-r-r-o-w in #11524
- [tests] fix audioldm2 for transformers main. by @sayakpaul in #11522
- feat: pipeline-level quantization config by @sayakpaul in #11130
- [Tests] Enable more general testing for torch.compile() with LoRA hotswapping by @sayakpaul in #11322
- [LoRA] support non-diffusers hidream loras by @sayakpaul in #11532
- enable 7 cases on XPU by @yao-matrix in #11503
- [LTXPipeline] Update latents dtype to match VAE dtype by @james-p-xu in #11533
- enable dit integration cases on xpu by @yao-matrix in #11523
- enable print_env on xpu by @yao-matrix in #11507
- Change Framepack transformer layer initialization order by @a-r-r-o-w in #11535
- [tests] add tests for framepack transformer model. by @sayakpaul in #11520
- Hunyuan Video Framepack F1 by @a-r-r-o-w in #11534
- enable several pipeline integration tests on XPU by @yao-matrix in #11526
- [test_models_transformer_ltx.py] help us test torch.compile() for impactful models by @cjfghk5697 in #11512
- Add VisualCloze by @lzyhha in #11377
- Fix typo in train_diffusion_orpo_sdxl_lora_wds.py by @Meeex2 in #11541
- fix: remove torch_dtype="auto" option from docstrings by @johannaSommer in #11513
- [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11239
- [LoRA] small change to support Hunyuan LoRA Loading for FramePack by @linoytsaban in #11546
- LTX Video 0.9.7 by @a-r-r-o-w in #11516
- [tests] Enable testing for HiDream transformer by @sayakpaul in #11478
- Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. by @Meatfucker in #11545
- Fix deprecation warnings in test_ltx_image2video.py by @AChowdhury1211 in #11538
- [tests] Add torch.compile test for UNet2DConditionModel by @olccihyeon in #11537
- [Single File] GGUF/Single File Support for HiDream by @DN6 in #11550
- [gguf] Refactor torch_function to avoid unnecessary computation by @anijain2305 in #11551
- [tests] add tests for combining layerwise upcasting and groupoffloading. by @sayakpaul in #11558
- [docs] Regional compilation docs by @sayakpaul in #11556
- enhance value guard of _device_agnostic_dispatch by @yao-matrix in #11553
- Doc update by @Player256 in #11531
- Revert error to warning when loading LoRA from repo with multiple weights by @apolinario in #11568
- [docs] tip for group offloding + quantization by @sayakpaul in #11576
- [LoRA] support non-diffusers LTX-Video loras by @linoytsaban in #11572
- [WIP][LoRA] start supporting kijai wan lora. by @sayakpaul in #11579
- [Single File] Fix loading for LTX 0.9.7 transformer by @DN6 in #11578
- Use HF Papers by @qgallouedec in #11567
- LTX 0.9.7-distilled; documentation improvements by @a-r-r-o-w in #11571
- [LoRA] kijai wan lora support for I2V by @linoytsaban in #11588
- docs: fix invalid links by @osrm in #11505
- [docs] Remove fast diffusion tutorial by @stevhliu in #11583
- RegionalPrompting: Inherit from Stable Diffusion by @b-sai in #11525
- [chore] allow string device to be passed to randn_tensor. by @sayakpaul in #11559
- Type annotation fix by @DN6 in #11597
- [LoRA] minor fix for load_lora_weights() for Flux and a test by @sayakpaul in #11595
- Update Intel Gaudi doc by @regisss in #11479
- enable pipeline test cases on xpu by @yao-matrix in #11527
- [Feature] AutoModel can load components using model_index.json by @ishan-modi in #11401
- [docs] Pipeline-level quantization by @stevhliu in #11604
- Fix bug when variant and safetensor file does not match by @kaixuanliu in #11587
- [tests] Changes to the torch.compile() CI and tests by @sayakpaul in #11508
- Fix mixed variant downloading by @DN6 in #11611
- fix security issue in build docker ci by @sayakpaul in #11614
- Make group offloading compatible with torch.compile() by @sayakpaul in #11605
- [training docs] smol update to README files by @linoytsaban in #11616
- Adding NPU for get device function by @leisuzz in #11617
- [LoRA] improve LoRA fusion tests by @sayakpaul in #11274
- [Sana Sprint] add image-to-image pipeline by @linoytsaban in #11602
- [CI] fix the filename for displaying failures in lora ci. by @sayakpaul in #11600
- [docs] PyTorch 2.0 by @stevhliu in #11618
- [textual_inversion_sdxl.py] fix lr scheduler steps count by @yuanjua in #11557
- Fix wrong indent for examples of controlnet script by @Justin900429 in #11632
- removing unnecessary else statement by @YanivDorGalron in #11624
- enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed by @yao-matrix in #11620
- Bug: Fixed Image 2 Image example by @vltmedia in #11619
- typo fix in pipeline_flux.py by @YanivDorGalron in #11623
- Fix typos in strings and comments by @co63oc in #11476
- [docs] update torchao doc link by @sayakpaul in #11634
- Use float32 RoPE freqs in Wan with MPS backends by @hvaara in #11643
- [chore] misc changes in the bnb tests for consistency. by @sayakpaul in #11355
- [tests] chore: rename lora model-level tests. by @sayakpaul in #11481
- [docs] Caching methods by @stevhliu in #11625
- [docs] Model cards by @stevhliu in #11112
- [CI] Some improvements to Nightly reports summaries by @DN6 in #11166
- [chore] bring PipelineQuantizationConfig at the top of the import chain. by @sayakpaul in #11656
- [examples] flux-control: use num_training_steps_for_scheduler by @Markus-Pobitzer in #11662
- use deterministic to get stable result by @jiqing-feng in #11663
- [tests] add test for torch.compile + group offloading by @sayakpaul in #11670
- Wan VACE by @a-r-r-o-w in #11582
- fixed axes_dims_rope init (huggingface#11641) by @sofinvalery in #11678
- [tests] Fix how compiler mixin classes are used by @sayakpaul in #11680
- Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process by @DN6 in #11596
- Add community class StableDiffusionXL_T5Pipeline by @ppbrown in #11626
- Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area by @Meatfucker in #11658
- Allow remote code repo names to contain "." by @akasharidas in #11652
- [LoRA] support Flux Control LoRA with bnb 8bit. by @sayakpaul in #11655
- [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline by @tolgacangoz in #11639
- enable torchao test cases on XPU and switch to device agnostic APIs for test cases by @yao-matrix in #11654
- [tests] tests for compilation + quantization (bnb) by @sayakpaul in #11672
- [tests] model-level device_map clarifications by @sayakpaul in #11681
- Improve Wan docstrings by @a-r-r-o-w in #11689
- Set _torch_version to N/A if torch is disabled. by @rasmi in #11645
- Avoid DtoH sync from access of nonzero() item in scheduler by @jbschlosser in #11696
- Apply Occam's Razor in position embedding calculation by @tolgacangoz in #11562
- [docs] add compilation bits to the bitsandbytes docs. by @sayakpaul in #11693
- swap out token for style bot. by @sayakpaul in #11701
- [docs] mention fp8 benefits on supported hardware. by @sayakpaul in #11699
- Support Wan AccVideo lora by @a-r-r-o-w in #11704
- [LoRA] parse metadata from LoRA and save metadata by @sayakpaul in #11324
- Cosmos Predict2 by @a-r-r-o-w in #11695
- Chroma Pipeline by @Ednaordinary in #11698
- [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers by @sayakpaul in #11716
- [training] show how metadata stuff should be incorporated in training scripts. by @sayakpaul in #11707
- Fix misleading comment by @carlthome in #11722
- Add Pruna optimization framework documentation by @davidberenstein1957 in #11688
- Support more Wan loras (VACE) by @a-r-r-o-w in #11726
- [LoRA training] update metadata use for lora alpha + README by @linoytsaban in #11723
- ⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% by @misrasaurabh1 in #11665
- [training] add ds support to lora hidream by @leisuzz in #11737
- [tests] device_map tests for all models. by @sayakpaul in #11708
- [chore] change to 2025 licensing for remaining by @sayakpaul in #11741
- Chroma Follow Up by @DN6 in #11725
- [Quantizers] add is_compileable property to quantizers. by @sayakpaul in #11736
- Update more licenses to 2025 by @a-r-r-o-w in #11746
- Add missing HiDream license by @a-r-r-o-w in #11747
- Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server by @dependabot[bot] in #11748
- [LoRA] refactor lora loading at the model-level by @sayakpaul in #11719
- [CI] Fix WAN VACE tests by @DN6 in #11757
- [CI] Fix SANA tests by @DN6 in #11756
- Fix HiDream pipeline test module by @DN6 in #11754
- make group offloading work with disk/nvme transfers by @sayakpaul in #11682
- Update Chroma Docs by @DN6 in #11753
- fix invalid component handling behaviour in PipelineQuantizationConfig by @sayakpaul in #11750
- Fix failing cpu offload test for LTX Latent Upscale by @DN6 in #11755
- [docs] Quantization + torch.compile + offloading by @stevhliu in #11703
- [docs] device_map by @stevhliu in #11711
- [docs] LoRA scale scheduling by @stevhliu in #11727
- Fix dimensionalities in apply_rotary_emb functions' comments by @tolgacangoz in #11717
- enable deterministic in bnb 4 bit tests by @jiqing-feng in #11738
- enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU by @yao-matrix in #11671
- [tests] properly skip tests instead of return by @sayakpaul in #11771
- [CI] Skip ONNX Upscale tests by @DN6 in #11774
- [Wan] Fix mask padding in Wan VACE pipeline. by @bennyguo in #11778
- Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py by @imbr92 in #11744
- [docs] minor cleanups in the lora docs. by @sayakpaul in #11770
- [lora] only remove hooks that we add back by @yiyixuxu in #11768
- [tests] Fix HunyuanVideo Framepack device tests by @a-r-r-o-w in #11789
- [chore] raise as early as possible in group offloading by @sayakpaul in #11792
- [tests] Fix group offloading and layerwise casting test interaction by @a-r-r-o-w in #11796
- guard omnigen processor. by @sayakpaul in #11799
- Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@yao-matrix
- fix test_vanilla_funetuning failure on XPU and A100 (#11263)
- make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
- make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
- make test_instant_style_multiple_masks pass on XPU (#11266)
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
- make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
- fix CPU offloading related fail cases on XPU (#11288)
- enable 2 test cases on XPU (#11332)
- enable group_offload cases and quanto cases on XPU (#11405)
- enable test_layerwise_casting_memory cases on XPU (#11406)
- enable 28 GGUF test cases on XPU (#11404)
- enable marigold_intrinsics cases on XPU (#11445)
- enable consistency test cases on XPU, all passed (#11446)
- enable unidiffuser test cases on xpu (#11444)
- make safe diffusion test cases pass on XPU and A100 (#11458)
- make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)
- enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
- enable lora cases on XPU (#11506)
- enable 7 cases on XPU (#11503)
- enable dit integration cases on xpu (#11523)
- enable print_env on xpu (#11507)
- enable several pipeline integration tests on XPU (#11526)
- enhance value guard of _device_agnostic_dispatch (#11553)
- enable pipeline test cases on xpu (#11527)
- enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
- enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
- enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
@hlky
- Fix LTX 0.9.5 single file (#11271)
- HiDream Image (#11231)
- Use float32 on mps or npu in transformer_hidream_image's rope (#11316)
- Fix vae.Decoder prev_output_channel (#11280)
@quickjkee
- flow matching lcm scheduler (#11170)
@ishan-modi
- [ControlNet] Adds controlnet for SanaTransformer (#11040)
- [BUG] fixed _toctree.yml alphabetical ordering (#11277)
- [BUG] fixes in kadinsky pipeline (#11080)
- [Refactor] Minor Improvement for import utils (#11161)
- [Feature] Added Xlab Controlnet support (#11249)
- [BUG] fixed WAN docstring (#11226)
- [Feature] AutoModel can load components using model_index.json (#11401)
@linoytsaban
- [HiDream] code example (#11317)
- [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
- [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
- [HiDream LoRA] optimizations + small updates (#11381)
- [Hi-Dream LoRA] fix bug in validation (#11439)
- [LoRA] make lora alpha and dropout configurable (#11467)
- [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
- [LoRA] support non-diffusers LTX-Video loras (#11572)
- [LoRA] kijai wan lora support for I2V (#11588)
- [training docs] smol update to README files (#11616)
- [Sana Sprint] add image-to-image pipeline (#11602)
- [LoRA training] update metadata use for lora alpha + README (#11723)
@hameerabbasi
- [LoRA] Add LoRA support to AuraFlow (#10216)
@DN6
- Fix Hunyuan I2V for transformers>4.47.1 (#11293)
- Hunyuan I2V fast tests fix (#11341)
- [Single File] GGUF/Single File Support for HiDream (#11550)
- [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
- Type annotation fix (#11597)
- Fix mixed variant downloading (#11611)
- [CI] Some improvements to Nightly reports summaries (#11166)
- Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
- Chroma Follow Up (#11725)
- [CI] Fix WAN VACE tests (#11757)
- [CI] Fix SANA tests (#11756)
- Fix HiDream pipeline test module (#11754)
- Update Chroma Docs (#11753)
- Fix failing cpu offload test for LTX Latent Upscale (#11755)
- [CI] Skip ONNX Upscale tests (#11774)
@yiyixuxu
- [Hi Dream] follow-up (#11296)
- support Wan-FLF2V (#11353)
- update output for Hidream transformer (#11366)
- [Wan2.1-FLF2V] update conversion script (#11365)
- [HiDream] move deprecation to 0.35.0 (#11384)
- clean up the Init for stable_diffusion (#11500)
- [lora] only remove hooks that we add back (#11768)
@Teriks
- Kolors additional pipelines, community contrib (#11372)
@co63oc
- Fix typos in strings and comments (#11407)
- Fix typos in docs and comments (#11416)
- Fix typos in strings and comments (#11476)
@xduzhangjiayu
- Add StableDiffusion3InstructPix2PixPipeline (#11378)
@scxue
- Add cross attention type for Sana-Sprint training in diffusers. (#11514)
@lzyhha
- Add VisualCloze (#11377)
@b-sai
- RegionalPrompting: Inherit from Stable Diffusion (#11525)
@Ednaordinary
- Chroma Pipeline (#11698)
- Apr 10, 2025
- Date parsed from source:Apr 10, 2025
- First seen by Releasebot:Mar 20, 2026
v0.33.1: fix ftfy import
diffusers fixes ftfy import for Wan pipelines.
All commits
- fix ftfy import for wan pipelines by @yiyixuxu in #11262
- Apr 9, 2025
- Date parsed from source:Apr 9, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more
diffusers releases a major update for video and image generation, adding Wan 2.1, LTX Video 0.9.5, Hunyuan Image to Video, Sana-Sprint, Lumina2, OmniGen and more. It also brings memory optimizations, cached inference, quantization, LoRA improvements and broader pipeline support.
New Pipelines for Video Generation
Wan 2.1
Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The release includes four model variants and three pipelines: text-to-video, image-to-video, and video-to-video.
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers
- Wan-AI/Wan2.1-T2V-14B-Diffusers
- Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
- Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
Check out the docs here to learn more.
LTX Video 0.9.5
LTX Video 0.9.5 is the updated version of the super-fast LTX Video model series. The latest model introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward).
To support these additional conditioning inputs, we’ve introduced the LTXConditionPipeline and LTXVideoCondition object.
To learn more about the usage, check out the docs here.
Hunyuan Image to Video
Hunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder. The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.
To learn more, check out the docs here.
Others
- EasyAnimateV5 (thanks to @bubbliiiing for contributing this in this PR)
- ConsisID (thanks to @SHYuanBest for contributing this in this PR)
New Pipelines for Image Generation
Sana-Sprint
SANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.
Shoutout to @lawrence-cj for their help and guidance on this PR.
Check out the pipeline docs of SANA-Sprint to learn more.
Lumina2
Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.
Check out the docs to learn more. Thanks to @zhuole1025 for contributing this through this PR.
One can also LoRA fine-tune Lumina2, taking advantage of its Apache 2.0 licensing. Check out the guide for more details.
Omnigen
OmniGen is a unified image generation model that can handle multiple tasks including text-to-image, image editing, subject-driven generation, and various computer vision tasks within a single framework. The model consists of a VAE, and a single transformer based on Phi-3 that handles text and image encoding as well as the diffusion process.
Check out the docs to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in this PR.
Others
- CogView4 (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in this PR)
New Memory Optimizations
Layerwise Casting
PyTorch supports torch.float8_e4m3fn and torch.float8_e5m2 as weight storage dtypes, but they can’t be used for computation on many devices due to unimplemented kernel support.
However, you can still use these dtypes to store model weights in FP8 precision and upcast them to a widely supported dtype such as torch.float16 or torch.bfloat16 on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting. This can potentially cut down the VRAM requirements of a model by 50%.
Code
```python
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX-5b"

# Load the model in bfloat16 and enable layerwise casting
transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

# Load the pipeline
pipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```
Group Offloading
Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either torch.nn.ModuleList or torch.nn.Sequential), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.
On CUDA devices, layer prefetching can also be enabled with CUDA streams. The next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which makes inference substantially faster while still keeping VRAM requirements very low. This overlaps computation with data transfer.
One thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2 times the size of the model if you choose to set use_stream=True. You can reduce CPU RAM usage by setting low_cpu_mem_usage=True. This should limit the CPU RAM used to be roughly the same as the size of the model, but will introduce slight latency in the inference process.
You can also use record_stream=True when using use_stream=True to obtain more speedups at the expense of slightly increased memory usage.
Code
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

# This utilized about 14.79 GB. It can be further reduced by using tiling and using leaf_level offloading throughout the pipeline.
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
```
Group offloading can also be applied to non-Diffusers models such as text encoders from the transformers library.
Code
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
```
Remote Components
Remote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:
| Model | Endpoint | VAE |
| --- | --- | --- |
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo |
This is an example of using remote decoding with the Hunyuan Video pipeline:
Code
```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils.remote_utils import remote_decode

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16
).to("cuda")

latent = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="latent",
).frames

video = remote_decode(
    endpoint="https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    output_type="mp4",
)
if isinstance(video, bytes):
    with open("video.mp4", "wb") as f:
        f.write(video)
```
Check out the docs to know more.
Introducing Cached Inference for DiTs
Cached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.
Check out the docs to learn more about the available caching methods.
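The decision these caching methods make each step can be sketched in a few lines. The helper below is a hypothetical illustration of how a timestep window and a block-skip interval determine when a cached attention output is reused; the names and exact semantics are illustrative assumptions, not the library's internals.

```python
# Hypothetical sketch of the caching decision used by methods like
# Pyramid Attention Broadcast: reuse a cached attention output only
# inside a timestep window, refreshing it every `block_skip` calls.
def should_reuse_cached_attention(timestep, calls_since_refresh,
                                  timestep_range=(100, 800), block_skip=2):
    lo, hi = timestep_range
    if not (lo < timestep < hi):
        return False  # outside the window: always recompute attention
    # every block_skip-th call recomputes; the others reuse the cache
    return calls_since_refresh % block_skip != 0
```

Intuitively, attention outputs change slowly in the middle of the denoising schedule, so skipping recomputation there trades a small quality dip for speed.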
Pyramid Attention Broadcast
Code
```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```
FasterCache
Code
```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 901),
    unconditional_batch_skip_range=2,
    attention_weight_callback=lambda _: 0.5,
    is_guidance_distilled=True,
)
pipe.transformer.enable_cache(config)
```
Quantization
Quanto Backend
Diffusers now has support for the Quanto quantization backend, which provides float8, int8, int4, and int2 quantization dtypes.
```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```
Quanto int8 models are also compatible with torch.compile:
```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer.compile()
```
Improved loading for uintx TorchAO checkpoints with torch>=2.6
TorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes using the uintx format, such as uint4wo, this involves saving subclassed TorchAO Tensor objects in the model file. This made loading the models directly with Diffusers tricky, since we do not allow deserializing arbitrary Python objects from pickle files.
Torch 2.6 allows adding expected Tensors to torch safe globals, which lets us directly load TorchAO checkpoints with these objects.
```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```
LoRAs
We have shipped a couple of improvements on the LoRA front in this release.
- 🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux
Take note of the breaking change introduced in this PR 🚨. We suggest upgrading your peft installation to the latest version (pip install -U peft), especially when dealing with Flux LoRAs.
- torch.compile() support when hotswapping LoRAs without triggering recompilation
A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, load another adapter, etc. This workflow normally requires calling load_lora_weights(), set_adapters(), and possibly delete_adapters() to save memory. Moreover, if the model is compiled using torch.compile, performing these steps requires recompilation, which takes time.
To better support this common workflow, you can “hotswap” a LoRA adapter, to avoid accumulating memory and in some cases, recompilation. It requires an adapter to already be loaded, and the new adapter weights are swapped in-place for the existing adapter.
Check out the docs to learn more about this feature.
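Conceptually, hotswapping avoids recompilation because the new adapter's weights are copied into the tensors the existing adapter already owns, so any compiled graph that captured those tensors stays valid. A minimal, library-agnostic sketch of that in-place swap (all names here are illustrative):

```python
# Library-agnostic sketch of the hotswap idea: copy new adapter weights
# into the buffers the current adapter already owns, instead of rebinding
# them, so references captured by a compiled graph remain valid.
class LoraSlot:
    def __init__(self, weights):
        self.weights = [list(w) for w in weights]  # stand-ins for adapter tensors

    def hotswap(self, new_weights):
        for buf, new in zip(self.weights, new_weights):
            buf[:] = new  # in-place update: same buffer object, new values

slot = LoraSlot([[1.0, 2.0], [3.0]])
captured = slot.weights[0]         # pretend a compiled graph captured this buffer
slot.hotswap([[9.0, 8.0], [7.0]])  # captured buffer now holds the new weights
```

Because the buffers are mutated rather than replaced, no new tensors enter the graph and torch.compile has nothing to recompile.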
The other major change is support for loading LoRAs into quantized model checkpoints.
dtype Maps for Pipelines
Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:
```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
```
AutoModel
This release includes an AutoModel object similar to the one found in transformers that automatically fetches the appropriate model class for the provided repo.
```python
from diffusers import AutoModel

unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```
All commits
- [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
- IP-Adapter for StableDiffusion3Img2ImgPipeline by @guiyrt in #10589
- [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
- Move buffers to device by @hlky in #10523
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
- Scheduling fixes on MPS by @hlky in #10549
- [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
- NPU adaption for RMSNorm by @leisuzz in #10534
- implementing flux on TPUs with ptxla by @entrpn in #10515
- [core] ConsisID by @SHYuanBest in #10140
- [training] set rest of the blocks with requires_grad False. by @sayakpaul in #10607
- chore: remove redundant words by @sunxunle in #10609
- bugfix for npu not support float64 by @baymax591 in #10123
- [chore] change licensing to 2025 from 2024. by @sayakpaul in #10615
- Enable dreambooth lora finetune example on other devices by @jiqing-feng in #10602
- Remove the FP32 Wrapper when evaluating by @lmxyy in #10617
- [tests] make tests device-agnostic (part 3) by @faaany in #10437
- fix offload gpu tests etc by @yiyixuxu in #10366
- Remove cache migration script by @Wauplin in #10619
- [core] Layerwise Upcasting by @a-r-r-o-w in #10347
- Improve TorchAO error message by @a-r-r-o-w in #10627
- [CI] Update HF_TOKEN in all workflows by @DN6 in #10613
- add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py by @kahmed10 in #10624
- [Tests] modify the test slices for the failing flax test by @sayakpaul in #10630
- [docs] fix image path in para attention docs by @sayakpaul in #10632
- [docs] uv installation by @stevhliu in #10622
- width and height are mixed-up by @raulc0399 in #10629
- Add IP-Adapter example to Flux docs by @hlky in #10633
- removing redundant requires_grad = False by @YanivDorGalron in #10628
- [chore] add a script to extract loras from full fine-tuned models by @sayakpaul in #10631
- Add pipeline_stable_diffusion_xl_attentive_eraser by @Anonym0u3 in #10579
- NPU Adaption for Sanna by @leisuzz in #10409
- Add sigmoid scheduler in scheduling_ddpm.py docs by @JacobHelwig in #10648
- create a script to train autoencoderkl by @lavinal712 in #10605
- Add community pipeline for semantic guidance for FLUX by @Marlon154 in #10610
- ControlNet Union controlnet_conditioning_scale for multiple control inputs by @hlky in #10666
- [training] Convert to ImageFolder script by @hlky in #10664
- Add provider_options to OnnxRuntimeModel by @hlky in #10661
- fix check_inputs func in LuminaText2ImgPipeline by @victolee0 in #10651
- SDXL ControlNet Union pipelines, make control_image argument immutible by @Teriks in #10663
- Revert RePaint scheduler 'fix' by @GiusCat in #10644
- [core] Pyramid Attention Broadcast by @a-r-r-o-w in #9562
- [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode by @hanchchch in #10600
- Refactor gradient checkpointing by @a-r-r-o-w in #10611
- [Tests] conditionally check fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory by @sayakpaul in #10669
- Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode by @dimitribarbot in #10670
- [tests] update llamatokenizer in hunyuanvideo tests by @sayakpaul in #10681
- support StableDiffusionAdapterPipeline.from_single_file by @Teriks in #10552
- fix(hunyuan-video): typo in height and width input check by @badayvedat in #10684
- [FIX] check_inputs function in Auraflow Pipeline by @SahilCarterr in #10678
- Fix enable memory efficient attention on ROCm by @tenpercent in #10564
- Fix inconsistent random transform in instruct pix2pix by @Luvata in #10698
- feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling by @badayvedat in #10699
- Fixed grammar in "write_own_pipeline" readme by @N0-Flux-given in #10706
- Fix Documentation about Image-to-Image Pipeline by @ParagEkbote in #10704
- [bitsandbytes] Simplify bnb int8 dequant by @sayakpaul in #10401
- Fix train_text_to_image.py --help by @nkthiebaut in #10711
- Notebooks for Community Scripts-6 by @ParagEkbote in #10713
- [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference by @SahilCarterr in #10714
- add provider_options in from_pretrained by @xieofxie in #10719
- [Community] Enhanced Model Search by @suzukimain in #10417
- [bugfix] NPU Adaption for Sana by @leisuzz in #10724
- Quantized Flux with IP-Adapter by @hlky in #10728
- EDMEulerScheduler accept sigmas, add final_sigmas_type by @hlky in #10734
- [LoRA] fix peft state dict parsing by @sayakpaul in #10532
- Add Self type hint to ModelMixin's from_pretrained by @hlky in #10742
- [Tests] Test layerwise casting with training by @sayakpaul in #10765
- speedup hunyuan encoder causal mask generation by @dabeschte in #10764
- [CI] Fix Truffle Hog failure by @DN6 in #10769
- Add OmniGen by @staoxiao in #10148
- feat: new community mixture_tiling_sdxl pipeline for SDXL by @elismasilva in #10759
- Add support for lumina2 by @zhuole1025 in #10642
- Refactor OmniGen by @a-r-r-o-w in #10771
- Faster set_adapters by @Luvata in #10777
- [Single File] Add Single File support for Lumina Image 2.0 Transformer by @DN6 in #10781
- Fix use_lu_lambdas and use_karras_sigmas with beta_schedule=squaredcos_cap_v2 in DPMSolverMultistepScheduler by @hlky in #10740
- MultiControlNetUnionModel on SDXL by @guiyrt in #10747
- fix: [Community pipeline] Fix flattened elements on image by @elismasilva in #10774
- make tensors contiguous before passing to safetensors by @faaany in #10761
- Disable PEFT input autocast when using fp8 layerwise casting by @a-r-r-o-w in #10685
- Update FlowMatch docstrings to mention correct output classes by @a-r-r-o-w in #10788
- Refactor CogVideoX transformer forward by @a-r-r-o-w in #10789
- Module Group Offloading by @a-r-r-o-w in #10503
- Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 by @puhuk in #10792
- [FIX] check_inputs function in lumina2 by @SahilCarterr in #10784
- follow-up refactor on lumina2 by @yiyixuxu in #10776
- CogView4 (supports different length c and uc) by @zRzRzRzRzRzRzR in #10649
- typo fix by @YanivDorGalron in #10802
- Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines by @ParagEkbote in #10746
- [chore] update notes generation spaces by @sayakpaul in #10592
- [LoRA] improve lora support for flux. by @sayakpaul in #10810
- Fix max_shift value in flux and related functions to 1.15 (issue #10675) by @puhuk in #10807
- [docs] add missing entries to the lora docs. by @sayakpaul in #10819
- DiffusionPipeline mixin to+FromOriginalModelMixin/FromSingleFileMixin from_single_file type hint by @hlky in #10811
- [LoRA] make set_adapters() robust on silent failures. by @sayakpaul in #9618
- [FEAT] Model loading refactor by @SunMarc in #10604
- [misc] feat: introduce a style bot. by @sayakpaul in #10274
- Remove print statements by @a-r-r-o-w in #10836
- [tests] use proper gemma class and config in lumina2 tests. by @sayakpaul in #10828
- [LoRA] add LoRA support to Lumina2 and fine-tuning script by @sayakpaul in #10818
- [Utils] add utilities for checking if certain utilities are properly documented by @sayakpaul in #7763
- Add missing isinstance for arg checks in GGUFParameter by @AstraliteHeart in #10834
- [tests] test encode_prompt() in isolation by @sayakpaul in #10438
- store activation cls instead of function by @SunMarc in #10832
- fix: support transformer models' generation_config in pipeline by @JeffersonQin in #10779
- Notebooks for Community Scripts-7 by @ParagEkbote in #10846
- [CI] install accelerate transformers from main by @sayakpaul in #10289
- [CI] run fast gpu tests conditionally on pull requests. by @sayakpaul in #10310
- SD3 IP-Adapter runtime checkpoint conversion by @guiyrt in #10718
- Some consistency-related fixes for HunyuanVideo by @a-r-r-o-w in #10835
- SkyReels Hunyuan T2V & I2V by @a-r-r-o-w in #10837
- fix: run tests from a pr workflow. by @sayakpaul in #9696
- [chore] template for remote vae. by @sayakpaul in #10849
- fix remote vae template by @sayakpaul in #10852
- [CI] Fix incorrectly named test module for Hunyuan DiT by @DN6 in #10854
- [CI] Update always test Pipelines list in Pipeline fetcher by @DN6 in #10856
- device_map in load_model_dict_into_meta by @hlky in #10851
- [Fix] Docs overview.md by @SahilCarterr in #10858
- remove format check for safetensors file by @SunMarc in #10864
- [docs] LoRA support by @stevhliu in #10844
- Comprehensive type checking for from_pretrained kwargs by @guiyrt in #10758
- Fix torch_dtype in Kolors text encoder with transformers v4.49 by @hlky in #10816
- [LoRA] restrict certain keys to be checked for peft config update. by @sayakpaul in #10808
- Add SD3 ControlNet to AutoPipeline by @hlky in #10888
- [docs] Update prompt weighting docs by @stevhliu in #10843
- [docs] Flux group offload by @stevhliu in #10847
- [Fix] fp16 unscaling in train_dreambooth_lora_sdxl by @SahilCarterr in #10889
- [docs] Add CogVideoX Schedulers by @a-r-r-o-w in #10885
- [chore] correct qk norm list. by @sayakpaul in #10876
- [Docs] Fix toctree sorting by @DN6 in #10894
- [refactor] SD3 docs & remove additional code by @a-r-r-o-w in #10882
- [refactor] Remove additional Flux code by @a-r-r-o-w in #10881
- [CI] Improvements to conditional GPU PR tests by @DN6 in #10859
- Multi IP-Adapter for Flux pipelines by @guiyrt in #10867
- Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". by @CyberVy in #10880
- Security fix by @ydshieh in #10905
- Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation by @toshas in #10884
- [Tests] fix: lumina2 lora fuse_nan test by @sayakpaul in #10911
- Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. by @CyberVy in #10907
- [CI] Fix Fast GPU tests on PR by @DN6 in #10912
- [CI] Fix for failing IP Adapter test in Fast GPU PR tests by @DN6 in #10915
- Experimental per control type scale for ControlNet Union by @hlky in #10723
- [style bot] improve security for the stylebot. by @sayakpaul in #10908
- [CI] Update Stylebot Permissions by @DN6 in #10931
- [Alibaba Wan Team] continue on #10921 Wan2.1 by @yiyixuxu in #10922
- Support IPAdapter for more Flux pipelines by @hlky in #10708
- Add remote_decode to remote_utils by @hlky in #10898
- Update VAE Decode endpoints by @hlky in #10939
- [chore] fix-copies to flux pipelines by @sayakpaul in #10941
- [Tests] Remove more encode prompts tests by @sayakpaul in #10942
- Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model by @bubbliiiing in #10626
- Fix SD2.X clip single file load projection_dim by @Teriks in #10770
- add from_single_file to animatediff by @ in #10924
- Add Example of IPAdapterScaleCutoffCallback to Docs by @ParagEkbote in #10934
- Update pipeline_cogview4.py by @zRzRzRzRzRzRzR in #10944
- Fix redundant prev_output_channel assignment in UNet2DModel by @ahmedbelgacem in #10945
- Improve load_ip_adapter RAM Usage by @CyberVy in #10948
- [tests] make tests device-agnostic (part 4) by @faaany in #10508
- Update evaluation.md by @sayakpaul in #10938
- [LoRA] feat: support non-diffusers lumina2 LoRAs. by @sayakpaul in #10909
- [Quantization] support pass MappingType for TorchAoConfig by @a120092009 in #10927
- Fix the missing parentheses when calling is_torchao_available in quantization_config.py. by @CyberVy in #10961
- [LoRA] Support Wan by @a-r-r-o-w in #10943
- Fix incorrect seed initialization when args.seed is 0 by @azolotenkov in #10964
- feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL by @elismasilva in #10951
- [Docs] CogView4 comment fix by @zRzRzRzRzRzRzR in #10957
- update check_input for cogview4 by @yiyixuxu in #10966
- Add VAE Decode endpoint slow test by @hlky in #10946
- [flux lora training] fix t5 training bug by @linoytsaban in #10845
- use style bot GH Action from huggingface_hub by @hanouticelina in #10970
- [train_dreambooth_lora.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @flyxiv in #10973
- [tests] fix tests for save load components by @sayakpaul in #10977
- Fix loading OneTrainer Flux LoRA by @hlky in #10978
- fix default values of Flux guidance_scale in docstrings by @catwell in #10982
- [CI] remove synchronized. by @sayakpaul in #10980
- Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill by @dependabot[bot] in #10984
- Fix Flux Controlnet Pipeline _callback_tensor_inputs Missing Some Elements by @CyberVy in #10974
- [Single File] Add user agent to SF download requests. by @DN6 in #10979
- Add CogVideoX DDIM Inversion to Community Pipelines by @LittleNyima in #10956
- fix wan i2v pipeline bugs by @yupeng1111 in #10975
- Hunyuan I2V by @a-r-r-o-w in #10983
- Fix Graph Breaks When Compiling CogView4 by @chengzeyi in #10959
- Wan VAE move scaling to pipeline by @hlky in #10998
- [LoRA] remove full key prefix from peft. by @sayakpaul in #11004
- [Single File] Add single file support for Wan T2V/I2V by @DN6 in #10991
- Add STG to community pipelines by @kinam0252 in #10960
- [LoRA] Improve copied from comments in the LoRA loader classes by @sayakpaul in #10995
- Fix for fetching variants only by @DN6 in #10646
- [Quantization] Add Quanto backend by @DN6 in #10756
- [Single File] Add single file loading for SANA Transformer by @ishan-modi in #10947
- [LoRA] Improve warning messages when LoRA loading becomes a no-op by @sayakpaul in #10187
- [LoRA] CogView4 by @a-r-r-o-w in #10981
- [Tests] improve quantization tests by additionally measuring the inference memory savings by @sayakpaul in #11021
- [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing by @tolgacangoz in #8998
- [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 by @DN6 in #11018
- fix: mixture tiling sdxl pipeline - adjust generating time_ids & embeddings by @elismasilva in #11012
- [LoRA] support wan i2v loras from the world. by @sayakpaul in #11025
- Fix SD3 IPAdapter feature extractor by @hlky in #11027
- chore: fix help messages in advanced diffusion examples by @wonderfan in #10923
- Fix missing **kwargs in lora_pipeline.py by @CyberVy in #11011
- Fix for multi-GPU WAN inference by @AmericanPresidentJimmyCarter in #10997
- [Refactor] Clean up import utils boilerplate by @DN6 in #11026
- Use output_size in repeat_interleave by @hlky in #11030
- [hybrid inference 🍯🐝] Add VAE encode by @hlky in #11017
- Wan Pipeline scaling fix, type hint warning, multi generator fix by @hlky in #11007
- [LoRA] change to warning from info when notifying the users about a LoRA no-op by @sayakpaul in #11044
- Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline by @hlky in #10827
- making formatted_images initialization compact by @YanivDorGalron in #10801
- Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed by @ZhengKai91 in #10820
- [Tests] restrict memory tests for quanto for certain schemes. by @sayakpaul in #11052
- [LoRA] feat: support non-diffusers wan t2v loras. by @sayakpaul in #11059
- [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch by @andjoer in #11051
- reverts accidental change that removes attn_mask in attn. Improves fl… by @entrpn in #11065
- Fix deterministic issue when getting pipeline dtype and device by @dimitribarbot in #10696
- [Tests] add requires peft decorator. by @sayakpaul in #11037
- CogView4 Control Block by @zRzRzRzRzRzRzR in #10809
- [CI] pin transformers version for benchmarking. by @sayakpaul in #11067
- Fix Wan I2V Quality by @chengzeyi in #11087
- LTX 0.9.5 by @a-r-r-o-w in #10968
- make PR GPU tests conditioned on styling. by @sayakpaul in #11099
- Group offloading improvements by @a-r-r-o-w in #11094
- Fix pipeline_flux_controlnet.py by @co63oc in #11095
- update readme instructions. by @entrpn in #11096
- Resolve stride mismatch in UNet's ResNet to support Torch DDP by @jinc7461 in #11098
- Fix Group offloading behaviour when using streams by @a-r-r-o-w in #11097
- Quality options in export_to_video by @hlky in #11090
- [CI] uninstall deps properly from pr gpu tests. by @sayakpaul in #11102
- [BUG] Fix Autoencoderkl train script by @lavinal712 in #11113
- [Wan LoRAs] make T2V LoRAs compatible with Wan I2V by @linoytsaban in #11107
- [tests] enable bnb tests on xpu by @faaany in #11001
- [fix bug] PixArt inference_steps=1 by @lawrence-cj in #11079
- Flux with Remote Encode by @hlky in #11091
- [tests] make cuda only tests device-agnostic by @faaany in #11058
- Provide option to reduce CPU RAM usage in Group Offload by @DN6 in #11106
- remove F.rms_norm for now by @yiyixuxu in #11126
- Notebooks for Community Scripts-8 by @ParagEkbote in #11128
- fix _callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements by @CyberVy in #11073
- [core] FasterCache by @a-r-r-o-w in #10163
- add sana-sprint by @yiyixuxu in #11074
- Don't override torch_dtype and don't use when quantization_config is set by @hlky in #11039
- Update README and example code for AnyText usage by @tolgacangoz in #11028
- Modify the implementation of retrieve_timesteps in CogView4-Control. by @zRzRzRzRzRzRzR in #11125
- [fix SANA-Sprint] by @lawrence-cj in #11142
- New HunyuanVideo-I2V by @a-r-r-o-w in #11066
- [doc] Fix Korean Controlnet Train doc by @flyxiv in #11141
- Improve information about group offloading and layerwise casting by @a-r-r-o-w in #11101
- add a timestep scale for sana-sprint teacher model by @lawrence-cj in #11150
- [Quantization] dtype fix for GGUF + fix BnB tests by @DN6 in #11159
- Set self._hf_peft_config_loaded to True when LoRA is loaded using load_lora_adapter in PeftAdapterMixin class by @kentdan3msu in #11155
- WanI2V encode_image by @hlky in #11164
- [Docs] Update Wan Docs with memory optimizations by @DN6 in #11089
- Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions by @hlky in #11139
- Raise warning and round down if Wan num_frames is not 4k + 1 by @a-r-r-o-w in #11167
- [Docs] Fix environment variables in installation.md by @remarkablemark in #11179
- Add latents_mean and latents_std to SDXLLongPromptWeightingPipeline by @hlky in #11034
- Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set by @kakukakujirori in #10918
- [tests] no hard-coded cuda by @faaany in #11186
- [WIP] Add Wan Video2Video by @DN6 in #11053
- map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU by @yao-matrix in #11191
- fix autocast by @jiqing-feng in #11190
- fix: for checking mandatory and optional pipeline components by @elismasilva in #11189
- remove unnecessary call to F.pad by @bm-synth in #10620
- allow models to run with a user-provided dtype map instead of a single dtype by @hlky in #10301
- [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU by @faaany in #11197
- Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test by @hlky in #11196
- [docs] torch_dtype map by @hlky in #11194
- Fix enable_sequential_cpu_offload in CogView4Pipeline by @hlky in #11195
- SchedulerMixin from_pretrained and ConfigMixin Self type annotation by @hlky in #11192
- Update import_utils.py by @Lakshaysharma048 in #10329
- Add CacheMixin to Wan and LTX Transformers by @DN6 in #11187
- feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline by @elismasilva in #11188
- [Model Card] standardize advanced diffusion training sdxl lora by @chiral-carbon in #7615
- Change KolorsPipeline LoRA Loader to StableDiffusion by @BasileLewan in #11198
- Update Style Bot workflow by @hanouticelina in #11202
- Fixed requests.get function call by adding timeout parameter. by @kghamilton89 in #11156
- Fix Single File loading for LTX VAE by @DN6 in #11200
- [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) by @Suprhimp in #10603
- [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning by @tolgacangoz in #11174
- Add Wan with STG as a community pipeline by @Ednaordinary in #11184
- Add missing MochiEncoder3D.gradient_checkpointing attribute by @mjkvaak-amd in #11146
- enable 1 case on XPU by @yao-matrix in #11219
- ensure dtype match between diffused latents and vae weights by @heyalexchoi in #8391
- [docs] MPS update by @stevhliu in #11212
- Add support to pass image embeddings to the WAN I2V pipeline. by @goiri in #11175
- [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8461
- [Training] Better image interpolation in training scripts by @asomoza in #11206
- [LoRA] Implement hot-swapping of LoRA by @BenjaminBossan in #9453
- introduce compute arch specific expectations and fix test_sd3_img2img_inference failure by @yao-matrix in #11227
- [Flux LoRA] fix issues in flux lora scripts by @linoytsaban in #11111
- Flux quantized with lora by @hlky in #10990
- [feat] implement record_stream when using CUDA streams during group offloading by @sayakpaul in #11081
- [bitsandbytes] improve replacement warnings for bnb by @sayakpaul in #11132
- minor update to sana sprint docs. by @sayakpaul in #11236
- [docs] minor updates to dtype map docs. by @sayakpaul in #11237
- [LoRA] support more comfyui loras for Flux 🚨 by @sayakpaul in #10985
- fix: SD3 ControlNet validation so that it runs on a A100. by @sayakpaul in #11238
- AudioLDM2 Fixes by @hlky in #11244
- AutoModel by @hlky in #11115
- fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU by @yao-matrix in #11245
- [docs] AutoModel by @hlky in #11250
- Update Ruff to latest Version by @DN6 in #10919
- fix flux controlnet bug by @free001style in #11152
- fix timeout constant by @sayakpaul in #11252
- fix consisid imports by @sayakpaul in #11254
- Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @guiyrt
- IP-Adapter for StableDiffusion3Img2ImgPipeline (#10589)
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
- MultiControlNetUnionModel on SDXL (#10747)
- SD3 IP-Adapter runtime checkpoint conversion (#10718)
- Comprehensive type checking for from_pretrained kwargs (#10758)
- Multi IP-Adapter for Flux pipelines (#10867)
- @chengzeyi
- [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)
- Fix Graph Breaks When Compiling CogView4 (#10959)
- Fix Wan I2V Quality (#11087)
- @entrpn
- implementing flux on TPUs with ptxla (#10515)
- reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
- update readme instructions. (#11096)
- @SHYuanBest
- [core] ConsisID (#10140)
- @faaany
- [tests] make tests device-agnostic (part 3) (#10437)
- make tensors contiguous before passing to safetensors (#10761)
- [tests] make tests device-agnostic (part 4) (#10508)
- [tests] enable bnb tests on xpu (#11001)
- [tests] make cuda only tests device-agnostic (#11058)
- [tests] no hard-coded cuda (#11186)
- [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
- @yiyixuxu
- fix offload gpu tests etc (#10366)
- follow-up refactor on lumina2 (#10776)
- [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)
- update check_input for cogview4 (#10966)
- remove F.rms_norm for now (#11126)
- add sana-sprint (#11074)
- @DN6
- [CI] Update HF_TOKEN in all workflows (#10613)
- [CI] Fix Truffle Hog failure (#10769)
- [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
- [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
- [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
- [Docs] Fix toctree sorting (#10894)
- [CI] Improvements to conditional GPU PR tests (#10859)
- [CI] Fix Fast GPU tests on PR (#10912)
- [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
- [CI] Update Stylebot Permissions (#10931)
- [Single File] Add user agent to SF download requests. (#10979)
- [Single File] Add single file support for Wan T2V/I2V (#10991)
- Fix for fetching variants only (#10646)
- [Quantization] Add Quanto backend (#10756)
- [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
- [Refactor] Clean up import utils boilerplate (#11026)
- Provide option to reduce CPU RAM usage in Group Offload (#11106)
- [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
- [Docs] Update Wan Docs with memory optimizations (#11089)
- [WIP] Add Wan Video2Video (#11053)
- Add CacheMixin to Wan and LTX Transformers (#11187)
- Fix Single File loading for LTX VAE (#11200)
- Update Ruff to latest Version (#10919)
- @Anonym0u3
- Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
- @lavinal712
- create a script to train autoencoderkl (#10605)
- [BUG] Fix Autoencoderkl train script (#11113)
- @Marlon154
- Add community pipeline for semantic guidance for FLUX (#10610)
- @ParagEkbote
- Fix Documentation about Image-to-Image Pipeline (#10704)
- Notebooks for Community Scripts-6 (#10713)
- Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
- Notebooks for Community Scripts-7 (#10846)
- Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
- Notebooks for Community Scripts-8 (#11128)
- @suzukimain
- [Community] Enhanced Model Search (#10417)
- @staoxiao
- Add OmniGen (#10148)
- @elismasilva
- feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
- fix: [Community pipeline] Fix flattened elements on image (#10774)
- feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
- fix: mixture tiling sdxl pipeline - adjust generating time_ids & embeddings (#11012)
- fix: for checking mandatory and optional pipeline components (#11189)
- feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
- @zhuole1025
- Add support for lumina2 (#10642)
- @zRzRzRzRzRzRzR
- CogView4 (supports different length c and uc) (#10649)
- Update pipeline_cogview4.py (#10944)
- [Docs] CogView4 comment fix (#10957)
- CogView4 Control Block (#10809)
- Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)
- @toshas
- Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
- @bubbliiiing
- Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
- @LittleNyima
- Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
- @kinam0252
- Add STG to community pipelines (#10960)
- @tolgacangoz
- [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
- Update README and example code for AnyText usage (#11028)
- [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
- @Ednaordinary
- Add Wan with STG as a community pipeline (#11184)
- Jan 15, 2025
v0.32.2
diffusers releases a patch update that fixes Flux single-file checkpoint loading, improves LoRA support for 4bit quantized Flux and Hunyuan Video, adds unload_lora_weights for Flux Control, and resolves a Hunyuan Video batch size bug.
Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video
This patch release:
- Fixes a regression in loading Comfy UI format single file checkpoints for Flux
- Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
- Adds unload_lora_weights for Flux Control
- Fixes a bug that prevents Hunyuan Video from running with batch size > 1
- Allows Hunyuan Video to load LoRAs created from the original repository code
All commits
- [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
- [CI] Update HF Token on Fast GPU Model Tests by @DN6 in #10570
- [CI] Update HF Token in Fast GPU Tests by @DN6 in #10568
- Fix batch > 1 in HunyuanVideo by @hlky in #10548
- Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
- Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
- [LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
- [LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
- [LoRA] clean up load_lora_into_text_encoder() and fuse_lora() copied from by @sayakpaul in #10495
- [LoRA] feat: support unload_lora_weights() for Flux Control. by @sayakpaul in #10206
- Fix Flux multiple Lora loading bug by @maxs-kan in #10388
- [LoRA] fix: lora unloading when using expanded Flux LoRAs. by @sayakpaul in #10397
- Dec 25, 2024
v0.32.1
diffusers fixes TorchAO quantizer bugs, resolving import issues on older PyTorch versions, correcting quantization behavior, and raising a clear error for unsupported device maps.
TorchAO Quantizer fixes
This patch release fixes a few bugs related to the TorchAO Quantizer introduced in v0.32.0.
Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
Device maps did not work as expected when using the quantizer, so we now raise an error if one is passed. Support for device maps with the different quantization backends will be added in the near future.
Quantization was not performed due to faulty logic. This is now fixed and better tested.
Refer to our documentation to learn more about how to use different quantization backends.
All commits
- make style for #10368 by @yiyixuxu in #10370
- fix test pypi installation in the release workflow by @sayakpaul in #10360
- Fix TorchAO related bugs; revert device_map changes by @a-r-r-o-w in #10371