diffusers Release Notes
Last updated: Mar 20, 2026
- Mar 5, 2026
Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥
diffusers 0.37.0 introduces Modular Diffusers for building pipelines from reusable blocks, and expands image, video, and audio generation with new models such as Z-Image, Flux2 Klein, Qwen Image Layered, LTX-2, and Helios. It also adds new caching methods, context parallelism backends, and broad bug fixes.
Modular Diffusers
Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing DiffusionPipeline class, providing a more flexible way to create custom diffusion pipelines.
Find more details on how to get started with Modular Diffusers here, and also check out the announcement post.
New Pipelines and Models
Image 🌆
- Z Image Omni Base: Z-Image is the foundation model of the Z-Image family, engineered for high quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
- Flux2 Klein: FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13GB of VRAM.
- Qwen Image Layered: Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
- FIBO Edit: Fibo Edit is an 8B-parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized "Edit" models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in #12930.
- Cosmos Predict2.5: Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
- Cosmos Transfer2.5: Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can span different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
- GLM-Image: GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in #12973.
- RAE: Representation Autoencoders (RAEs) are an exciting alternative to the traditional VAEs typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction.
Video + audio 🎥 🎼
- LTX-2: LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference is supported, as well as two-stage inference with spatial sampling. We also support a conditioning pipeline that allows passing different conditions, such as single images or series of images. Check out the docs to learn more!
- Helios: Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in #13208.
Improvements to Core Library
New caching methods
- MagCache — thanks to @AlanPonnachan!
- TaylorSeer — thanks to @toilaluan!
New context-parallelism (CP) backends
- Unified Sequence Parallel attention — thanks to @Bissmella!
- Ulysses Anything Attention — thanks to @DefTruth!
Misc
- Mambo-G Guidance: New guider implementation (#12862)
- Laplace Scheduler for DDPM (#11320)
- Custom Sigmas in UniPCMultistepScheduler (#12109)
- MultiControlNet support for SD3 Inpainting (#11251)
- Context parallel in native flash attention (#12829)
- NPU Ulysses Attention Support (#12919)
- Fix Wan 2.1 I2V Context Parallel Inference (#12909)
- Fix Qwen-Image Context Parallel Inference (#12970)
- Introduction of the apply_lora_scale decorator for simplifying model definitions (#12994)
- Introduction of pipeline-level “cpu” device_map (#12811)
- Enable CP for kernels-based attention backends (#12812)
- Diffusers is fully functional with Transformers V5 (#12976)
- Many of the above features and improvements came out of the MVP program we have been running. Immense thanks to the contributors!
Bug Fixes
- Fix QwenImageEditPlus on NPU (#13017)
- Fix MT5Tokenizer → use T5Tokenizer for Transformers v5.0+ compatibility (#12877)
- Fix Wan/WanI2V patchification (#13038)
- Fix LTX-2 inference with num_videos_per_prompt > 1 and CFG (#13121)
- Fix Flux2 img2img prediction (#12855)
- Fix QwenImage txt_seq_lens handling (#12702)
- Fix prefix_token_len bug (#12845)
- Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
- Fix is_fsdp determination (#12960)
- Fix GLM-Image get_image_features API (#13052)
- Fix Wan 2.2 when either transformer isn't present (#13055)
- Fix guider issue (#13147)
- Fix torchao quantizer for new versions (#12901)
- Fix GGUF for unquantized types with unquantize kernels (#12498)
- Make Qwen hidden states contiguous for torchao (#13081)
- Make Flux hidden states contiguous (#13068)
- Fix Kandinsky 5 hardcoded CUDA autocast (#12814)
- Fix aiter availability check (#13059)
- Fix attention mask check for unsupported backends (#12892)
- Allow prompt and prior_token_ids simultaneously in GlmImagePipeline (#13092)
- GLM-Image batch support (#13007)
- Cosmos 2.5 Video2World frame extraction fix (#13018)
- ResNet: only use contiguous in training mode (#12977)
All commits
- [PRX] Improve model compilation by @WaterKnight1998 in #12787
- Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py by @delmalih in #12798
- [Modular]z-image by @yiyixuxu in #12808
- Fix Qwen Edit Plus modular for multi-image input by @sayakpaul in #12601
- [WIP] Add Flux2 modular by @DN6 in #12763
- [docs] improve distributed inference cp docs. by @sayakpaul in #12810
- post release 0.36.0 by @sayakpaul in #12804
- Update distributed_inference.md to correct syntax by @sayakpaul in #12827
- [lora] Remove lora docs unneeded and add " # Copied from ..." by @sayakpaul in #12824
- support CP in native flash attention by @sywangyi in #12829
- [qwen-image] edit 2511 support by @naykun in #12839
- fix pytest tests/pipelines/pixart_sigma/test_pixart.py::PixArtSigmaPi… by @sywangyi in #12842
- Support for control-lora by @lavinal712 in #10686
- Add support for LongCat-Image by @junqiangwu in #12828
- fix the prefix_token_len bug by @junqiangwu in #12845
- extend TorchAoTest::test_model_memory_usage to other platform by @sywangyi in #12768
- Qwen Image Layered Support by @naykun in #12853
- Z-Image-Turbo ControlNet by @hlky in #12792
- Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion by @miguelmartin75 in #12852
- more update in modular by @yiyixuxu in #12560
- Feature: Add Mambo-G Guidance as Guider by @MatrixTeam-AI in #12862
- Add OvisImagePipeline in AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @alvarobartt in #12876
- Cosmos Predict2.5 14b Conversion by @miguelmartin75 in #12863
- Use T5Tokenizer instead of MT5Tokenizer (removed in Transformers v5.0+) by @alvarobartt in #12877
- Add z-image-omni-base implementation by @RuoyiDu in #12857
- fix torchao quantizer for new torchao versions by @vkuzo in #12901
- fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs by @mbalabanski in #12894
- Z-Image-Turbo from_single_file fix by @hlky in #12888
- chore: fix dev version in setup.py by @DefTruth in #12904
- Community Pipeline: Add z-image differential img2img by @r4inm4ker in #12882
- Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py by @miguelmartin75 in #12914
- Fix wan 2.1 i2v context parallel by @DefTruth in #12909
- fix the use of device_map in CP docs by @sayakpaul in #12902
- [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin by @sayakpaul in #12873
- Detect 2.0 vs 2.1 ZImageControlNetModel by @hlky in #12861
- Refactor environment variable assignments in workflow by @paulinebm in #12916
- Add codeQL workflow by @paulinebm in #12917
- Delete .github/workflows/codeql.yml by @paulinebm (direct commit on v0.37.0-release)
- CodeQL workflow for security analysis by @paulinebm (direct commit on v0.37.0-release)
- Check for attention mask in backends that don't support it by @dxqb in #12892
- [Flux.1] improve pos embed for ascend npu by computing on npu by @zhangtao0408 in #12897
- LTX Video 0.9.8 long multi prompt by @yaoqih in #12614
- Add FSDP option for Flux2 by @leisuzz in #12860
- Add transformer cache context for SkyReels-V2 pipelines & Update docs by @tolgacangoz in #12837
- [docs] fix torchao typo. by @sayakpaul in #12883
- Update wan.md to remove unneeded hfoptions by @sayakpaul in #12890
- Improve docstrings and type hints in scheduling_edm_euler.py by @delmalih in #12871
- [Modular] Video for Mellon by @asomoza in #12924
- Add LTX 2.0 Video Pipelines by @dg845 in #12915
- Add environment variables to checkout step by @paulinebm in #12927
- Improve docstrings and type hints in scheduling_consistency_decoder.py by @delmalih in #12928
- Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning by @adi776borate in #12814
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #12865
- fix the warning torch_dtype is deprecated by @msdsm in #12841
- [NPU] npu attention enable ulysses by @TmacAaron in #12919
- Torchao floatx version guard by @howardzhang-cv in #12923
- Bugfix for dreambooth flux2 img2img2 by @leisuzz in #12825
- [Modular] qwen refactor by @yiyixuxu in #12872
- [modular] Tests for custom blocks in modular diffusers by @sayakpaul in #12557
- [chore] remove controlnet implementations outside controlnet module. by @sayakpaul in #12152
- [core] Handle progress bar and logging in distributed environments by @sayakpaul in #12806
- Improve docstrings and type hints in scheduling_consistency_models.py by @delmalih in #12931
- [Feature] MultiControlNet support for SD3Impainting by @ishan-modi in #11251
- Laplace Scheduler for DDPM by @gapatron in #11320
- Store vae.config.scaling_factor to prevent missing attr reference (sdxl advanced dreambooth training script) by @Teriks in #12346
- Add thread-safe wrappers for components in pipeline (examples/server-async/utils/requestscopedpipeline.py) by @FredyRivera-dev in #12515
- [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL by @kashif in #11573
- Change timestep device to cpu for xla by @bhavya01 in #11501
- [LoRA] add lora_alpha to sana README by @linoytsaban in #11780
- Fix wrong param types, docs, and handles noise=None in scale_noise of FlowMatching schedulers by @Promisery in #11669
- [docs] Remote inference by @stevhliu in #12372
- Align HunyuanVideoConditionEmbedding with CombinedTimestepGuidanceTextProjEmbeddings by @samutamm in #12316
- [Fix] syntax in QwenImageEditPlusPipeline by @SahilCarterr in #12371
- Fix ftfy name error in Wan pipeline by @dsocek in #12314
- [modular] error early in enable_auto_cpu_offload by @sayakpaul in #12578
- [ChronoEdit] support multiple loras by @zhangjiewu in #12679
- fix how is_fsdp is determined by @sayakpaul in #12960
- [LoRA] add LoRA support to LTX-2 by @sayakpaul in #12933
- Fix: typo in autoencoder_dc.py by @tvelovraf in #12687
- [Modular] better docstring by @yiyixuxu in #12932
- [docs] polish caching docs. by @sayakpaul in #12684
- Fix typos by @omahs in #12705
- Fix link to diffedit implementation reference by @JuanFKurucz in #12708
- Fix QwenImage txt_seq_lens handling by @kashif in #12702
- Bugfix for flux2 img2img2 prediction by @leisuzz in #12855
- Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests by @dg845 in #12962
- Add Unified Sequence Parallel attention by @Bissmella in #12693
- [Modular] Changes for using WAN I2V by @asomoza in #12959
- Z rz rz rz rz rz rz r cogview by @sayakpaul in #12973
- Update distributed_inference.md to reposition sections by @sayakpaul in #12971
- [chore] make transformers version check stricter for glm image. by @sayakpaul in #12974
- Remove 8bit device restriction by @SunMarc in #12972
- disable_mmap in pipeline from_pretrained by @hlky in #12854
- [Modular] mellon utils by @yiyixuxu in #12978
- LongCat Image pipeline: Allow offloading/quantization of text_encoder component by @Yahweasel in #12963
- Add ChromaInpaintPipeline by @hameerabbasi in #12848
- fix Qwen-Image series context parallel by @DefTruth in #12970
- Flux2 klein by @yiyixuxu in #12982
- [modular] fix a bug in mellon param & improve docstrings by @yiyixuxu in #12980
- add klein docs. by @sayakpaul in #12984
- LTX 2 Single File Support by @dg845 in #12983
- [core] gracefully error out when attn-backend x cp combo isn't supported. by @sayakpaul in #12832
- Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py by @delmalih in #12936
- [Docs] Replace root CONTRIBUTING.md with symlink to source docs by @delmalih in #12986
- make style && make quality by @sayakpaul (direct commit on v0.37.0-release)
- Revert "make style && make quality" by @sayakpaul (direct commit on v0.37.0-release)
- [chore] make style to push new changes. by @sayakpaul in #12998
- Fibo edit pipeline by @galbria in #12930
- Fix variable name in docstring for PeftAdapterMixin.set_adapters by @geekuillaume in #13003
- Improve docstrings and type hints in scheduling_ddim_cogvideox.py by @delmalih in #12992
- [scheduler] Support custom sigmas in UniPCMultistepScheduler by @a-r-r-o-w in #12109
- feat: accelerate longcat-image with regional compile by @lgyStoic in #13019
- Improve docstrings and type hints in scheduling_ddim_flax.py by @delmalih in #13010
- Improve docstrings and type hints in scheduling_ddim_inverse.py by @delmalih in #13020
- fix Dockerfiles for cuda and xformers. by @sayakpaul in #13022
- Resnet only use contiguous in training mode. by @jiqing-feng in #12977
- feat: add qkv projection fuse for longcat transformers by @lgyStoic in #13021
- Improve docstrings and type hints in scheduling_ddim_parallel.py by @delmalih in #13023
- Improve docstrings and type hints in scheduling_ddpm_flax.py by @delmalih in #13024
- Improve docstrings and type hints in scheduling_ddpm_parallel.py by @delmalih in #13027
- Remove pooled_ mentions from Chroma inpaint by @hameerabbasi in #13026
- Flag Flax schedulers as deprecated by @delmalih in #13031
- [modular] add auto_docstring & more doc related refactors by @yiyixuxu in #12958
- Upgrade GitHub Actions to latest versions by @salmanmkc in #12866
- [From Single File] support from_single_file method for WanAnimateTransformer3DModel by @samadwar in #12691
- Fix: Cosmos2.5 Video2World frame extraction and add default negative prompt by @adi776borate in #13018
- [GLM-Image] Add batch support for GlmImagePipeline by @JaredforReal in #13007
- [Qwen] avoid creating attention masks when there is no padding by @kashif in #12987
- [modular]support klein by @yiyixuxu in #13002
- [QwenImage] fix prompt isolation tests by @sayakpaul in #13042
- fast tok update by @itazap in #13036
- change to CUDA 12.9. by @sayakpaul in #13045
- remove torchao autoquant from diffusers docs by @vkuzo in #13048
- docs: improve docstring scheduling_dpm_cogvideox.py by @delmalih in #13044
- Fix Wan/WanI2V patchification by @Jayce-Ping in #13038
- LTX2 distilled checkpoint support by @rootonchair in #12934
- [wan] fix layerwise upcasting tests on CPU by @sayakpaul in #13039
- [ci] uniform run times and wheels for pytorch cuda. by @sayakpaul in #13047
- docs: fix grammar in fp16_safetensors CLI warning by @Olexandr88 in #13040
- [wan] fix wan 2.2 when either of the transformers isn't present. by @sayakpaul in #13055
- [bug fix] GLM-Image fit new get_image_features API by @JaredforReal in #13052
- Fix aiter availability check by @lauri9 in #13059
- [Modular]add a real quick start guide by @yiyixuxu in #13029
- feat: support Ulysses Anything Attention by @DefTruth in #12996
- Refactor Model Tests by @DN6 in #12822
- [Flux2] Fix LoRA loading for Flux2 Klein by adaptively enumerating transformer blocks by @songkey in #13030
- [Modular] loader related by @yiyixuxu in #13025
- [Modular] mellon doc etc by @yiyixuxu in #13051
- [modular] change the template modular pipeline card by @sayakpaul in #13072
- Add support for Magcache by @AlanPonnachan in #12744
- [docs] Fix syntax error in quantization configuration by @sayakpaul in #13076
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13083
- [core] make flux hidden states contiguous by @sayakpaul in #13068
- [core] make qwen hidden states contiguous to make torchao happy. by @sayakpaul in #13081
- Feature/zimage inpaint pipeline by @CalamitousFelicitousness in #13006
- GGUF fix for unquantized types when using unquantize kernels by @dxqb in #12498
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13085
- [modular]simplify components manager doc by @yiyixuxu in #13088
- ZImageControlNet cfg by @hlky in #13080
- [Modular] refactor Wan: modular pipelines by task etc by @yiyixuxu in #13063
- [Modular] guard ModularPipeline.blocks attribute by @yiyixuxu in #13014
- LTX 2 Improve encode_video by Accepting More Input Types by @dg845 in #13057
- Z image lora training by @linoytsaban in #13056
- [modular] add modular tests for Z-Image and Wan by @sayakpaul in #13078
- [Docs] Add guide for AutoModel with custom code by @DN6 in #13099
- [SkyReelsV2] Fix ftfy import by @asomoza in #13113
- [lora] fix non-diffusers lora key handling for flux2 by @sayakpaul in #13119
- [CI] Refactor Wan Model Tests by @DN6 in #13082
- docs: improve docstring scheduling_edm_dpmsolver_multistep.py by @delmalih in #13122
- [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline by @JaredforReal in #13092
- docs: improve docstring scheduling_flow_match_euler_discrete.py by @delmalih in #13127
- Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} by @miguelmartin75 in #13066
- [modular] add tests for robust model loading. by @sayakpaul in #13120
- Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled by @dg845 in #13121
- [CI] Fix setuptools pkg_resources Errors by @dg845 in #13129
- docs: improve docstring scheduling_flow_match_heun_discrete.py by @delmalih in #13130
- [CI] Fix setuptools pkg_resources Bug for PR GPU Tests by @dg845 in #13132
- fix cosmos transformer typing. by @sayakpaul in #13134
- Sunset Python 3.8 & get rid of explicit typing exports where possible by @sayakpaul in #12524
- feat: implement apply_lora_scale to remove boilerplate. by @sayakpaul in #12994
- [docs] fix ltx2 i2v docstring. by @sayakpaul in #13135
- [Modular] add different pipeine blocks to init by @yiyixuxu in #13145
- fix MT5Tokenizer by @yiyixuxu in #13146
- fix guider by @yiyixuxu in #13147
- [Modular] update doc for ModularPipeline by @yiyixuxu in #13100
- [Modular] add explicit workflow support by @yiyixuxu in #13028
- [LTX2] Fix wrong lora mixin by @asomoza in #13144
- [Pipelines] Remove k-diffusion by @DN6 in #13152
- [tests] accept recompile_limit from the user in tests by @sayakpaul in #13150
- [core] support device type device_maps to work with offloading. by @sayakpaul in #12811
- [Bug] Fix QwenImageEditPlus Series on NPU by @zhangtao0408 in #13017
- [CI] Add ftfy as a test dependency by @DN6 in #13155
- docs: improve docstring scheduling_flow_match_lcm.py by @delmalih in #13160
- [docs] add docs for qwenimagelayered by @stevhliu in #13158
- Flux2: Tensor tuples can cause issues for checkpointing by @dxqb in #12777
- [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated by @dg845 in #13149
- Fix ftfy import for PRX Pipeline by @dg845 in #13154
- [core] Enable CP for kernels-based attention backends by @sayakpaul in #12812
- remove deps related to test from ci by @sayakpaul in #13164
- [CI] Fix new LoRAHotswap tests by @DN6 in #13163
- [gguf][torch.compile time] Convert to plain tensor earlier in dequantize_gguf_tensor by @anijain2305 in #13166
- Support Flux Klein peft (fal) lora format by @asomoza in #13169
- Fix T5GemmaEncoder loading for transformers 5.x composite T5GemmaConfig by @DavidBert in #13143
- Allow Automodel to use from_config with custom code. by @DN6 in #13123
- Fix AutoModel typing Import Error by @dg845 in #13178
- migrate to transformers v5 by @sayakpaul in #12976
- fix: graceful fallback when attention backends fail to import by @sym-bot in #13060
- [docs] Fix torchrun command argument order in docs by @sayakpaul in #13181
- [attention backends] use dedicated wrappers from fa3 for cp. by @sayakpaul in #13165
- Cosmos Transfer2.5 Auto-Regressive Inference Pipeline by @miguelmartin75 in #13114
- Fix wrong do_classifier_free_guidance threshold in ZImagePipeline by @kirillsst in #13183
- Fix Flash Attention 3 interface for new FA3 return format by @veeceey in #13173
- Fix LTX-2 image-to-video generation failure in two stages generation by @Songrui625 in #13187
- Fixing Kohya loras loading: Flux.1-dev loras with TE ("lora_te1_" prefix) by @christopher5106 in #13188
- [Modular] update the auto pipeline blocks doc by @yiyixuxu in #13148
- [tests] consistency tests for modular index by @sayakpaul in #13192
- [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline by @yiyixuxu in #13193
- [chore] updates in the pypi publication workflow. by @sayakpaul in #12805
- [tests] enable cpu offload test in torchao without compilation. by @sayakpaul in #12704
- remove db utils from benchmarking by @sayakpaul in #13199
- [AutoModel] Fix bug with subfolders and local model paths when loading custom code by @DN6 in #13197
- [AutoModel] Allow registering auto_map to model config by @DN6 in #13186
- [Modular] Save Modular Pipeline weights to Hub by @DN6 in #13168
- docs: improve docstring scheduling_ipndm.py by @delmalih in #13198
- Clean up accidental files by @DN6 in #13202
- [modular]Update model card to include workflow by @yiyixuxu in #13195
- [modular] not pass trust_remote_code to external repos by @yiyixuxu in #13204
- [Modular] implement requirements validation for custom blocks by @sayakpaul in #12196
- cogvideo example: Distribute VAE video encoding across processes in CogVideoX LoRA training by @jiqing-feng in #13207
- Fix group-offloading bug by @SHYuanBest in #13211
- Add Helios-14B Video Generation Pipelines by @dg845 in #13208
- [Z-Image] Fix more do_classifier_free_guidance thresholds by @asomoza in #13212
- [lora] fix zimage lora conversion to support for more lora. by @sayakpaul in #13209
- adding lora support to z-image controlnet pipelines by @christopher5106 in #13200
- Add LTX2 Condition Pipeline by @dg845 in #13058
- Fix Helios paper link in documentation by @SHYuanBest in #13213
- [attention backends] change to updated repo and version. by @sayakpaul in #13161
- feat: implement rae autoencoder. by @Ando233 in #13046
- Release: v0.37.0-release by @sayakpaul (direct commit on v0.37.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@delmalih
- Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
- Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
- Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
- Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
- Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
- [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
- Improve docstrings and type hints in scheduling_ddim_cogvideox.py (#12992)
- Improve docstrings and type hints in scheduling_ddim_flax.py (#13010)
- Improve docstrings and type hints in scheduling_ddim_inverse.py (#13020)
- Improve docstrings and type hints in scheduling_ddim_parallel.py (#13023)
- Improve docstrings and type hints in scheduling_ddpm_flax.py (#13024)
- Improve docstrings and type hints in scheduling_ddpm_parallel.py (#13027)
- Flag Flax schedulers as deprecated (#13031)
- docs: improve docstring scheduling_dpm_cogvideox.py (#13044)
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13083)
- docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13085)
- docs: improve docstring scheduling_edm_dpmsolver_multistep.py (#13122)
- docs: improve docstring scheduling_flow_match_euler_discrete.py (#13127)
- docs: improve docstring scheduling_flow_match_heun_discrete.py (#13130)
- docs: improve docstring scheduling_flow_match_lcm.py (#13160)
- docs: improve docstring scheduling_ipndm.py (#13198)
@yiyixuxu
- [Modular]z-image (#12808)
- more update in modular (#12560)
- [Modular] qwen refactor (#12872)
- [Modular] better docstring (#12932)
- [Modular] mellon utils (#12978)
- Flux2 klein (#12982)
- [modular] fix a bug in mellon param & improve docstrings (#12980)
- [modular] add auto_docstring & more doc related refactors (#12958)
- [modular]support klein (#13002)
- [Modular]add a real quick start guide (#13029)
- [Modular] loader related (#13025)
- [Modular] mellon doc etc (#13051)
- [modular]simplify components manager doc (#13088)
- [Modular] refactor Wan: modular pipelines by task etc (#13063)
- [Modular] guard ModularPipeline.blocks attribute (#13014)
- [Modular] add different pipeine blocks to init (#13145)
- fix MT5Tokenizer (#13146)
- fix guider (#13147)
- [Modular] update doc for ModularPipeline (#13100)
- [Modular] add explicit workflow support (#13028)
- [Modular] update the auto pipeline blocks doc (#13148)
- [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline (#13193)
- [modular]Update model card to include workflow (#13195)
- [modular] not pass trust_remote_code to external repos (#13204)
@sayakpaul
- Fix Qwen Edit Plus modular for multi-image input (#12601)
- [docs] improve distributed inference cp docs. (#12810)
- post release 0.36.0 (#12804)
- Update distributed_inference.md to correct syntax (#12827)
- [lora] Remove lora docs unneeded and add " # Copied from ..." (#12824)
- fix the use of device_map in CP docs (#12902)
- [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin (#12873)
- [docs] fix torchao typo. (#12883)
- Update wan.md to remove unneeded hfoptions (#12890)
- [modular] Tests for custom blocks in modular diffusers (#12557)
- [chore] remove controlnet implementations outside controlnet module. (#12152)
- [core] Handle progress bar and logging in distributed environments (#12806)
- [modular] error early in enable_auto_cpu_offload (#12578)
- fix how is_fsdp is determined (#12960)
- [LoRA] add LoRA support to LTX-2 (#12933)
- [docs] polish caching docs. (#12684)
- Z rz rz rz rz rz rz r cogview (#12973)
- Update distributed_inference.md to reposition sections (#12971)
- [chore] make transformers version check stricter for glm image. (#12974)
- add klein docs. (#12984)
- [core] gracefully error out when attn-backend x cp combo isn't supported. (#12832)
- make style && make quality
- Revert "make style && make quality"
- [chore] make style to push new changes. (#12998)
- fix Dockerfiles for cuda and xformers. (#13022)
- [QwenImage] fix prompt isolation tests (#13042)
- change to CUDA 12.9. (#13045)
- [wan] fix layerwise upcasting tests on CPU (#13039)
- [ci] uniform run times and wheels for pytorch cuda. (#13047)
- [wan] fix wan 2.2 when either of the transformers isn't present. (#13055)
- [modular] change the template modular pipeline card (#13072)
- [docs] Fix syntax error in quantization configuration (#13076)
- [core] make flux hidden states contiguous (#13068)
- [core] make qwen hidden states contiguous to make torchao happy. (#13081)
- [modular] add modular tests for Z-Image and Wan (#13078)
- [lora] fix non-diffusers lora key handling for flux2 (#13119)
- [modular] add tests for robust model loading. (#13120)
- fix cosmos transformer typing. (#13134)
- Sunset Python 3.8 & get rid of explicit typing exports where possible (#12524)
- feat: implement apply_lora_scale to remove boilerplate. (#12994)
- [docs] fix ltx2 i2v docstring. (#13135)
- [tests] accept recompile_limit from the user in tests (#13150)
- [core] support device type device_maps to work with offloading. (#12811)
- [core] Enable CP for kernels-based attention backends (#12812)
- remove deps related to test from ci (#13164)
- migrate to transformers v5 (#12976)
- [docs] Fix torchrun command argument order in docs (#13181)
- [attention backends] use dedicated wrappers from fa3 for cp. (#13165)
- [tests] consistency tests for modular index (#13192)
- [chore] updates in the pypi publication workflow. (#12805)
- [tests] enable cpu offload test in torchao without compilation. (#12704)
- remove db utils from benchmarking (#13199)
- [Modular] implement requirements validation for custom blocks (#12196)
- [lora] fix zimage lora conversion to support for more lora. (#13209)
- [attention backends] change to updated repo and version. (#13161)
Release: v0.37.0-release
@DN6
- [WIP] Add Flux2 modular (#12763)
- Refactor Model Tests (#12822)
- [Docs] Add guide for AutoModel with custom code (#13099)
- [CI] Refactor Wan Model Tests (#13082)
- [Pipelines] Remove k-diffusion (#13152)
- [CI] Add ftfy as a test dependency (#13155)
- [CI] Fix new LoRAHotswap tests (#13163)
- Allow Automodel to use from_config with custom code. (#13123)
- [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)
- [AutoModel] Allow registering auto_map to model config (#13186)
- [Modular] Save Modular Pipeline weights to Hub (#13168)
- Clean up accidental files (#13202)
@naykun
- [qwen-image] edit 2511 support (#12839)
- Qwen Image Layered Support (#12853)
@junqiangwu
- Add support for LongCat-Image (#12828)
- fix the prefix_token_len bug (#12845)
@hlky
- Z-Image-Turbo ControlNet (#12792)
- Z-Image-Turbo from_single_file fix (#12888)
- Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)
- disable_mmap in pipeline from_pretrained (#12854)
- ZImageControlNet cfg (#13080)
@miguelmartin75
- Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)
- Cosmos Predict2.5 14b Conversion (#12863)
- Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914)
- Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)
- Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)
@RuoyiDu
- Add z-image-omni-base implementation (#12857)
@r4inm4ker
- Community Pipeline: Add z-image differential img2img (#12882)
@yaoqih
- LTX Video 0.9.8 long multi prompt (#12614)
@dg845
- Add LTX 2.0 Video Pipelines (#12915)
- Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests (#12962)
- LTX 2 Single File Support (#12983)
- LTX 2 Improve encode_video by Accepting More Input Types (#13057)
- Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled (#13121)
- [CI] Fix setuptools pkg_resources Errors (#13129)
- [CI] Fix setuptools pkg_resources Bug for PR GPU Tests (#13132)
- [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated (#13149)
- Fix ftfy import for PRX Pipeline (#13154)
- Fix AutoModel typing Import Error (#13178)
- Add Helios-14B Video Generation Pipelines (#13208)
- Add LTX2 Condition Pipeline (#13058)
@kashif
- [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)
- Fix QwenImage txt_seq_lens handling (#12702)
- [Qwen] avoid creating attention masks when there is no padding (#12987)
@bhavya01
- Change timestep device to cpu for xla (#11501)
@linoytsaban
- [LoRA] add lora_alpha to sana README (#11780)
- Z image lora training (#13056)
@stevhliu
- [docs] Remote inference (#12372)
- [docs] add docs for qwenimagelayered (#13158)
@hameerabbasi
- Add ChromaInpaintPipeline (#12848)
- Remove pooled_ mentions from Chroma inpaint (#13026)
@galbria
- Fibo edit pipeline (#12930)
@JaredforReal
- [GLM-Image] Add batch support for GlmImagePipeline (#13007)
- [bug fix] GLM-Image fit new get_image_features API (#13052)
- [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline (#13092)
@rootonchair
- LTX2 distilled checkpoint support (#12934)
@AlanPonnachan
- Add support for Magcache (#12744)
@CalamitousFelicitousness
- Feature/zimage inpaint pipeline (#13006)
@Ando233
- feat: implement rae autoencoder. (#13046)
- Dec 9, 2025
- Date parsed from source:Dec 9, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄
diffusers releases a packed update with new image and video pipelines, TaylorSeer cache support, kernels-powered attention backends, and a new Flux.2 LoRA training script. It also expands model coverage with Z-Image, Kandinsky 5, HunyuanVideo 1.5, Wan Animate, ChronoEdit, and more.
The release features a number of new image and video pipelines, a new caching method, a new training script, new kernels-powered attention backends, and more. It is quite packed with a lot of new stuff, so make sure you read the release notes fully 🚀
New image pipelines
Flux2: Flux2 is the latest generation of image generation and editing model from Black Forest Labs. It’s capable of taking multiple input images as reference, making it versatile for different use cases.
Z-Image: Z-Image is a best-of-its-kind image generation model in the 6B param regime. Thanks to @JerryWu-code in #12703.
QwenImage Edit Plus: It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in #12357.
Bria FIBO: FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in #12545.
Kandinsky Image Lite: Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in #12664.
ChronoEdit: ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in #12593.
New video pipelines
Sana-Video: Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in #12634.
Kandinsky 5: Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in #12478.
Hunyuan 1.5: HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
Wan Animate: Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with the motion from the driving video, or replace the character in that video with the reference character.
New kernels-powered attention backends
The kernels library helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new kernels-powered attention backends:
- Flash Attention 3 (+ its varlen variant)
- Flash Attention 2 (+ its varlen variant)
- SAGE
This means that if any of the above backends is supported in your development environment, you can skip the manual process of building the corresponding kernels and just use:
```python
# Make sure you have `kernels` installed: `pip install kernels`.
# You can choose `flash_hub` or `sage_hub`, too.
pipe.transformer.set_attention_backend("_flash_3_hub")
```

For more details, check out the documentation.
TaylorSeer cache
TaylorSeer is now supported in Diffusers, delivering up to 3x speedups with little to no quality compromise. Thanks to @toilaluan for contributing this in #12648. Check out the documentation here.
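To give an intuition for how TaylorSeer-style caching achieves these speedups, here is a minimal pure-Python sketch of the underlying idea — this is illustrative only and is not the Diffusers API; the names `taylorseer_step` and `refresh_every` are made up for this example. Instead of recomputing an expensive module at every denoising step, recent outputs are cached and skipped steps are predicted with a first-order Taylor expansion (a finite-difference derivative):

```python
def taylorseer_step(expensive_fn, t, cache, refresh_every=3):
    """Return expensive_fn(t), recomputing only every `refresh_every` steps.

    `cache` holds up to two (t, output) pairs; in-between steps are
    extrapolated from them instead of recomputed.
    """
    if t % refresh_every == 0 or len(cache) < 2:
        y = expensive_fn(t)  # the expensive call (e.g. a transformer block)
        cache.append((t, y))
        if len(cache) > 2:
            cache.pop(0)  # keep only the two most recent samples
        return y
    # First-order Taylor extrapolation from the two cached samples.
    (t0, y0), (t1, y1) = cache
    dy_dt = (y1 - y0) / (t1 - t0)
    return y1 + dy_dt * (t - t1)

cache = []
# Stand-in for an expensive model call: f(t) = t^2.
outputs = [taylorseer_step(lambda t: t * t, t, cache) for t in range(7)]
```

Steps 0, 3, and 6 are computed exactly, while the steps in between are cheap extrapolations — the real implementation applies the same trick to transformer activations, with higher-order terms and tuned refresh schedules.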
New training script
Our Flux.2 integration features a LoRA fine-tuning script that you can check out here. We provide a number of optimizations to help make it run on consumer GPUs.
Misc
Reusing AttentionMixin: Making certain compatible models subclass from the AttentionMixin class helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out #12463 for more details.
Diffusers backend in SGLang: sgl-project/sglang#14112.
We started the Diffusers MVP program to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.
All commits
- remove unneeded checkpoint imports. by @sayakpaul in #12488
- [tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
- ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
- [docs] Attention checks by @stevhliu in #12486
- [CI] Check links by @stevhliu in #12491
- [ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
- [tests] introduce VAETesterMixin to consolidate tests for slicing and tiling by @sayakpaul in #12374
- docs: cleanup of runway model by @EazyAl in #12503
- Kandinsky 5 is finally in Diffusers! by @leffff in #12478
- Remove Qwen Image Redundant RoPE Cache by @dg845 in #12452
- Raise warning instead of error when imports are missing for custom code by @DN6 in #12513
- Fix: Use incorrect temporary variable key when replacing adapter name… by @FeiXie8 in #12502
- [docs] Organize toctree by modality by @stevhliu in #12514
- styling issues. by @sayakpaul in #12522
- Add Photon model and pipeline support by @DavidBert in #12456
- purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet by @Vaibhavs10 in #12497
- Prx by @DavidBert in #12525
- [core] AutoencoderMixin to abstract common methods by @sayakpaul in #12473
- Kandinsky5 No cfg fix by @asomoza in #12527
- Fix: Add _skip_keys for AutoencoderKLWan by @yiyixuxu in #12523
- [CI] xfail the test_wuerstchen_prior test by @sayakpaul in #12530
- [tests] Test attention backends by @sayakpaul in #12388
- fix CI bug for kandinsky3_img2img case by @kaixuanliu in #12474
- Fix MPS compatibility in get_1d_sincos_pos_embed_from_grid #12432 by @Aishwarya0811 in #12449
- Handle deprecated transformer classes by @DN6 in #12517
- fix constants.py to user upper() by @sayakpaul in #12479
- HunyuanImage21 by @yiyixuxu in #12333
- Loose the criteria tolerance appropriately for Intel XPU devices by @kaixuanliu in #12460
- Deprecate Stable Cascade by @DN6 in #12537
- [chore] Move guiders experimental warning by @sayakpaul in #12543
- Fix Chroma attention padding order and update docs to use lodestones/Chroma1-HD by @josephrocca in #12508
- Add AITER attention backend by @lauri9 in #12549
- Fix small inconsistency in output dimension of "_get_t5_prompt_embeds" function in sd3 pipeline by @alirezafarashah in #12531
- Kandinsky 5 10 sec (NABLA suport) by @leffff in #12520
- Improve pos embed for Flux.1 inference on Ascend NPU by @gameofdimension in #12534
- support latest few-step wan LoRA. by @sayakpaul in #12541
- [Pipelines] Enable Wan VACE to run since single transformer by @DN6 in #12428
- fix crash if tiling mode is enabled by @sywangyi in #12521
- Fix typos in kandinsky5 docs by @Meatfucker in #12552
- [ci] don't run sana layerwise casting tests in CI. by @sayakpaul in #12551
- Bria fibo by @galbria in #12545
- Avoiding graph break by changing the way we infer dtype in vae.decoder by @ppadjinTT in #12512
- [Modular] Fix for custom block kwargs by @DN6 in #12561
- [Modular] Allow custom blocks to be saved to local_dir by @DN6 in #12381
- Fix Stable Diffusion 3.x pooled prompt embedding with multiple images by @friedrich in #12306
- Fix custom code loading in Automodel by @DN6 in #12571
- [modular] better warn message by @yiyixuxu in #12573
- [tests] add tests for flux modular (t2i, i2i, kontext) by @sayakpaul in #12566
- [modular]pass hub_kwargs to load_config by @yiyixuxu in #12577
- ulysses enabling in native attention path by @sywangyi in #12563
- Kandinsky 5.0 Docs fixes by @leffff in #12582
- [docs] sort doc by @sayakpaul in #12586
- [LoRA] add support for more Qwen LoRAs by @linoytsaban in #12581
- [Modular] Allow ModularPipeline to load from revisions by @DN6 in #12592
- Add optional precision-preserving preprocessing for examples/unconditional_image_generation/train_unconditional.py by @turian in #12596
- [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference by @lawrence-cj in #12584
- Fix overflow and dtype handling in rgblike_to_depthmap (NumPy + PyTorch) by @MohammadSadeghSalehi in #12546
- [Modular] Some clean up for Modular tests by @DN6 in #12579
- feat: enable attention dispatch for huanyuan video by @DefTruth in #12591
- fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled by @sywangyi in #12562
- [CI] Push test fix by @DN6 in #12617
- add ChronoEdit by @zhangjiewu in #12593
- [modular] wan! by @yiyixuxu in #12611
- [CI] Fix typo in uv install by @DN6 in #12618
- fix: correct import path for load_model_dict_into_meta in conversion scripts by @yashwantbezawada in #12616
- Fix Context Parallel validation checks by @DN6 in #12446
- [Modular] Clean up docs by @DN6 in #12604
- Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples by @cesaryuan in #12544
- [CI] Remove unittest dependency from testing_utils.py by @DN6 in #12621
- Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers by @charchit7 in #12594
- fix copies by @yiyixuxu in #12637
- Add MLU Support. by @a120092009 in #12629
- fix dispatch_attention_fn check by @yiyixuxu in #12636
- [modular] add tests for qwen modular by @sayakpaul in #12585
- ArXiv -> HF Papers by @qgallouedec in #12583
- [docs] Update install instructions by @stevhliu in #12626
- [modular] add a check by @yiyixuxu in #12628
- Improve docstrings and type hints in scheduling_amused.py by @delmalih in #12623
- [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) by @dg845 in #12526
- adjust unit tests for test_save_load_float16 by @kaixuanliu in #12500
- skip autoencoderdl layerwise casting memory by @sayakpaul in #12647
- [utils] Update check_doc_toc by @stevhliu in #12642
- [docs] AutoModel by @stevhliu in #12644
- Improve docstrings and type hints in scheduling_ddim.py by @delmalih in #12622
- Improve docstrings and type hints in scheduling_ddpm.py by @delmalih in #12651
- [Modular] Add Custom Blocks guide to doc by @DN6 in #12339
- Improve docstrings and type hints in scheduling_euler_discrete.py by @delmalih in #12654
- Update Wan Animate Docs by @dg845 in #12658
- Rope in float32 for mps or npu compatibility by @DavidBert in #12665
- [PRX pipeline]: add 1024 resolution ratio bins by @DavidBert in #12670
- SANA-Video Image to Video pipeline SanaImageToVideoPipeline support by @lawrence-cj in #12634
- [CI] Make CI logs less verbose by @DN6 in #12674
- Revert AutoencoderKLWan's dim_mult default value back to list by @dg845 in #12640
- [CI] Temporarily pin transformers by @DN6 in #12677
- [core] Refactor hub attn kernels by @sayakpaul in #12475
- [CI] Fix indentation issue in workflow files by @DN6 in #12685
- [CI] Fix failing Pipeline CPU tests by @DN6 in #12681
- Improve docstrings and type hints in scheduling_pndm.py by @delmalih in #12676
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet by @pratim4dasude in #12649
- Improve docstrings and type hints in scheduling_lms_discrete.py by @delmalih in #12678
- Add FluxLoraLoaderMixin to Fibo pipeline by @SwayStar123 in #12688
- bugfix: fix chrono-edit context parallel by @DefTruth in #12660
- [core] support sage attention + FA2 through kernels by @sayakpaul in #12439
- [i8n-pt] Fix grammar and expand Portuguese documentation by @cdutr in #12598
- Fix variable naming typos in community FluxControlNetFillInpaintPipeline by @sqhuang in #12701
- fix typo in docs by @lawrence-cj in #12675
- Add Support for Z-Image Series by @JerryWu-code in #12703
- let's go Flux2 🚀 by @sayakpaul in #12711
- Update script names in README for Flux2 training by @anvilarth in #12713
- [lora]: Fix Flux2 LoRA NaN test by @sayakpaul in #12714
- [docs] Correct flux2 links by @sayakpaul in #12716
- [docs] put autopipeline after overview and hunyuanimage in images by @sayakpaul in #12548
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py by @delmalih in #12710
- Support unittest for Z-image ⚡️ by @JerryWu-code in #12715
- [chore] remove torch.save from remnant code. by @sayakpaul in #12717
- Enable regional compilation on z-image transformer model by @sayakpaul in #12736
- Fix examples not loading LoRA adapter weights from checkpoint by @SurAyush in #12690
- [Modular] Add single file support to Modular by @DN6 in #12383
- fix type-check for z-image transformer by @DefTruth in #12739
- Hunyuanvideo15 by @yiyixuxu in #12696
- [Docs] Update Imagen Video paper link in schedulers by @delmalih in #12724
- Improve docstrings and type hints in scheduling_heun_discrete.py by @delmalih in #12726
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py by @delmalih in #12766
- fix FLUX.2 context parallel by @DefTruth in #12737
- Rename BriaPipeline to BriaFiboPipeline in documentation by @galbria in #12758
- Update bria_fibo.md with minor fixes by @sayakpaul in #12731
- [feat]: implement "local" caption upsampling for Flux.2 by @sayakpaul in #12718
- Add ZImage LoRA support and integrate into ZImagePipeline by @CalamitousFelicitousness in #12750
- Add support for Ovis-Image by @DoctorKey in #12740
- Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. by @JerryWu-code in #12770
- Fixes #12673. record_stream in group offloading is not working properly by @KimbingNg in #12721
- [core] start varlen variants for attn backend kernels. by @sayakpaul in #12765
- [core] reuse AttentionMixin for compatible classes by @sayakpaul in #12463
- Deprecate upcast_vae in SDXL based pipelines by @DN6 in #12619
- Kandinsky 5.0 Video Pro and Image Lite by @leffff in #12664
- Fix: leaf_level offloading breaks after delete_adapters by @adi776borate in #12639
- [tests] fix hunuyanvideo 1.5 offloading tests. by @sayakpaul in #12782
- [Z-Image] various small changes, Z-Image transformer tests, etc. by @sayakpaul in #12741
- Z-Image-Turbo from_single_file by @hlky in #12756
- Update attention_backends.md to format kernels by @sayakpaul in #12757
- Improve docstrings and type hints in scheduling_unipc_multistep.py by @delmalih in #12767
- fix spatial compression ratio error for AutoEncoderKLWan doing tiled encode by @jerry2102 in #12753
- [lora] support more ZImage LoRAs by @sayakpaul in #12790
- PRX Set downscale_freq_shift to 0 for consistency with internal implementation by @DavidBert in #12791
- Fix broken group offloading with block_level for models with standalone layers by @rycerzes in #12692
- [Docs] Add Z-Image docs by @asomoza in #12775
- move kandisnky docs. by @sayakpaul (direct commit on v0.36.0-release)
- [docs] minor fixes to kandinsky docs by @sayakpaul in #12797
- Improve docstrings and type hints in scheduling_deis_multistep.py by @delmalih in #12796
- [Feat] TaylorSeer Cache by @toilaluan in #12648
- Update the TensorRT-ModelOPT to Nvidia-ModelOPT by @jingyu-ml in #12793
- add post init for safty checker by @jiqing-feng in #12794
- [HunyuanVideo1.5] support step-distilled by @yiyixuxu in #12802
- Add ZImageImg2ImgPipeline by @CalamitousFelicitousness in #12751
- Release: v0.36.0-release by @sayakpaul (direct commit on v0.36.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@yiyixuxu
- ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
- Fix: Add _skip_keys for AutoencoderKLWan (#12523)
- HunyuanImage21 (#12333)
- [modular] better warn message (#12573)
- [modular]pass hub_kwargs to load_config (#12577)
- [modular] wan! (#12611)
- fix copies (#12637)
- fix dispatch_attention_fn check (#12636)
- [modular] add a check (#12628)
- Hunyuanvideo15 (#12696)
- [HunyuanVideo1.5] support step-distilled (#12802)
@leffff
- Kandinsky 5 is finally in Diffusers! (#12478)
- Kandinsky 5 10 sec (NABLA suport) (#12520)
- Kandinsky 5.0 Docs fixes (#12582)
- Kandinsky 5.0 Video Pro and Image Lite (#12664)
@dg845
- Remove Qwen Image Redundant RoPE Cache (#12452)
- [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526)
- Update Wan Animate Docs (#12658)
- Revert AutoencoderKLWan's dim_mult default value back to list (#12640)
@DN6
- Raise warning instead of error when imports are missing for custom code (#12513)
- Handle deprecated transformer classes (#12517)
- Deprecate Stable Cascade (#12537)
- [Pipelines] Enable Wan VACE to run since single transformer (#12428)
- [Modular] Fix for custom block kwargs (#12561)
- [Modular] Allow custom blocks to be saved to local_dir (#12381)
- Fix custom code loading in Automodel (#12571)
- [Modular] Allow ModularPipeline to load from revisions (#12592)
- [Modular] Some clean up for Modular tests (#12579)
- [CI] Push test fix (#12617)
- [CI] Fix typo in uv install (#12618)
- Fix Context Parallel validation checks (#12446)
- [Modular] Clean up docs (#12604)
- [CI] Remove unittest dependency from testing_utils.py (#12621)
- [Modular] Add Custom Blocks guide to doc (#12339)
- [CI] Make CI logs less verbose (#12674)
- [CI] Temporarily pin transformers (#12677)
- [CI] Fix indentation issue in workflow files (#12685)
- [CI] Fix failing Pipeline CPU tests (#12681)
- [Modular] Add single file support to Modular (#12383)
- Deprecate upcast_vae in SDXL based pipelines (#12619)
@DavidBert
- Add Photon model and pipeline support (#12456)
- Prx (#12525)
- Rope in float32 for mps or npu compatibility (#12665)
- [PRX pipeline]: add 1024 resolution ratio bins (#12670)
- PRX Set downscale_freq_shift to 0 for consistency with internal implementation (#12791)
@galbria
- Bria fibo (#12545)
- Rename BriaPipeline to BriaFiboPipeline in documentation (#12758)
@lawrence-cj
- [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584)
- SANA-Video Image to Video pipeline SanaImageToVideoPipeline support (#12634)
- fix typo in docs (#12675)
@zhangjiewu
- add ChronoEdit (#12593)
@delmalih
- Improve docstrings and type hints in scheduling_amused.py (#12623)
- Improve docstrings and type hints in scheduling_ddim.py (#12622)
- Improve docstrings and type hints in scheduling_ddpm.py (#12651)
- Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)
- Improve docstrings and type hints in scheduling_pndm.py (#12676)
- Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)
- [Docs] Update Imagen Video paper link in schedulers (#12724)
- Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)
- Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)
- Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)
@pratim4dasude
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)
@JerryWu-code
- Add Support for Z-Image Series (#12703)
- Support unittest for Z-image ⚡️ (#12715)
- Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. (#12770)
@CalamitousFelicitousness
- Add ZImage LoRA support and integrate into ZImagePipeline (#12750)
- Add ZImageImg2ImgPipeline (#12751)
@DoctorKey
- Add support for Ovis-Image (#12740)
- Oct 15, 2025
- Date parsed from source:Oct 15, 2025
- First seen by Releasebot:Mar 20, 2026
🐞 fixes for `transformers` models, imports,
diffusers ships v0.35.2-patch with transformers offload fixes and PyTorch compatibility updates.
All commits
Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)
- handle offload_state_dict when initing transformers models by @sayakpaul in #12438
- [CI] Fix TRANSFORMERS_FLAX_WEIGHTS_NAME import issue by @DN6 in #12354
- Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.… by @Aishwarya0811 in #12206
- fix scale_shift_factor being on cpu for wan and ltx by @vladmandic in #12347
Release: v0.35.2-patch by @sayakpaul (direct commit on v0.35.2-patch)
- Aug 20, 2025
- Date parsed from source:Aug 20, 2025
- First seen by Releasebot:Mar 20, 2026
v0.35.1 for improvements in Qwen-Image Edit
diffusers improves Qwen-Image Edit with two contributor PRs.
Thanks to @naykun for the following PRs that improve Qwen-Image Edit:
- #12188
- #12190
- Aug 19, 2025
- Date parsed from source:Aug 19, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
diffusers adds major new image, video, and editing pipelines, including Wan 2.2, Flux-Kontext, Qwen-Image, and Qwen-Image-Edit. It also brings new training scripts, faster pipeline loading, better GGUF support, and experimental Modular Diffusers for more flexible workflows.
This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the rest of the release notes fully to not miss out on the fun stuff.
New pipelines 🧨
We welcomed new pipelines in this release:
- Wan 2.2
- Flux-Kontext
- Qwen-Image
- Qwen-Image-Edit
Wan 2.2 📹
This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.
Flux-Kontext 🎇
Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.
Qwen-Image 🌅
After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.
Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.
New training scripts 🎛️
Make these newly added models your own with our training scripts:
- Kontext trainer
- Qwen-Image trainer
Single-file modeling implementations
Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.
Attention refactor
We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.
Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.
Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
Regional compilation
Regional compilation trims cold-start latency by only compiling the small and frequently-repeated block(s) of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x. Refer to this doc to learn more.
Thanks to @anijain2305 for contributing this feature in this PR.
We have also authored a number of posts that center around the use of torch.compile. You can check them out at the links below:
- Presenting Flux Fast: Making Flux go brrr on H100s
- torch.compile and Diffusers: A Hands-On Guide to Peak Performance
- Fast LoRA inference for Flux with Diffusers and PEFT
Faster pipeline loading ⚡️
Users can now load pipelines directly on an accelerator device, leading to significantly faster load times. This becomes particularly evident when loading large pipelines like Wan and Qwen-Image.
```python
from diffusers import DiffusionPipeline
import torch

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
```

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you're working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.
```python
import os

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"
# rest of the loading code ....
```

Better GGUF integration
@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.
We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.
We now support loading of Diffusers format GGUF checkpoints.
You can learn more about all of this in our GGUF official docs.
Modular Diffusers (Experimental)
Modular Diffusers is a system for building diffusion pipelines with individual pipeline blocks. It is highly customisable, with blocks that can be mixed and matched to adapt or create a pipeline for a specific workflow or multiple workflows.
The API is currently in active development and is being released as an experimental feature. Learn more in our docs.
All commits
- [tests] skip instead of returning. by @sayakpaul in #11793
- adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
- fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
- [chore] post release v0.34.0 by @sayakpaul in #11800
- Follow up for Group Offload to Disk by @DN6 in #11760
- [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
- [tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
- adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in #11809
- Flux Kontext by @a-r-r-o-w in #11812
- Kontext training by @sayakpaul in #11813
- Kontext fixes by @a-r-r-o-w in #11815
- remove syncs before denoising in Kontext by @sayakpaul in #11818
- [CI] disable onnx, mps, flax from the CI by @sayakpaul in #11803
- TorchAO compile + offloading tests by @a-r-r-o-w in #11697
- Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in #11804
- [lora] fix: lora unloading behvaiour by @sayakpaul in #11822
- [lora]feat: use exclude modules to loraconfig. by @sayakpaul in #11806
- ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in #11834
- Remove print statement in SCM Scheduler by @a-r-r-o-w in #11836
- [tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in #11825
- reset deterministic in tearDownClass by @jiqing-feng in #11785
- [tests] Fix failing float16 cuda tests by @a-r-r-o-w in #11835
- [single file] Cosmos by @a-r-r-o-w in #11801
- [docs] fix single_file example. by @sayakpaul in #11847
- Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in #11649
- [docs] Batch generation by @stevhliu in #11841
- [docs] Deprecated pipelines by @stevhliu in #11838
- fix norm not training in train_control_lora_flux.py by @Luo-Yihang in #11832
- [From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in #11807
- [lora] tests for exclude_modules with Wan VACE by @sayakpaul in #11843
- update: FluxKontextInpaintPipeline support by @vuongminh1907 in #11820
- [Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in #11823
- [docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in #11851
- [benchmarks] overhaul benchmarks by @sayakpaul in #11565
- FIX set_lora_device when target layers differ by @BenjaminBossan in #11844
- Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in #11856
- [chore] deprecate blip controlnet pipeline. by @sayakpaul in #11877
- [docs] fix references in flux pipelines. by @sayakpaul in #11857
- [tests] remove tests for deprecated pipelines. by @sayakpaul in #11879
- [docs] LoRA metadata by @stevhliu in #11848
- [training ] add Kontext i2i training by @sayakpaul in #11858
- [CI] Fix big GPU test marker by @DN6 in #11786
- First Block Cache by @a-r-r-o-w in #11180
- [tests] annotate compilation test classes with bnb by @sayakpaul in #11715
- Update chroma.md by @shm4r7 in #11891
- [CI] Speed up GPU PR Tests by @DN6 in #11887
- Pin k-diffusion for CI by @sayakpaul in #11894
- [Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in #11728
- [tests] Remove more deprecated tests by @sayakpaul in #11895
- [tests] mark the wanvace lora tester flaky by @sayakpaul in #11883
- [tests] add compile + offload tests for GGUF. by @sayakpaul in #11740
- feat: add multiple input image support in Flux Kontext by @Net-Mist in #11880
- Fix unique memory address when doing group-offloading with disk by @sayakpaul in #11767
- [SD3] CFG Cutoff fix and official callback by @asomoza in #11890
- The Modular Diffusers by @yiyixuxu in #9672
- [quant] QoL improvements for pipeline-level quant config by @sayakpaul in #11876
- Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in #11429
- [LoRA] fix: disabling hooks when loading loras. by @sayakpaul in #11896
- [utils] account for MPS when available in get_device(). by @sayakpaul in #11905
- [ControlnetUnion] Multiple Fixes by @asomoza in #11888
- Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in #11761
- [tests] Unify compilation + offloading tests in quantization by @sayakpaul in #11910
- Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in #11904
- [docs] torch.compile blog post by @stevhliu in #11837
- Flux: pass joint_attention_kwargs when using gradient_checkpointing by @piercus in #11814
- Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in #11909
- Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in #11924
- [tests] Improve Flux tests by @a-r-r-o-w in #11919
- Remove device synchronization when loading weights by @a-r-r-o-w in #11927
- Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in #11054
- Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in #11923
- [ControlnetUnion] Propagate #11888 to img2img by @asomoza in #11929
- enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in #11908
- [training] add an offload utility that can be used as a context manager. by @sayakpaul in #11775
- Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in #11518
- [refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in #11916
- [docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in #11947
- [Modular] Updates for Custom Pipeline Blocks by @DN6 in #11940
- [docs] Update toctree by @stevhliu in #11936
- [docs] include bp link. by @sayakpaul in #11952
- Fix kontext finetune issue when batch size >1 by @mymusise in #11921
- [tests] Add test slices for Hunyuan Video by @a-r-r-o-w in #11954
- [tests] Add test slices for Cosmos by @a-r-r-o-w in #11955
- [tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in #11953
- [Modular] update the collection behavior by @yiyixuxu in #11963
- fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in #11690
- Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in #11967
- [Examples] Uniform notations in train_flux_lora by @tomguluson92 in #10011
- fix style by @yiyixuxu in #11975
- [tests] Add test slices for Wan by @a-r-r-o-w in #11920
- [docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in #11935
- [tests] enforce torch version in the compilation tests. by @sayakpaul in #11979
- [modular diffusers] Wan by @a-r-r-o-w in #11913
- [compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in #11987
- enable quantcompile test on xpu by @yao-matrix in #11988
- [WIP] Wan2.2 by @yiyixuxu in #12004
- [refactor] some shared parts between hooks + docs by @a-r-r-o-w in #11968
- [refactor] Wan single file implementation by @a-r-r-o-w in #11918
- Fix huggingface-hub failing tests by @asomoza in #11994
- feat: add flux kontext by @jlonge4 in #11985
- [modular] add Modular flux for text-to-image by @sayakpaul in #11995
- [docs] include lora fast post. by @sayakpaul in #11993
- [docs] quant_kwargs by @stevhliu in #11712
- [docs] Fix link by @stevhliu in #12018
- [wan2.2] add 5b i2v by @yiyixuxu in #12006
- wan2.2 i2v FirstBlockCache fix by @okaris in #12013
- [core] support attention backends for LTX by @sayakpaul in #12021
- [docs] Update index by @stevhliu in #12020
- [Fix] huggingface-cli to hf missed files by @asomoza in #12008
- [training-scripts] Make pytorch examples UV-compatible by @sayakpaul in #12000
- [wan2.2] fix vae patches by @yiyixuxu in #12041
- Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in #12015
- [LoRA] support lightx2v lora in wan by @sayakpaul in #12040
- Fix type of force_upcast to bool by @BerndDoser in #12046
- Update autoencoder_kl_cosmos.py by @tanuj-rai in #12045
- Qwen-Image by @naykun in #12055
- [wan2.2] follow-up by @yiyixuxu in #12024
- tests + minor refactor for QwenImage by @a-r-r-o-w in #12057
- Cross attention module to Wan Attention by @samuelt0 in #12058
- fix(qwen-image): update vae license by @naykun in #12063
- CI fixing by @paulinebm in #12059
- enable all gpus when running ci. by @sayakpaul in #12062
- fix the rest for all GPUs in CI by @sayakpaul in #12064
- [docs] Install by @stevhliu in #12026
- [wip] feat: support lora in qwen image and training script by @sayakpaul in #12056
- [docs] small corrections to the example in the Qwen docs by @sayakpaul in #12068
- [tests] Fix Qwen test_inference slices by @a-r-r-o-w in #12070
- [tests] deal with the failing AudioLDM2 tests by @sayakpaul in #12069
- optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in #12072
- Add cuda kernel support for GGUF inference by @Isotr0py in #11869
- fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in #12081
- [refactor] condense group offloading by @a-r-r-o-w in #11990
- Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in #12077
- Helper functions to return skip-layer compatible layers by @a-r-r-o-w in #12048
- Make prompt_2 optional in Flux Pipelines by @DN6 in #12073
- [tests] tighten compilation tests for quantization by @sayakpaul in #12002
- Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in #11976
- fix flux type hint by @DefTruth in #12089
- [qwen] device typo by @yiyixuxu in #12099
- [lora] adapt new LoRA config injection method by @sayakpaul in #11999
- lora_conversion_utils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in #12101
- [tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in #12103
- [Modular] Fast Tests by @yiyixuxu in #11937
- [GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in #11684
- [docs] diffusers gguf checkpoints by @sayakpaul in #12092
- [core] add modular support for Flux I2I by @sayakpaul in #12086
- [lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in #12119
- [Modular] More Updates for Custom Code Loading by @DN6 in #11969
- enable compilation in qwen image. by @sayakpaul in #12061
- [tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in #12106
- [chore] complete the licensing statement. by @sayakpaul in #12001
- [docs] Cache link by @stevhliu in #12105
- [Modular] Add experimental feature warning for Modular Diffusers by @DN6 in #12127
- Add low_cpu_mem_usage option to from_single_file to align with from_pretrained by @IrisRainbowNeko in #12114
- [docs] Modular diffusers by @stevhliu in #11931
- [Bugfix] typo fix in NPU FA by @leisuzz in #12129
- Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in #12117
- [core] parallel loading of shards by @sayakpaul in #12028
- try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in #12032
- [docs] Refresh effective and efficient doc by @stevhliu in #12134
- Fix bf15/fp16 for pipeline_wan_vace.py by @SlimRG in #12143
- make parallel loading flag a part of constants. by @sayakpaul in #12137
- [docs] Parallel loading of shards by @stevhliu in #12135
- feat: cuda device_map for pipelines. by @sayakpaul in #12122
- [core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in #12005
- support hf_quantizer in cache warmup. by @sayakpaul in #12043
- make test_gguf all pass on xpu by @yao-matrix in #12158
- [docs] Quickstart by @stevhliu in #12128
- Qwen Image Edit Support by @naykun in #12164
- remove silu for CogView4 by @lambertwjh in #12150
- [qwen] Qwen image edit followups by @sayakpaul in #12166
- Minor modification to support DC-AE-turbo by @chenjy2003 in #12169
- [Docs] typo error in qwen image by @leisuzz in #12144
- fix: caching allocator behaviour for quantization. by @sayakpaul in #12172
- fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in #12178
- [docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in #12181
- [LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in #12170
- Update README.md by @Taechai in #12182
- [chore] add lora button to qwenimage docs by @sayakpaul in #12183
- [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in #12074
- Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@vuongminh1907
- update: FluxKontextInpaintPipeline support (#11820)
@Net-Mist
- feat: add multiple input image support in Flux Kontext (#11880)
@tolgacangoz
- Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
@naykun
- Qwen-Image (#12055)
- fix(qwen-image): update vae license (#12063)
- Qwen Image Edit Support (#12164)
@Trgtuan10
- Add QwenImage Inpainting and Img2Img pipeline (#12117)
@SamYuan1990
- try to use deepseek with an agent to auto i18n to zh (#12032)
- Jul 1, 2025
- Date parsed from source:Jul 1, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more
diffusers adds major new video and image generation pipelines, including Wan VACE, Cosmos Predict2, LTX 0.9.7, Hunyuan Video Framepack, Chroma, and VisualCloze. It also improves torch.compile support, adds pipeline quantization and disk offloading, and expands LoRA and training support.
📹 New video generation pipelines
Wan VACE
Wan VACE supports various generation techniques for controllable video generation. It comes in two variants: a 1.3B model for fast iteration & prototyping, and a 14B model for high-quality generation. Some of the capabilities include:
- Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
- Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
- Inpainting and Outpainting
- Subject to Video (faces, object, characters, etc.)
- Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)
The code snippets available in this pull request demonstrate some examples of how videos can be generated with controllability signals.
Check out the docs to learn more.
Cosmos Predict2 Video2World
Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.
The Video2World model comes in a 2B and 14B variant. Check out the docs to learn more.
LTX 0.9.7 and Distilled
LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.
Check out the docs to learn more.
Hunyuan Video Framepack and F1
Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.
FusionX
The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use from_single_file():

```python
import torch
from diffusers import WanTransformer3DModel

transformer = WanTransformer3DModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16,
)
```

To load the LoRAs, use load_lora_weights():

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX",
    weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors",
)
```

AccVideo and CausVid (only LoRAs)
AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.
🌠 New image generation pipelines
Cosmos Predict2 Text2Image
Text-to-image models from the Cosmos-Predict2 release. The models come in 2B and 14B variants. Check out the docs to learn more.
Chroma
Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.
Thanks to @Ednaordinary for contributing it in this PR!
VisualCloze
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is a universal image generation framework built on visual in-context learning. It offers these key capabilities:
- Support for various in-domain tasks
- Generalization to unseen tasks through in-context learning
- Unify multiple tasks into one step and generate both target image and intermediate results
- Support reverse-engineering conditions from target images
Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!
Better torch.compile support
We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test widely used models like Flux for recompilation and graph-break issues, which can get in the way of fully realizing the benefits of torch.compile(). Refer to the following links to learn more:
- #11085
- #11430
Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:
```python
import torch
from diffusers import DiffusionPipeline

torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.0,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```

This is compatible with group offloading, too. Interested readers can check out the concerned PRs below:
- #11605
- #11670
You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:
```python
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

torch._dynamo.config.recompile_limit = 1000
torch_dtype = torch.bfloat16

quant_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_compute_dtype": torch_dtype,
    "bnb_4bit_quant_type": "nf4",
}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]
```

Starting from bitsandbytes==0.46.0 onwards, bnb-quantized models should be fully compatible with torch.compile() without graph breaks. This means that when compiling a bnb-quantized model, users can do model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.
Note that for 4-bit bnb models, installing a PyTorch nightly is currently required if fullgraph=True is specified during compilation.
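To make the fullgraph=True behavior concrete, here is a minimal, GPU-free sketch (the Toy module is hypothetical, and the lightweight "eager" debugging backend is used only so the example runs anywhere): with fullgraph=True, dynamo raises an error on any graph break instead of silently splitting the model into subgraphs.

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

# fullgraph=True asks dynamo to error out on any graph break instead of
# silently falling back; backend="eager" skips codegen so this runs on CPU.
compiled = torch.compile(Toy(), backend="eager", fullgraph=True)
out = compiled(torch.ones(3))
print(out)  # tensor([2., 2., 2.])
```

If the forward contained a graph break (say, a data-dependent Python branch), the same call would raise instead of returning a result.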
Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.
PipelineQuantizationConfig
Users can now provide a quantization config while initializing a pipeline:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
```

This lowers the barrier to entry for users who want to use quantization without writing much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.
Group offloading with disk
In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.
However, you still need a considerable amount of system RAM to make offloading work effectively, so environments with both low VRAM and low RAM were still left out.
Starting this release, users will additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set the offload_to_disk_path to enable this feature.
```python
pipeline.transformer.enable_group_offload(
    onload_device="cuda",
    offload_device="cpu",
    offload_type="leaf_level",
    offload_to_disk_path="path/to/disk",
)
```

Refer to these two tables to compare the speed and memory trade-offs.
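The idea behind leaf-level disk offloading can be sketched with a conceptual toy (this is not the real diffusers implementation; the DiskOffloadedLinear class and its mechanics are hypothetical): a layer's weights live on disk and are streamed in just in time for that layer's forward pass.

```python
import os
import tempfile

import torch
import torch.nn as nn

class DiskOffloadedLinear(nn.Module):
    """Conceptual toy of per-layer disk offloading: the weight is kept in a
    file and loaded only for the duration of this layer's forward pass."""

    def __init__(self, inner: nn.Linear, path: str):
        super().__init__()
        self.path = path
        torch.save(inner.state_dict(), path)  # offload once at setup

    def forward(self, x):
        # Onload just-in-time, compute, and let the weights go out of scope
        # so they do not stay resident in RAM between calls.
        state = torch.load(self.path)
        return nn.functional.linear(x, state["weight"], state["bias"])

layer = nn.Linear(4, 2)
path = os.path.join(tempfile.gettempdir(), "toy_layer.pt")
offloaded = DiskOffloadedLinear(layer, path)
x = torch.randn(1, 4)
print(torch.allclose(layer(x), offloaded(x)))  # True: same math, lower residency
```

The real feature additionally overlaps onloading with computation via streams, which this toy does not attempt.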
LoRA metadata parsing
It is beneficial to include the LoraConfig used to train a LoRA in its state dict. In its absence, users were restricted to using the same LoRA alpha as the LoRA rank. We have modified the most popular training scripts to allow passing a custom lora_alpha through the CLI. Refer to this thread for more updates, and to this comment for some extended clarifications.
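To show where such metadata lives, here is a dependency-free sketch of the safetensors container (the helper names, weight key, and metadata fields are hypothetical): the file starts with an 8-byte little-endian header length followed by a JSON header, and training metadata such as a serialized LoraConfig rides along as string key/value pairs under the special "__metadata__" key.

```python
import json
import os
import struct
import tempfile

def write_safetensors(path, tensors, metadata=None):
    # Header maps tensor names to {dtype, shape, data_offsets}; metadata
    # values must be strings per the safetensors format.
    header = {}
    if metadata:
        header["__metadata__"] = {k: str(v) for k, v in metadata.items()}
    offset, payload = 0, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        payload += raw
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))  # 8-byte header length
        f.write(blob + payload)

def read_metadata(path):
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n)).get("__metadata__", {})

path = os.path.join(tempfile.gettempdir(), "toy_lora.safetensors")
write_safetensors(
    path,
    {"lora.up.weight": ("F32", [2, 2], bytes(16))},  # hypothetical weight
    metadata={"r": 8, "lora_alpha": 16},             # hypothetical LoraConfig fields
)
print(read_metadata(path))  # {'r': '8', 'lora_alpha': '16'}
```

With the alpha recoverable from the header, a loader no longer has to assume alpha equals rank.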
New training scripts
We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.
Updates on educational materials on quantization
We have worked on a two-part series discussing the support of quantization in Diffusers. Check them out:
- Exploring Quantization Backends in Diffusers
- (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware
All commits
- [LoRA] support musubi wan loras. by @sayakpaul in #11243
- fix test_vanilla_funetuning failure on XPU and A100 by @yao-matrix in #11263
- make test_stable_diffusion_inpaint_fp16 pass on XPU by @yao-matrix in #11264
- make test_dict_tuple_outputs_equivalent pass on XPU by @yao-matrix in #11265
- add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
- make test_instant_style_multiple_masks pass on XPU by @yao-matrix in #11266
- [BUG] Fix convert_vae_pt_to_diffusers bug by @lavinal712 in #11078
- Fix LTX 0.9.5 single file by @hlky in #11271
- [Tests] Cleanup lora tests utils by @sayakpaul in #11276
- [CI] relax tolerance for unclip further by @sayakpaul in #11268
- do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot by @sayakpaul in #11273
- Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi by @kuantuna in #11294
- HiDream Image by @hlky in #11231
- flow matching lcm scheduler by @quickjkee in #11170
- Update autoencoderkl_allegro.md by @Forbu in #11303
- Hidream refactoring follow ups by @a-r-r-o-w in #11299
- Fix incorrect tile_latent_min_width calculations by @kuantuna in #11305
- [ControlNet] Adds controlnet for SanaTransformer by @ishan-modi in #11040
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU by @yao-matrix in #11308
- make test_stable_diffusion_karras_sigmas pass on XPU by @yao-matrix in #11310
- make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU by @faaany in #11313
- [LoRA] support more SDXL loras. by @sayakpaul in #11292
- [HiDream] code example by @linoytsaban in #11317
- import for FlowMatchLCMScheduler by @asomoza in #11318
- Use float32 on mps or npu in transformer_hidream_image's rope by @hlky in #11316
- Add skrample section to community_projects.md by @Beinsezii in #11319
- [docs] Promote AutoModel usage by @sayakpaul in #11300
- [LoRA] Add LoRA support to AuraFlow by @hameerabbasi in #10216
- Fix vae.Decoder prev_output_channel by @hlky in #11280
- fix CPU offloading related fail cases on XPU by @yao-matrix in #11288
- [docs] fix hidream docstrings. by @sayakpaul in #11325
- Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible by @AstraliteHeart in #11297
- post release 0.33.0 by @sayakpaul in #11255
- another fix for FlowMatchLCMScheduler forgotten import by @asomoza in #11330
- Fix Hunyuan I2V for transformers>4.47.1 by @DN6 in #11293
- unpin torch versions for onnx Dockerfile by @sayakpaul in #11290
- [single file] enable telemetry for single file loading when using GGUF. by @sayakpaul in #11284
- [docs] add a snippet for compilation in the auraflow docs. by @sayakpaul in #11327
- Hunyuan I2V fast tests fix by @DN6 in #11341
- [BUG] fixed _toctree.yml alphabetical ordering by @ishan-modi in #11277
- Fix wrong dtype argument name as torch_dtype by @nPeppon in #11346
- [chore] fix lora docs utils by @sayakpaul in #11338
- [docs] add note about use_duck_shape in auraflow docs. by @sayakpaul in #11348
- [LoRA] Propagate hotswap better by @sayakpaul in #11333
- [Hi Dream] follow-up by @yiyixuxu in #11296
- [bitsandbytes] improve dtype mismatch handling for bnb + lora. by @sayakpaul in #11270
- Update controlnet_flux.py by @haofanwang in #11350
- enable 2 test cases on XPU by @yao-matrix in #11332
- [BNB] Fix test_moving_to_cpu_throws_warning by @SunMarc in #11356
- support Wan-FLF2V by @yiyixuxu in #11353
- Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin by @Kazuki-Yoda in #11357
- update output for Hidream transformer by @yiyixuxu in #11366
- [Wan2.1-FLF2V] update conversion script by @yiyixuxu in #11365
- [Flux LoRAs] fix lr scheduler bug in distributed scenarios by @linoytsaban in #11242
- [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11240
- fix issue that training flux controlnet was unstable and validation r… by @PromeAIpro in #11373
- Fix Wan I2V prepare_latents dtype by @a-r-r-o-w in #11371
- [BUG] fixes in kadinsky pipeline by @ishan-modi in #11080
- Add Serialized Type Name kwarg in Model Output by @anzr299 in #10502
- [cogview4][feat] Support attention mechanism with variable-length support and batch packing by @OleehyO in #11349
- Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma by @josephrocca in #11120
- [Refactor] Minor Improvement for import utils by @ishan-modi in #11161
- Add stochastic sampling to FlowMatchEulerDiscreteScheduler by @apolinario in #11369
- [LoRA] add LoRA support to HiDream and fine-tuning script by @linoytsaban in #11281
- Update modeling imports by @a-r-r-o-w in #11129
- [HiDream] move deprecation to 0.35.0 by @yiyixuxu in #11384
- Update README_hidream.md by @AMEERAZAM08 in #11386
- Fix group offloading with block_level and use_stream=True by @a-r-r-o-w in #11375
- [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing by @ishandutta0098 in #11395
- [Feature] Added Xlab Controlnet support by @ishan-modi in #11249
- Kolors additional pipelines, community contrib by @Teriks in #11372
- [HiDream LoRA] optimizations + small updates by @linoytsaban in #11381
- Fix Flux IP adapter argument in the pipeline example by @AeroDEmi in #11402
- [BUG] fixed WAN docstring by @ishan-modi in #11226
- Fix typos in strings and comments by @co63oc in #11407
- [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing by @merterbak in #11421
- [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() by @sayakpaul in #11085
- enable group_offload cases and quanto cases on XPU by @yao-matrix in #11405
- enable test_layerwise_casting_memory cases on XPU by @yao-matrix in #11406
- [tests] fix import. by @sayakpaul in #11434
- [train_text_to_image] Better image interpolation in training scripts follow up by @tongyu0924 in #11426
- [train_text_to_image_lora] Better image interpolation in training scripts follow up by @tongyu0924 in #11427
- enable 28 GGUF test cases on XPU by @yao-matrix in #11404
- [Hi-Dream LoRA] fix bug in validation by @linoytsaban in #11439
- Fixing missing provider options argument by @urpetkov-amd in #11397
- Set LANCZOS as the default interpolation for image resizing in ControlNet training by @YoulunPeng in #11449
- Raise warning instead of error for block offloading with streams by @a-r-r-o-w in #11425
- enable marigold_intrinsics cases on XPU by @yao-matrix in #11445
- torch.compile fullgraph compatibility for Hunyuan Video by @a-r-r-o-w in #11457
- enable consistency test cases on XPU, all passed by @yao-matrix in #11446
- enable unidiffuser test cases on xpu by @yao-matrix in #11444
- Add generic support for Intel Gaudi accelerator (hpu device) by @dsocek in #11328
- Add StableDiffusion3InstructPix2PixPipeline by @xduzhangjiayu in #11378
- make safe diffusion test cases pass on XPU and A100 by @yao-matrix in #11458
- [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models by @tongyu0924 in #11431
- Add LANCZOS as default interplotation mode. by @Va16hav07 in #11463
- make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu by @yao-matrix in #11461
- [WAN] fix recompilation issues by @sayakpaul in #11475
- Fix typos in docs and comments by @co63oc in #11416
- [tests] xfail recent pipeline tests for specific methods. by @sayakpaul in #11469
- cache packages_distributions by @vladmandic in #11453
- [docs] Memory optims by @stevhliu in #11385
- [docs] Adapters by @stevhliu in #11331
- [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing by @yuanjua in #11471
- [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing by @ysurs in #11472
- enable semantic diffusion and stable diffusion panorama cases on XPU by @yao-matrix in #11459
- [Feature] Implement tiled VAE encoding/decoding for Wan model. by @c8ef in #11414
- [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode for image resizing by @ParagEkbote in #11455
- [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) by @MinJu-Ha in #11490
- [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing by @cjfghk5697 in #11491
- [training] feat: enable quantization for hidream lora training. by @sayakpaul in #11494
- Set LANCZOS as the default interpolation method for image resizing. by @yijun-lee in #11492
- Update training script for txt to img sdxl with lora supp with new interpolation. by @RogerSinghChugh in #11496
- Fix torchao docs typo for fp8 granular quantization by @a-r-r-o-w in #11473
- Update setup.py to pin min version of peft by @sayakpaul in #11502
- update dep table. by @sayakpaul in #11504
- [LoRA] use removeprefix to preserve sanity. by @sayakpaul in #11493
- Hunyuan Video Framepack by @a-r-r-o-w in #11428
- enable lora cases on XPU by @yao-matrix in #11506
- [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility by @iamwavecut in #11441
- [docs] minor updates to bitsandbytes docs. by @sayakpaul in #11509
- Cosmos by @a-r-r-o-w in #10660
- clean up the Init for stable_diffusion by @yiyixuxu in #11500
- fix audioldm by @sayakpaul (direct commit on v0.34.0-release)
- Revert "fix audioldm" by @sayakpaul (direct commit on v0.34.0-release)
- [LoRA] make lora alpha and dropout configurable by @linoytsaban in #11467
- Add cross attention type for Sana-Sprint training in diffusers. by @scxue in #11514
- Conditionally import torchvision in Cosmos transformer by @a-r-r-o-w in #11524
- [tests] fix audioldm2 for transformers main. by @sayakpaul in #11522
- feat: pipeline-level quantization config by @sayakpaul in #11130
- [Tests] Enable more general testing for torch.compile() with LoRA hotswapping by @sayakpaul in #11322
- [LoRA] support non-diffusers hidream loras by @sayakpaul in #11532
- enable 7 cases on XPU by @yao-matrix in #11503
- [LTXPipeline] Update latents dtype to match VAE dtype by @james-p-xu in #11533
- enable dit integration cases on xpu by @yao-matrix in #11523
- enable print_env on xpu by @yao-matrix in #11507
- Change Framepack transformer layer initialization order by @a-r-r-o-w in #11535
- [tests] add tests for framepack transformer model. by @sayakpaul in #11520
- Hunyuan Video Framepack F1 by @a-r-r-o-w in #11534
- enable several pipeline integration tests on XPU by @yao-matrix in #11526
- [test_models_transformer_ltx.py] help us test torch.compile() for impactful models by @cjfghk5697 in #11512
- Add VisualCloze by @lzyhha in #11377
- Fix typo in train_diffusion_orpo_sdxl_lora_wds.py by @Meeex2 in #11541
- fix: remove torch_dtype="auto" option from docstrings by @johannaSommer in #11513
- [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11239
- [LoRA] small change to support Hunyuan LoRA Loading for FramePack by @linoytsaban in #11546
- LTX Video 0.9.7 by @a-r-r-o-w in #11516
- [tests] Enable testing for HiDream transformer by @sayakpaul in #11478
- Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. by @Meatfucker in #11545
- Fix deprecation warnings in test_ltx_image2video.py by @AChowdhury1211 in #11538
- [tests] Add torch.compile test for UNet2DConditionModel by @olccihyeon in #11537
- [Single File] GGUF/Single File Support for HiDream by @DN6 in #11550
- [gguf] Refactor torch_function to avoid unnecessary computation by @anijain2305 in #11551
- [tests] add tests for combining layerwise upcasting and groupoffloading. by @sayakpaul in #11558
- [docs] Regional compilation docs by @sayakpaul in #11556
- enhance value guard of _device_agnostic_dispatch by @yao-matrix in #11553
- Doc update by @Player256 in #11531
- Revert error to warning when loading LoRA from repo with multiple weights by @apolinario in #11568
- [docs] tip for group offloding + quantization by @sayakpaul in #11576
- [LoRA] support non-diffusers LTX-Video loras by @linoytsaban in #11572
- [WIP][LoRA] start supporting kijai wan lora. by @sayakpaul in #11579
- [Single File] Fix loading for LTX 0.9.7 transformer by @DN6 in #11578
- Use HF Papers by @qgallouedec in #11567
- LTX 0.9.7-distilled; documentation improvements by @a-r-r-o-w in #11571
- [LoRA] kijai wan lora support for I2V by @linoytsaban in #11588
- docs: fix invalid links by @osrm in #11505
- [docs] Remove fast diffusion tutorial by @stevhliu in #11583
- RegionalPrompting: Inherit from Stable Diffusion by @b-sai in #11525
- [chore] allow string device to be passed to randn_tensor. by @sayakpaul in #11559
- Type annotation fix by @DN6 in #11597
- [LoRA] minor fix for load_lora_weights() for Flux and a test by @sayakpaul in #11595
- Update Intel Gaudi doc by @regisss in #11479
- enable pipeline test cases on xpu by @yao-matrix in #11527
- [Feature] AutoModel can load components using model_index.json by @ishan-modi in #11401
- [docs] Pipeline-level quantization by @stevhliu in #11604
- Fix bug when variant and safetensor file does not match by @kaixuanliu in #11587
- [tests] Changes to the torch.compile() CI and tests by @sayakpaul in #11508
- Fix mixed variant downloading by @DN6 in #11611
- fix security issue in build docker ci by @sayakpaul in #11614
- Make group offloading compatible with torch.compile() by @sayakpaul in #11605
- [training docs] smol update to README files by @linoytsaban in #11616
- Adding NPU for get device function by @leisuzz in #11617
- [LoRA] improve LoRA fusion tests by @sayakpaul in #11274
- [Sana Sprint] add image-to-image pipeline by @linoytsaban in #11602
- [CI] fix the filename for displaying failures in lora ci. by @sayakpaul in #11600
- [docs] PyTorch 2.0 by @stevhliu in #11618
- [textual_inversion_sdxl.py] fix lr scheduler steps count by @yuanjua in #11557
- Fix wrong indent for examples of controlnet script by @Justin900429 in #11632
- removing unnecessary else statement by @YanivDorGalron in #11624
- enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed by @yao-matrix in #11620
- Bug: Fixed Image 2 Image example by @vltmedia in #11619
- typo fix in pipeline_flux.py by @YanivDorGalron in #11623
- Fix typos in strings and comments by @co63oc in #11476
- [docs] update torchao doc link by @sayakpaul in #11634
- Use float32 RoPE freqs in Wan with MPS backends by @hvaara in #11643
- [chore] misc changes in the bnb tests for consistency. by @sayakpaul in #11355
- [tests] chore: rename lora model-level tests. by @sayakpaul in #11481
- [docs] Caching methods by @stevhliu in #11625
- [docs] Model cards by @stevhliu in #11112
- [CI] Some improvements to Nightly reports summaries by @DN6 in #11166
- [chore] bring PipelineQuantizationConfig at the top of the import chain. by @sayakpaul in #11656
- [examples] flux-control: use num_training_steps_for_scheduler by @Markus-Pobitzer in #11662
- use deterministic to get stable result by @jiqing-feng in #11663
- [tests] add test for torch.compile + group offloading by @sayakpaul in #11670
- Wan VACE by @a-r-r-o-w in #11582
- fixed axes_dims_rope init (huggingface#11641) by @sofinvalery in #11678
- [tests] Fix how compiler mixin classes are used by @sayakpaul in #11680
- Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process by @DN6 in #11596
- Add community class StableDiffusionXL_T5Pipeline by @ppbrown in #11626
- Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area by @Meatfucker in #11658
- Allow remote code repo names to contain "." by @akasharidas in #11652
- [LoRA] support Flux Control LoRA with bnb 8bit. by @sayakpaul in #11655
- [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline by @tolgacangoz in #11639
- enable torchao test cases on XPU and switch to device agnostic APIs for test cases by @yao-matrix in #11654
- [tests] tests for compilation + quantization (bnb) by @sayakpaul in #11672
- [tests] model-level device_map clarifications by @sayakpaul in #11681
- Improve Wan docstrings by @a-r-r-o-w in #11689
- Set _torch_version to N/A if torch is disabled. by @rasmi in #11645
- Avoid DtoH sync from access of nonzero() item in scheduler by @jbschlosser in #11696
- Apply Occam's Razor in position embedding calculation by @tolgacangoz in #11562
- [docs] add compilation bits to the bitsandbytes docs. by @sayakpaul in #11693
- swap out token for style bot. by @sayakpaul in #11701
- [docs] mention fp8 benefits on supported hardware. by @sayakpaul in #11699
- Support Wan AccVideo lora by @a-r-r-o-w in #11704
- [LoRA] parse metadata from LoRA and save metadata by @sayakpaul in #11324
- Cosmos Predict2 by @a-r-r-o-w in #11695
- Chroma Pipeline by @Ednaordinary in #11698
- [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers by @sayakpaul in #11716
- [training] show how metadata stuff should be incorporated in training scripts. by @sayakpaul in #11707
- Fix misleading comment by @carlthome in #11722
- Add Pruna optimization framework documentation by @davidberenstein1957 in #11688
- Support more Wan loras (VACE) by @a-r-r-o-w in #11726
- [LoRA training] update metadata use for lora alpha + README by @linoytsaban in #11723
- ⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% by @misrasaurabh1 in #11665
- [training] add ds support to lora hidream by @leisuzz in #11737
- [tests] device_map tests for all models. by @sayakpaul in #11708
- [chore] change to 2025 licensing for remaining by @sayakpaul in #11741
- Chroma Follow Up by @DN6 in #11725
- [Quantizers] add is_compileable property to quantizers. by @sayakpaul in #11736
- Update more licenses to 2025 by @a-r-r-o-w in #11746
- Add missing HiDream license by @a-r-r-o-w in #11747
- Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server by @dependabot[bot] in #11748
- [LoRA] refactor lora loading at the model-level by @sayakpaul in #11719
- [CI] Fix WAN VACE tests by @DN6 in #11757
- [CI] Fix SANA tests by @DN6 in #11756
- Fix HiDream pipeline test module by @DN6 in #11754
- make group offloading work with disk/nvme transfers by @sayakpaul in #11682
- Update Chroma Docs by @DN6 in #11753
- fix invalid component handling behaviour in PipelineQuantizationConfig by @sayakpaul in #11750
- Fix failing cpu offload test for LTX Latent Upscale by @DN6 in #11755
- [docs] Quantization + torch.compile + offloading by @stevhliu in #11703
- [docs] device_map by @stevhliu in #11711
- [docs] LoRA scale scheduling by @stevhliu in #11727
- Fix dimensionalities in apply_rotary_emb functions' comments by @tolgacangoz in #11717
- enable deterministic in bnb 4 bit tests by @jiqing-feng in #11738
- enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU by @yao-matrix in #11671
- [tests] properly skip tests instead of return by @sayakpaul in #11771
- [CI] Skip ONNX Upscale tests by @DN6 in #11774
- [Wan] Fix mask padding in Wan VACE pipeline. by @bennyguo in #11778
- Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py by @imbr92 in #11744
- [docs] minor cleanups in the lora docs. by @sayakpaul in #11770
- [lora] only remove hooks that we add back by @yiyixuxu in #11768
- [tests] Fix HunyuanVideo Framepack device tests by @a-r-r-o-w in #11789
- [chore] raise as early as possible in group offloading by @sayakpaul in #11792
- [tests] Fix group offloading and layerwise casting test interaction by @a-r-r-o-w in #11796
- guard omnigen processor. by @sayakpaul in #11799
- Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
@yao-matrix
- fix test_vanilla_funetuning failure on XPU and A100 (#11263)
- make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
- make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
- make test_instant_style_multiple_masks pass on XPU (#11266)
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
- make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
- fix CPU offloading related fail cases on XPU (#11288)
- enable 2 test cases on XPU (#11332)
- enable group_offload cases and quanto cases on XPU (#11405)
- enable test_layerwise_casting_memory cases on XPU (#11406)
- enable 28 GGUF test cases on XPU (#11404)
- enable marigold_intrinsics cases on XPU (#11445)
- enable consistency test cases on XPU, all passed (#11446)
- enable unidiffuser test cases on xpu (#11444)
- make safe diffusion test cases pass on XPU and A100 (#11458)
- make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)
- enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
- enable lora cases on XPU (#11506)
- enable 7 cases on XPU (#11503)
- enable dit integration cases on xpu (#11523)
- enable print_env on xpu (#11507)
- enable several pipeline integration tests on XPU (#11526)
- enhance value guard of _device_agnostic_dispatch (#11553)
- enable pipeline test cases on xpu (#11527)
- enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
- enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
- enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
@hlky
- Fix LTX 0.9.5 single file (#11271)
- HiDream Image (#11231)
- Use float32 on mps or npu in transformer_hidream_image's rope (#11316)
- Fix vae.Decoder prev_output_channel (#11280)
@quickjkee
- flow matching lcm scheduler (#11170)
@ishan-modi
- [ControlNet] Adds controlnet for SanaTransformer (#11040)
- [BUG] fixed _toctree.yml alphabetical ordering (#11277)
- [BUG] fixes in kadinsky pipeline (#11080)
- [Refactor] Minor Improvement for import utils (#11161)
- [Feature] Added Xlab Controlnet support (#11249)
- [BUG] fixed WAN docstring (#11226)
- [Feature] AutoModel can load components using model_index.json (#11401)
@linoytsaban
- [HiDream] code example (#11317)
- [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
- [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
- [HiDream LoRA] optimizations + small updates (#11381)
- [Hi-Dream LoRA] fix bug in validation (#11439)
- [LoRA] make lora alpha and dropout configurable (#11467)
- [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
- [LoRA] support non-diffusers LTX-Video loras (#11572)
- [LoRA] kijai wan lora support for I2V (#11588)
- [training docs] smol update to README files (#11616)
- [Sana Sprint] add image-to-image pipeline (#11602)
- [LoRA training] update metadata use for lora alpha + README (#11723)
@hameerabbasi
- [LoRA] Add LoRA support to AuraFlow (#10216)
@DN6
- Fix Hunyuan I2V for transformers>4.47.1 (#11293)
- Hunyuan I2V fast tests fix (#11341)
- [Single File] GGUF/Single File Support for HiDream (#11550)
- [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
- Type annotation fix (#11597)
- Fix mixed variant downloading (#11611)
- [CI] Some improvements to Nightly reports summaries (#11166)
- Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
- Chroma Follow Up (#11725)
- [CI] Fix WAN VACE tests (#11757)
- [CI] Fix SANA tests (#11756)
- Fix HiDream pipeline test module (#11754)
- Update Chroma Docs (#11753)
- Fix failing cpu offload test for LTX Latent Upscale (#11755)
- [CI] Skip ONNX Upscale tests (#11774)
@yiyixuxu
- [Hi Dream] follow-up (#11296)
- support Wan-FLF2V (#11353)
- update output for Hidream transformer (#11366)
- [Wan2.1-FLF2V] update conversion script (#11365)
- [HiDream] move deprecation to 0.35.0 (#11384)
- clean up the Init for stable_diffusion (#11500)
- [lora] only remove hooks that we add back (#11768)
@Teriks
- Kolors additional pipelines, community contrib (#11372)
@co63oc
- Fix typos in strings and comments (#11407)
- Fix typos in docs and comments (#11416)
- Fix typos in strings and comments (#11476)
@xduzhangjiayu
- Add StableDiffusion3InstructPix2PixPipeline (#11378)
@scxue
- Add cross attention type for Sana-Sprint training in diffusers. (#11514)
@lzyhha
- Add VisualCloze (#11377)
@b-sai
- RegionalPrompting: Inherit from Stable Diffusion (#11525)
@Ednaordinary
- Chroma Pipeline (#11698)
- Apr 10, 2025
- Date parsed from source:Apr 10, 2025
- First seen by Releasebot:Mar 20, 2026
v0.33.1: fix ftfy import
diffusers fixes ftfy import for Wan pipelines.
All commits
- fix ftfy import for wan pipelines by @yiyixuxu in #11262
- Apr 9, 2025
- Date parsed from source:Apr 9, 2025
- First seen by Releasebot:Mar 20, 2026
Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more
diffusers releases a major update for video and image generation, adding Wan 2.1, LTX Video 0.9.5, Hunyuan Image to Video, Sana-Sprint, Lumina2, OmniGen and more. It also brings memory optimizations, cached inference, quantization, LoRA improvements and broader pipeline support.
New Pipelines for Video Generation
Wan 2.1
Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The release includes four model variants and three pipelines: text-to-video, image-to-video, and video-to-video.
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers
- Wan-AI/Wan2.1-T2V-14B-Diffusers
- Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
- Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
Check out the docs here to learn more.
LTX Video 0.9.5
LTX Video 0.9.5 is the updated version of the super-fast LTX Video model series. The latest model introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward).
To support these additional conditioning inputs, we’ve introduced the LTXConditionPipeline and LTXVideoCondition object.
To learn more about the usage, check out the docs here.
Hunyuan Image to Video
Hunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder. The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.
To learn more, check out the docs here.
Others
- EasyAnimateV5 (thanks to @bubbliiiing for contributing this in this PR)
- ConsisID (thanks to @SHYuanBest for contributing this in this PR)
New Pipelines for Image Generation
Sana-Sprint
SANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.
Shoutout to @lawrence-cj for their help and guidance on this PR.
Check out the pipeline docs of SANA-Sprint to learn more.
Lumina2
Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.
Check out the docs to learn more. Thanks to @zhuole1025 for contributing this through this PR.
One can also LoRA fine-tune Lumina2, taking advantage of its Apache 2.0 licensing. Check out the guide for more details.
Omnigen
OmniGen is a unified image generation model that can handle multiple tasks including text-to-image, image editing, subject-driven generation, and various computer vision tasks within a single framework. The model consists of a VAE, and a single transformer based on Phi-3 that handles text and image encoding as well as the diffusion process.
Check out the docs to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in this PR.
Others
- CogView4 (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in this PR)
New Memory Optimizations
Layerwise Casting
PyTorch supports torch.float8_e4m3fn and torch.float8_e5m2 as weight storage dtypes, but they can’t be used for computation on many devices due to unimplemented kernel support.
However, you can still use these dtypes to store model weights in FP8 precision and upcast them to a widely supported dtype such as torch.float16 or torch.bfloat16 on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting. This can potentially cut down the VRAM requirements of a model by 50%.
Code
```python
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX-5b"

# Load the model in bfloat16 and enable layerwise casting
transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

# Load the pipeline
pipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```
Group Offloading
Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either torch.nn.ModuleList or torch.nn.Sequential), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.
On CUDA devices, layer prefetching can also be enabled with CUDA streams. The next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which makes inference substantially faster while still keeping VRAM requirements very low. This overlaps computation with data transfer.
One thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2 times the size of the model if you choose to set use_stream=True. You can reduce CPU RAM usage by setting low_cpu_mem_usage=True. This should limit the CPU RAM used to be roughly the same as the size of the model, but will introduce slight latency in the inference process.
You can also use record_stream=True when using use_stream=True to obtain more speedups at the expense of slightly increased memory usage.
Code
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

# This utilized about 14.79 GB. It can be further reduced by using tiling and using leaf_level offloading throughout the pipeline.
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
```
Group offloading can also be applied to non-Diffusers models such as text encoders from the transformers library.
Code
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
```
Remote Components
Remote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:
| Model | Endpoint | VAE |
| --- | --- | --- |
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo |
This is an example of using remote decoding with the Hunyuan Video pipeline:
Code
```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils.remote_utils import remote_decode

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16
).to("cuda")

latent = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="latent",
).frames

video = remote_decode(
    endpoint="https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    output_type="mp4",
)
if isinstance(video, bytes):
    with open("video.mp4", "wb") as f:
        f.write(video)
```
Check out the docs to know more.
Introducing Cached Inference for DiTs
Cached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.
Check out the docs to learn more about the available caching methods.
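The decision these caching methods make each step can be sketched in a few lines. The helper below is a hypothetical illustration of how a timestep window and a block-skip interval determine when a cached attention output is reused; the names and exact semantics are illustrative assumptions, not the library's internals.

```python
# Hypothetical sketch of the caching decision used by methods like
# Pyramid Attention Broadcast: reuse a cached attention output only
# inside a timestep window, refreshing it every `block_skip` calls.
def should_reuse_cached_attention(timestep, calls_since_refresh,
                                  timestep_range=(100, 800), block_skip=2):
    lo, hi = timestep_range
    if not (lo < timestep < hi):
        return False  # outside the window: always recompute attention
    # every block_skip-th call recomputes; the others reuse the cache
    return calls_since_refresh % block_skip != 0
```

Intuitively, attention outputs change slowly in the middle of the denoising schedule, so skipping recomputation there trades a small quality dip for speed.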
Pyramid Attention Broadcast
Code
```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```
FasterCache
Code
```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 901),
    unconditional_batch_skip_range=2,
    attention_weight_callback=lambda _: 0.5,
    is_guidance_distilled=True,
)
pipe.transformer.enable_cache(config)
```
Quantization
Quanto Backend
Diffusers now has support for the Quanto quantization backend, which provides float8, int8, int4, and int2 quantization dtypes.
```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```
Quanto int8 models are also compatible with torch.compile:
```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer.compile()
```
Improved loading for uintx TorchAO checkpoints with torch>=2.6
TorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes using the uintx format, such as uint4wo, this involves saving subclassed TorchAO Tensor objects in the model file. This made loading the models directly with Diffusers tricky, since we do not allow deserializing arbitrary Python objects from pickle files.
Torch 2.6 allows adding expected Tensors to torch safe globals, which lets us directly load TorchAO checkpoints with these objects.
```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```
LoRAs
We have shipped a couple of improvements on the LoRA front in this release.
- 🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux
Take note of the breaking change introduced in this PR 🚨. We suggest upgrading your peft installation to the latest version (pip install -U peft), especially when dealing with Flux LoRAs.
- torch.compile() support when hotswapping LoRAs without triggering recompilation
A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, load another adapter, etc. This workflow normally requires calling load_lora_weights(), set_adapters(), and possibly delete_adapters() to save memory. Moreover, if the model is compiled using torch.compile, performing these steps requires recompilation, which takes time.
To better support this common workflow, you can “hotswap” a LoRA adapter, to avoid accumulating memory and in some cases, recompilation. It requires an adapter to already be loaded, and the new adapter weights are swapped in-place for the existing adapter.
Check out the docs to learn more about this feature.
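Conceptually, hotswapping avoids recompilation because the new adapter's weights are copied into the tensors the existing adapter already owns, so any compiled graph that captured those tensors stays valid. A minimal, library-agnostic sketch of that in-place swap (all names here are illustrative):

```python
# Library-agnostic sketch of the hotswap idea: copy new adapter weights
# into the buffers the current adapter already owns, instead of rebinding
# them, so references captured by a compiled graph remain valid.
class LoraSlot:
    def __init__(self, weights):
        self.weights = [list(w) for w in weights]  # stand-ins for adapter tensors

    def hotswap(self, new_weights):
        for buf, new in zip(self.weights, new_weights):
            buf[:] = new  # in-place update: same buffer object, new values

slot = LoraSlot([[1.0, 2.0], [3.0]])
captured = slot.weights[0]         # pretend a compiled graph captured this buffer
slot.hotswap([[9.0, 8.0], [7.0]])  # captured buffer now holds the new weights
```

Because the buffers are mutated rather than replaced, no new tensors enter the graph and torch.compile has nothing to recompile.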
The other major change is support for loading LoRAs into quantized model checkpoints.
dtype Maps for Pipelines
Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:
```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
```
AutoModel
This release includes an AutoModel object similar to the one found in transformers that automatically fetches the appropriate model class for the provided repo.
```python
from diffusers import AutoModel

unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```
All commits
- [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
- IP-Adapter for StableDiffusion3Img2ImgPipeline by @guiyrt in #10589
- [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
- Move buffers to device by @hlky in #10523
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
- Scheduling fixes on MPS by @hlky in #10549
- [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
- NPU adaption for RMSNorm by @leisuzz in #10534
- implementing flux on TPUs with ptxla by @entrpn in #10515
- [core] ConsisID by @SHYuanBest in #10140
- [training] set rest of the blocks with requires_grad False. by @sayakpaul in #10607
- chore: remove redundant words by @sunxunle in #10609
- bugfix for npu not support float64 by @baymax591 in #10123
- [chore] change licensing to 2025 from 2024. by @sayakpaul in #10615
- Enable dreambooth lora finetune example on other devices by @jiqing-feng in #10602
- Remove the FP32 Wrapper when evaluating by @lmxyy in #10617
- [tests] make tests device-agnostic (part 3) by @faaany in #10437
- fix offload gpu tests etc by @yiyixuxu in #10366
- Remove cache migration script by @Wauplin in #10619
- [core] Layerwise Upcasting by @a-r-r-o-w in #10347
- Improve TorchAO error message by @a-r-r-o-w in #10627
- [CI] Update HF_TOKEN in all workflows by @DN6 in #10613
- add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py by @kahmed10 in #10624
- [Tests] modify the test slices for the failing flax test by @sayakpaul in #10630
- [docs] fix image path in para attention docs by @sayakpaul in #10632
- [docs] uv installation by @stevhliu in #10622
- width and height are mixed-up by @raulc0399 in #10629
- Add IP-Adapter example to Flux docs by @hlky in #10633
- removing redundant requires_grad = False by @YanivDorGalron in #10628
- [chore] add a script to extract loras from full fine-tuned models by @sayakpaul in #10631
- Add pipeline_stable_diffusion_xl_attentive_eraser by @Anonym0u3 in #10579
- NPU Adaption for Sanna by @leisuzz in #10409
- Add sigmoid scheduler in scheduling_ddpm.py docs by @JacobHelwig in #10648
- create a script to train autoencoderkl by @lavinal712 in #10605
- Add community pipeline for semantic guidance for FLUX by @Marlon154 in #10610
- ControlNet Union controlnet_conditioning_scale for multiple control inputs by @hlky in #10666
- [training] Convert to ImageFolder script by @hlky in #10664
- Add provider_options to OnnxRuntimeModel by @hlky in #10661
- fix check_inputs func in LuminaText2ImgPipeline by @victolee0 in #10651
- SDXL ControlNet Union pipelines, make control_image argument immutible by @Teriks in #10663
- Revert RePaint scheduler 'fix' by @GiusCat in #10644
- [core] Pyramid Attention Broadcast by @a-r-r-o-w in #9562
- [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode by @hanchchch in #10600
- Refactor gradient checkpointing by @a-r-r-o-w in #10611
- [Tests] conditionally check fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory by @sayakpaul in #10669
- Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode by @dimitribarbot in #10670
- [tests] update llamatokenizer in hunyuanvideo tests by @sayakpaul in #10681
- support StableDiffusionAdapterPipeline.from_single_file by @Teriks in #10552
- fix(hunyuan-video): typo in height and width input check by @badayvedat in #10684
- [FIX] check_inputs function in Auraflow Pipeline by @SahilCarterr in #10678
- Fix enable memory efficient attention on ROCm by @tenpercent in #10564
- Fix inconsistent random transform in instruct pix2pix by @Luvata in #10698
- feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling by @badayvedat in #10699
- Fixed grammar in "write_own_pipeline" readme by @N0-Flux-given in #10706
- Fix Documentation about Image-to-Image Pipeline by @ParagEkbote in #10704
- [bitsandbytes] Simplify bnb int8 dequant by @sayakpaul in #10401
- Fix train_text_to_image.py --help by @nkthiebaut in #10711
- Notebooks for Community Scripts-6 by @ParagEkbote in #10713
- [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference by @SahilCarterr in #10714
- add provider_options in from_pretrained by @xieofxie in #10719
- [Community] Enhanced Model Search by @suzukimain in #10417
- [bugfix] NPU Adaption for Sana by @leisuzz in #10724
- Quantized Flux with IP-Adapter by @hlky in #10728
- EDMEulerScheduler accept sigmas, add final_sigmas_type by @hlky in #10734
- [LoRA] fix peft state dict parsing by @sayakpaul in #10532
- Add Self type hint to ModelMixin's from_pretrained by @hlky in #10742
- [Tests] Test layerwise casting with training by @sayakpaul in #10765
- speedup hunyuan encoder causal mask generation by @dabeschte in #10764
- [CI] Fix Truffle Hog failure by @DN6 in #10769
- Add OmniGen by @staoxiao in #10148
- feat: new community mixture_tiling_sdxl pipeline for SDXL by @elismasilva in #10759
- Add support for lumina2 by @zhuole1025 in #10642
- Refactor OmniGen by @a-r-r-o-w in #10771
- Faster set_adapters by @Luvata in #10777
- [Single File] Add Single File support for Lumina Image 2.0 Transformer by @DN6 in #10781
- Fix use_lu_lambdas and use_karras_sigmas with beta_schedule=squaredcos_cap_v2 in DPMSolverMultistepScheduler by @hlky in #10740
- MultiControlNetUnionModel on SDXL by @guiyrt in #10747
- fix: [Community pipeline] Fix flattened elements on image by @elismasilva in #10774
- make tensors contiguous before passing to safetensors by @faaany in #10761
- Disable PEFT input autocast when using fp8 layerwise casting by @a-r-r-o-w in #10685
- Update FlowMatch docstrings to mention correct output classes by @a-r-r-o-w in #10788
- Refactor CogVideoX transformer forward by @a-r-r-o-w in #10789
- Module Group Offloading by @a-r-r-o-w in #10503
- Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 by @puhuk in #10792
- [FIX] check_inputs function in lumina2 by @SahilCarterr in #10784
- follow-up refactor on lumina2 by @yiyixuxu in #10776
- CogView4 (supports different length c and uc) by @zRzRzRzRzRzRzR in #10649
- typo fix by @YanivDorGalron in #10802
- Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines by @ParagEkbote in #10746
- [chore] update notes generation spaces by @sayakpaul in #10592
- [LoRA] improve lora support for flux. by @sayakpaul in #10810
- Fix max_shift value in flux and related functions to 1.15 (issue #10675) by @puhuk in #10807
- [docs] add missing entries to the lora docs. by @sayakpaul in #10819
- DiffusionPipeline mixin to+FromOriginalModelMixin/FromSingleFileMixin from_single_file type hint by @hlky in #10811
- [LoRA] make set_adapters() robust on silent failures. by @sayakpaul in #9618
- [FEAT] Model loading refactor by @SunMarc in #10604
- [misc] feat: introduce a style bot. by @sayakpaul in #10274
- Remove print statements by @a-r-r-o-w in #10836
- [tests] use proper gemma class and config in lumina2 tests. by @sayakpaul in #10828
- [LoRA] add LoRA support to Lumina2 and fine-tuning script by @sayakpaul in #10818
- [Utils] add utilities for checking if certain utilities are properly documented by @sayakpaul in #7763
- Add missing isinstance for arg checks in GGUFParameter by @AstraliteHeart in #10834
- [tests] test encode_prompt() in isolation by @sayakpaul in #10438
- store activation cls instead of function by @SunMarc in #10832
- fix: support transformer models' generation_config in pipeline by @JeffersonQin in #10779
- Notebooks for Community Scripts-7 by @ParagEkbote in #10846
- [CI] install accelerate transformers from main by @sayakpaul in #10289
- [CI] run fast gpu tests conditionally on pull requests. by @sayakpaul in #10310
- SD3 IP-Adapter runtime checkpoint conversion by @guiyrt in #10718
- Some consistency-related fixes for HunyuanVideo by @a-r-r-o-w in #10835
- SkyReels Hunyuan T2V & I2V by @a-r-r-o-w in #10837
- fix: run tests from a pr workflow. by @sayakpaul in #9696
- [chore] template for remote vae. by @sayakpaul in #10849
- fix remote vae template by @sayakpaul in #10852
- [CI] Fix incorrectly named test module for Hunyuan DiT by @DN6 in #10854
- [CI] Update always test Pipelines list in Pipeline fetcher by @DN6 in #10856
- device_map in load_model_dict_into_meta by @hlky in #10851
- [Fix] Docs overview.md by @SahilCarterr in #10858
- remove format check for safetensors file by @SunMarc in #10864
- [docs] LoRA support by @stevhliu in #10844
- Comprehensive type checking for from_pretrained kwargs by @guiyrt in #10758
- Fix torch_dtype in Kolors text encoder with transformers v4.49 by @hlky in #10816
- [LoRA] restrict certain keys to be checked for peft config update. by @sayakpaul in #10808
- Add SD3 ControlNet to AutoPipeline by @hlky in #10888
- [docs] Update prompt weighting docs by @stevhliu in #10843
- [docs] Flux group offload by @stevhliu in #10847
- [Fix] fp16 unscaling in train_dreambooth_lora_sdxl by @SahilCarterr in #10889
- [docs] Add CogVideoX Schedulers by @a-r-r-o-w in #10885
- [chore] correct qk norm list. by @sayakpaul in #10876
- [Docs] Fix toctree sorting by @DN6 in #10894
- [refactor] SD3 docs & remove additional code by @a-r-r-o-w in #10882
- [refactor] Remove additional Flux code by @a-r-r-o-w in #10881
- [CI] Improvements to conditional GPU PR tests by @DN6 in #10859
- Multi IP-Adapter for Flux pipelines by @guiyrt in #10867
- Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". by @CyberVy in #10880
- Security fix by @ydshieh in #10905
- Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation by @toshas in #10884
- [Tests] fix: lumina2 lora fuse_nan test by @sayakpaul in #10911
- Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. by @CyberVy in #10907
- [CI] Fix Fast GPU tests on PR by @DN6 in #10912
- [CI] Fix for failing IP Adapter test in Fast GPU PR tests by @DN6 in #10915
- Experimental per control type scale for ControlNet Union by @hlky in #10723
- [style bot] improve security for the stylebot. by @sayakpaul in #10908
- [CI] Update Stylebot Permissions by @DN6 in #10931
- [Alibaba Wan Team] continue on #10921 Wan2.1 by @yiyixuxu in #10922
- Support IPAdapter for more Flux pipelines by @hlky in #10708
- Add remote_decode to remote_utils by @hlky in #10898
- Update VAE Decode endpoints by @hlky in #10939
- [chore] fix-copies to flux pipelines by @sayakpaul in #10941
- [Tests] Remove more encode prompts tests by @sayakpaul in #10942
- Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model by @bubbliiiing in #10626
- Fix SD2.X clip single file load projection_dim by @Teriks in #10770
- add from_single_file to animatediff by @ in #10924
- Add Example of IPAdapterScaleCutoffCallback to Docs by @ParagEkbote in #10934
- Update pipeline_cogview4.py by @zRzRzRzRzRzRzR in #10944
- Fix redundant prev_output_channel assignment in UNet2DModel by @ahmedbelgacem in #10945
- Improve load_ip_adapter RAM Usage by @CyberVy in #10948
- [tests] make tests device-agnostic (part 4) by @faaany in #10508
- Update evaluation.md by @sayakpaul in #10938
- [LoRA] feat: support non-diffusers lumina2 LoRAs. by @sayakpaul in #10909
- [Quantization] support pass MappingType for TorchAoConfig by @a120092009 in #10927
- Fix the missing parentheses when calling is_torchao_available in quantization_config.py. by @CyberVy in #10961
- [LoRA] Support Wan by @a-r-r-o-w in #10943
- Fix incorrect seed initialization when args.seed is 0 by @azolotenkov in #10964
- feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL by @elismasilva in #10951
- [Docs] CogView4 comment fix by @zRzRzRzRzRzRzR in #10957
- update check_input for cogview4 by @yiyixuxu in #10966
- Add VAE Decode endpoint slow test by @hlky in #10946
- [flux lora training] fix t5 training bug by @linoytsaban in #10845
- use style bot GH Action from huggingface_hub by @hanouticelina in #10970
- [train_dreambooth_lora.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @flyxiv in #10973
- [tests] fix tests for save load components by @sayakpaul in #10977
- Fix loading OneTrainer Flux LoRA by @hlky in #10978
- fix default values of Flux guidance_scale in docstrings by @catwell in #10982
- [CI] remove synchronized. by @sayakpaul in #10980
- Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill by @dependabot[bot] in #10984
- Fix Flux Controlnet Pipeline _callback_tensor_inputs Missing Some Elements by @CyberVy in #10974
- [Single File] Add user agent to SF download requests. by @DN6 in #10979
- Add CogVideoX DDIM Inversion to Community Pipelines by @LittleNyima in #10956
- fix wan i2v pipeline bugs by @yupeng1111 in #10975
- Hunyuan I2V by @a-r-r-o-w in #10983
- Fix Graph Breaks When Compiling CogView4 by @chengzeyi in #10959
- Wan VAE move scaling to pipeline by @hlky in #10998
- [LoRA] remove full key prefix from peft. by @sayakpaul in #11004
- [Single File] Add single file support for Wan T2V/I2V by @DN6 in #10991
- Add STG to community pipelines by @kinam0252 in #10960
- [LoRA] Improve copied from comments in the LoRA loader classes by @sayakpaul in #10995
- Fix for fetching variants only by @DN6 in #10646
- [Quantization] Add Quanto backend by @DN6 in #10756
- [Single File] Add single file loading for SANA Transformer by @ishan-modi in #10947
- [LoRA] Improve warning messages when LoRA loading becomes a no-op by @sayakpaul in #10187
- [LoRA] CogView4 by @a-r-r-o-w in #10981
- [Tests] improve quantization tests by additionally measuring the inference memory savings by @sayakpaul in #11021
- [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing by @tolgacangoz in #8998
- [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 by @DN6 in #11018
- fix: mixture tiling sdxl pipeline - adjust generating time_ids & embeddings by @elismasilva in #11012
- [LoRA] support wan i2v loras from the world. by @sayakpaul in #11025
- Fix SD3 IPAdapter feature extractor by @hlky in #11027
- chore: fix help messages in advanced diffusion examples by @wonderfan in #10923
- Fix missing **kwargs in lora_pipeline.py by @CyberVy in #11011
- Fix for multi-GPU WAN inference by @AmericanPresidentJimmyCarter in #10997
- [Refactor] Clean up import utils boilerplate by @DN6 in #11026
- Use output_size in repeat_interleave by @hlky in #11030
- [hybrid inference 🍯🐝] Add VAE encode by @hlky in #11017
- Wan Pipeline scaling fix, type hint warning, multi generator fix by @hlky in #11007
- [LoRA] change to warning from info when notifying the users about a LoRA no-op by @sayakpaul in #11044
- Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline by @hlky in #10827
- making formatted_images initialization compact by @YanivDorGalron in #10801
- Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed by @ZhengKai91 in #10820
- [Tests] restrict memory tests for quanto for certain schemes. by @sayakpaul in #11052
- [LoRA] feat: support non-diffusers wan t2v loras. by @sayakpaul in #11059
- [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch by @andjoer in #11051
- reverts accidental change that removes attn_mask in attn. Improves fl… by @entrpn in #11065
- Fix deterministic issue when getting pipeline dtype and device by @dimitribarbot in #10696
- [Tests] add requires peft decorator. by @sayakpaul in #11037
- CogView4 Control Block by @zRzRzRzRzRzRzR in #10809
- [CI] pin transformers version for benchmarking. by @sayakpaul in #11067
- Fix Wan I2V Quality by @chengzeyi in #11087
- LTX 0.9.5 by @a-r-r-o-w in #10968
- make PR GPU tests conditioned on styling. by @sayakpaul in #11099
- Group offloading improvements by @a-r-r-o-w in #11094
- Fix pipeline_flux_controlnet.py by @co63oc in #11095
- update readme instructions. by @entrpn in #11096
- Resolve stride mismatch in UNet's ResNet to support Torch DDP by @jinc7461 in #11098
- Fix Group offloading behaviour when using streams by @a-r-r-o-w in #11097
- Quality options in export_to_video by @hlky in #11090
- [CI] uninstall deps properly from pr gpu tests. by @sayakpaul in #11102
- [BUG] Fix Autoencoderkl train script by @lavinal712 in #11113
- [Wan LoRAs] make T2V LoRAs compatible with Wan I2V by @linoytsaban in #11107
- [tests] enable bnb tests on xpu by @faaany in #11001
- [fix bug] PixArt inference_steps=1 by @lawrence-cj in #11079
- Flux with Remote Encode by @hlky in #11091
- [tests] make cuda only tests device-agnostic by @faaany in #11058
- Provide option to reduce CPU RAM usage in Group Offload by @DN6 in #11106
- remove F.rms_norm for now by @yiyixuxu in #11126
- Notebooks for Community Scripts-8 by @ParagEkbote in #11128
- fix _callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements by @CyberVy in #11073
- [core] FasterCache by @a-r-r-o-w in #10163
- add sana-sprint by @yiyixuxu in #11074
- Don't override torch_dtype and don't use when quantization_config is set by @hlky in #11039
- Update README and example code for AnyText usage by @tolgacangoz in #11028
- Modify the implementation of retrieve_timesteps in CogView4-Control. by @zRzRzRzRzRzRzR in #11125
- [fix SANA-Sprint] by @lawrence-cj in #11142
- New HunyuanVideo-I2V by @a-r-r-o-w in #11066
- [doc] Fix Korean Controlnet Train doc by @flyxiv in #11141
- Improve information about group offloading and layerwise casting by @a-r-r-o-w in #11101
- add a timestep scale for sana-sprint teacher model by @lawrence-cj in #11150
- [Quantization] dtype fix for GGUF + fix BnB tests by @DN6 in #11159
- Set self._hf_peft_config_loaded to True when LoRA is loaded using load_lora_adapter in PeftAdapterMixin class by @kentdan3msu in #11155
- WanI2V encode_image by @hlky in #11164
- [Docs] Update Wan Docs with memory optimizations by @DN6 in #11089
- Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions by @hlky in #11139
- Raise warning and round down if Wan num_frames is not 4k + 1 by @a-r-r-o-w in #11167
- [Docs] Fix environment variables in installation.md by @remarkablemark in #11179
- Add latents_mean and latents_std to SDXLLongPromptWeightingPipeline by @hlky in #11034
- Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set by @kakukakujirori in #10918
- [tests] no hard-coded cuda by @faaany in #11186
- [WIP] Add Wan Video2Video by @DN6 in #11053
- map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU by @yao-matrix in #11191
- fix autocast by @jiqing-feng in #11190
- fix: for checking mandatory and optional pipeline components by @elismasilva in #11189
- remove unnecessary call to F.pad by @bm-synth in #10620
- allow models to run with a user-provided dtype map instead of a single dtype by @hlky in #10301
- [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU by @faaany in #11197
- Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test by @hlky in #11196
- [docs] torch_dtype map by @hlky in #11194
- Fix enable_sequential_cpu_offload in CogView4Pipeline by @hlky in #11195
- SchedulerMixin from_pretrained and ConfigMixin Self type annotation by @hlky in #11192
- Update import_utils.py by @Lakshaysharma048 in #10329
- Add CacheMixin to Wan and LTX Transformers by @DN6 in #11187
- feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline by @elismasilva in #11188
- [Model Card] standardize advanced diffusion training sdxl lora by @chiral-carbon in #7615
- Change KolorsPipeline LoRA Loader to StableDiffusion by @BasileLewan in #11198
- Update Style Bot workflow by @hanouticelina in #11202
- Fixed requests.get function call by adding timeout parameter. by @kghamilton89 in #11156
- Fix Single File loading for LTX VAE by @DN6 in #11200
- [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) by @Suprhimp in #10603
- [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning by @tolgacangoz in #11174
- Add Wan with STG as a community pipeline by @Ednaordinary in #11184
- Add missing MochiEncoder3D.gradient_checkpointing attribute by @mjkvaak-amd in #11146
- enable 1 case on XPU by @yao-matrix in #11219
- ensure dtype match between diffused latents and vae weights by @heyalexchoi in #8391
- [docs] MPS update by @stevhliu in #11212
- Add support to pass image embeddings to the WAN I2V pipeline. by @goiri in #11175
- [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8461
- [Training] Better image interpolation in training scripts by @asomoza in #11206
- [LoRA] Implement hot-swapping of LoRA by @BenjaminBossan in #9453
- introduce compute arch specific expectations and fix test_sd3_img2img_inference failure by @yao-matrix in #11227
- [Flux LoRA] fix issues in flux lora scripts by @linoytsaban in #11111
- Flux quantized with lora by @hlky in #10990
- [feat] implement record_stream when using CUDA streams during group offloading by @sayakpaul in #11081
- [bitsandbytes] improve replacement warnings for bnb by @sayakpaul in #11132
- minor update to sana sprint docs. by @sayakpaul in #11236
- [docs] minor updates to dtype map docs. by @sayakpaul in #11237
- [LoRA] support more comfyui loras for Flux 🚨 by @sayakpaul in #10985
- fix: SD3 ControlNet validation so that it runs on a A100. by @sayakpaul in #11238
- AudioLDM2 Fixes by @hlky in #11244
- AutoModel by @hlky in #11115
- fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU by @yao-matrix in #11245
- [docs] AutoModel by @hlky in #11250
- Update Ruff to latest Version by @DN6 in #10919
- fix flux controlnet bug by @free001style in #11152
- fix timeout constant by @sayakpaul in #11252
- fix consisid imports by @sayakpaul in #11254
- Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @guiyrt
- IP-Adapter for StableDiffusion3Img2ImgPipeline (#10589)
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
- MultiControlNetUnionModel on SDXL (#10747)
- SD3 IP-Adapter runtime checkpoint conversion (#10718)
- Comprehensive type checking for from_pretrained kwargs (#10758)
- Multi IP-Adapter for Flux pipelines (#10867)
- @chengzeyi
- [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)
- Fix Graph Breaks When Compiling CogView4 (#10959)
- Fix Wan I2V Quality (#11087)
- @entrpn
- implementing flux on TPUs with ptxla (#10515)
- reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
- update readme instructions. (#11096)
- @SHYuanBest
- [core] ConsisID (#10140)
- @faaany
- [tests] make tests device-agnostic (part 3) (#10437)
- make tensors contiguous before passing to safetensors (#10761)
- [tests] make tests device-agnostic (part 4) (#10508)
- [tests] enable bnb tests on xpu (#11001)
- [tests] make cuda only tests device-agnostic (#11058)
- [tests] no hard-coded cuda (#11186)
- [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
- @yiyixuxu
- fix offload gpu tests etc (#10366)
- follow-up refactor on lumina2 (#10776)
- [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)
- update check_input for cogview4 (#10966)
- remove F.rms_norm for now (#11126)
- add sana-sprint (#11074)
- @DN6
- [CI] Update HF_TOKEN in all workflows (#10613)
- [CI] Fix Truffle Hog failure (#10769)
- [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
- [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
- [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
- [Docs] Fix toctree sorting (#10894)
- [CI] Improvements to conditional GPU PR tests (#10859)
- [CI] Fix Fast GPU tests on PR (#10912)
- [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
- [CI] Update Stylebot Permissions (#10931)
- [Single File] Add user agent to SF download requests. (#10979)
- [Single File] Add single file support for Wan T2V/I2V (#10991)
- Fix for fetching variants only (#10646)
- [Quantization] Add Quanto backend (#10756)
- [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
- [Refactor] Clean up import utils boilerplate (#11026)
- Provide option to reduce CPU RAM usage in Group Offload (#11106)
- [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
- [Docs] Update Wan Docs with memory optimizations (#11089)
- [WIP] Add Wan Video2Video (#11053)
- Add CacheMixin to Wan and LTX Transformers (#11187)
- Fix Single File loading for LTX VAE (#11200)
- Update Ruff to latest Version (#10919)
- @Anonym0u3
- Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
- @lavinal712
- create a script to train autoencoderkl (#10605)
- [BUG] Fix Autoencoderkl train script (#11113)
- @Marlon154
- Add community pipeline for semantic guidance for FLUX (#10610)
- @ParagEkbote
- Fix Documentation about Image-to-Image Pipeline (#10704)
- Notebooks for Community Scripts-6 (#10713)
- Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
- Notebooks for Community Scripts-7 (#10846)
- Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
- Notebooks for Community Scripts-8 (#11128)
- @suzukimain
- [Community] Enhanced Model Search (#10417)
- @staoxiao
- Add OmniGen (#10148)
- @elismasilva
- feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
- fix: [Community pipeline] Fix flattened elements on image (#10774)
- feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
- fix: mixture tiling sdxl pipeline - adjust generating time_ids & embeddings (#11012)
- fix: for checking mandatory and optional pipeline components (#11189)
- feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
- @zhuole1025
- Add support for lumina2 (#10642)
- @zRzRzRzRzRzRzR
- CogView4 (supports different length c and uc) (#10649)
- Update pipeline_cogview4.py (#10944)
- [Docs] CogView4 comment fix (#10957)
- CogView4 Control Block (#10809)
- Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)
- @toshas
- Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
- @bubbliiiing
- Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
- @LittleNyima
- Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
- @kinam0252
- Add STG to community pipelines (#10960)
- @tolgacangoz
- [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
- Update README and example code for AnyText usage (#11028)
- [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
- @Ednaordinary
- Add Wan with STG as a community pipeline (#11184)
- Jan 15, 2025
v0.32.2
diffusers releases a patch update that fixes Flux single-file checkpoint loading, improves LoRA support for 4bit quantized Flux and Hunyuan Video, adds unload_lora_weights for Flux Control, and resolves a Hunyuan Video batch size bug.
Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video
This patch release:
- Fixes a regression in loading Comfy UI format single file checkpoints for Flux
- Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
- Adds unload_lora_weights for Flux Control
- Fixes a bug that prevents Hunyuan Video from running with batch size > 1
- Allows Hunyuan Video to load LoRAs created from the original repository code
All commits
- [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
- [CI] Update HF Token on Fast GPU Model Tests by @DN6 in #10570
- [CI] Update HF Token in Fast GPU Tests by @DN6 in #10568
- Fix batch > 1 in HunyuanVideo by @hlky in #10548
- Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
- Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
- [LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
- [LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
- [LoRA] clean up load_lora_into_text_encoder() and fuse_lora() copied from by @sayakpaul in #10495
- [LoRA] feat: support unload_lora_weights() for Flux Control. by @sayakpaul in #10206
- Fix Flux multiple Lora loading bug by @maxs-kan in #10388
- [LoRA] fix: lora unloading when using expanded Flux LoRAs. by @sayakpaul in #10397
- Dec 25, 2024
v0.32.1
diffusers fixes TorchAO quantizer bugs, resolving import issues on older PyTorch versions, correcting quantization behavior, and raising a clear error for unsupported device maps.
TorchAO Quantizer fixes
This patch release fixes a few bugs related to the TorchAO Quantizer introduced in v0.32.0.
Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
Device maps did not work as expected when using the quantizer, so we now raise an error if one is passed. Support for device maps with the different quantization backends will be added in the near future.
Quantization was not performed due to faulty logic. This is now fixed and better tested.
Refer to our documentation to learn more about how to use different quantization backends.
All commits
- make style for #10368 by @yiyixuxu in #10370
- fix test pypi installation in the release workflow by @sayakpaul in #10360
- Fix TorchAO related bugs; revert device_map changes by @a-r-r-o-w in #10371