Hugging Face Release Notes

Last updated: Mar 20, 2026

Hugging Face Products

All Hugging Face Release Notes (25)

  • Mar 18, 2026
    • Date parsed from source:
      Mar 18, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Hugging Face


    Hugging Face adds Markdown Papers pages and a new AI agent skill for paper search and Hub discovery.

    When AI agents such as Cursor or Claude Code fetch a Hugging Face Papers page, Markdown versions are served automatically, saving tokens and improving efficiency — e.g. huggingface.co/papers/2601.15621.md.

    A new hugging-face-paper-pages skill for AI agents lets agents search papers by title, author, or semantic similarity, read their content, and discover linked models, datasets, and Spaces on the Hub.
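
    For agents that already know a paper's arXiv-style ID, the Markdown variant is just the Papers page URL with a `.md` suffix. A minimal sketch of that convention (the helper name is ours; only the URL pattern comes from the announcement):

    ```python
    # Sketch: derive the token-efficient Markdown variant of a Hugging Face
    # Papers page from a paper ID, per the .md URL convention described above.

    def papers_markdown_url(paper_id: str) -> str:
        """Return the Markdown version of a Hugging Face Papers page URL."""
        return f"https://huggingface.co/papers/{paper_id}.md"

    print(papers_markdown_url("2601.15621"))
    # An agent can then fetch this URL with any HTTP client, e.g.:
    #   import requests
    #   text = requests.get(papers_markdown_url("2601.15621")).text
    ```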

  • Mar 18, 2026

    Hugging Face


    Hugging Face adds a Repositories page in settings to visualize storage use and help users manage repository resources.

    User and organization settings now include a Repositories page to visualize repository storage consumption.

    This update makes it easier to monitor usage, understand how storage is distributed across repositories, and manage resources more effectively.


  • Mar 10, 2026

    Hugging Face


    Hugging Face adds mutable storage Buckets on the Hub for fast, deduplicated uploads and downloads with CLI, API and CDN support.

Buckets bring mutable, non-versioned object storage to the Hub, available to users and organizations under their existing storage plans. Upload training checkpoints, intermediate artifacts, logs, and processed data shards without version control overhead. Manage them from the hf CLI, Python, JavaScript, or the Hub API directly.

    Built on Xet, uploads and downloads are deduplicated and fast. Buckets also support CDN pre-warming to place your data close to compute (AWS and GCP are supported at launch).
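
    Conceptually, deduplicated uploads work by content-addressing chunks so repeated bytes are stored and transferred only once. A toy sketch of that idea (fixed-size chunks and an in-memory store; Xet's real content-defined chunking is more sophisticated):

    ```python
    # Conceptual illustration only: content-addressed chunk deduplication of
    # the kind that makes Bucket uploads fast. This toy uses fixed-size chunks
    # and SHA-256; it is not Xet's actual algorithm.
    import hashlib

    def dedup_chunks(data: bytes, store: dict, chunk_size: int = 4) -> list:
        """Split data into chunks, store each unique chunk once, return hash refs."""
        refs = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # "upload" only if unseen
            refs.append(digest)
        return refs

    store = {}
    refs_a = dedup_chunks(b"checkpoint-epoch-1", store)
    refs_b = dedup_chunks(b"checkpoint-epoch-2", store)  # mostly identical bytes
    # Shared chunks are stored once; only the differing chunk adds storage.
    print(len(store), "unique chunks for", len(refs_a) + len(refs_b), "refs")
    ```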

    Read the blog post and the Storage Buckets Documentation to get started.

  • Mar 5, 2026

    diffusers by Hugging Face

    Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥

diffusers 0.37.0 introduces Modular Diffusers for building pipelines from reusable blocks, and expands image, video, and audio generation with new models such as Z-Image, Flux2 Klein, Qwen Image Layered, LTX-2, and Helios. It also adds new caching methods, context-parallelism backends, and broad bug fixes to the core library.

    Modular Diffusers

    Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing DiffusionPipeline class, providing a more flexible way to create custom diffusion pipelines.

    Find more details on how to get started with Modular Diffusers here, and also check out the announcement post.
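
    The block-composition idea can be sketched generically: each block transforms a shared state, and a pipeline is just a chain of blocks. This is a conceptual illustration, not the actual Modular Diffusers API; see the linked docs for the real interface:

    ```python
    # Conceptual sketch of composing a pipeline from reusable blocks
    # (not the real ModularPipeline API). Each block is a callable that
    # reads and updates a shared state dict.

    def text_encode_block(state):
        state["embeds"] = f"embeds({state['prompt']})"
        return state

    def denoise_block(state):
        state["latents"] = f"denoised({state['embeds']})"
        return state

    def decode_block(state):
        state["image"] = f"image({state['latents']})"
        return state

    def compose(*blocks):
        """Chain reusable blocks into a custom pipeline."""
        def pipeline(state):
            for block in blocks:
                state = block(state)
            return state
        return pipeline

    # Mix and match blocks to build a custom workflow.
    pipe = compose(text_encode_block, denoise_block, decode_block)
    out = pipe({"prompt": "a photo of a cat"})
    print(out["image"])
    ```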

    New Pipelines and Models

    Image 🌆

• Z Image Omni Base: Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
• Flux2 Klein: FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13 GB of VRAM.
    • Qwen Image Layered: Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
• FIBO Edit: Fibo Edit is an 8B parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized "Edit" models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in #12930.
• Cosmos Predict2.5: Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
• Cosmos Transfer2.5: Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can span different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
• GLM-Image: GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in #12973.
• RAE: Representation Autoencoders (RAEs) are an exciting alternative to the traditional VAEs typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for reconstruction.
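
    The frozen-encoder/trainable-decoder shape of an RAE can be sketched with a fixed random linear encoder and a least-squares decoder. This is a toy stand-in for the pre-trained vision encoders and learned decoders RAEs actually use:

    ```python
    # Toy illustration of the RAE idea: keep a pre-trained (here: fixed
    # random) encoder frozen and fit only a lightweight decoder to
    # reconstruct inputs. Not the actual diffusers implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 8))   # "images" as 8-dim vectors
    E = rng.normal(size=(8, 4))     # frozen encoder: 8 -> 4 latents
    Z = X @ E                       # encode (no encoder training)

    # Train only the decoder: least-squares fit of a 4 -> 8 linear map.
    D, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ D                   # lightweight decoder reconstructs X

    err = float(np.mean((X - X_hat) ** 2))
    print(f"reconstruction MSE: {err:.4f}")
    ```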

    Video + audio 🎥 🎼

    • LTX-2: LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference, as well as two-stage inference with spatial sampling, is supported. We also support a conditioning pipeline that allows for passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!
    • Helios: Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in #13208.

    Improvements to Core Library

    New caching methods

    • MagCache — thanks to @AlanPonnachan!
    • TaylorSeer — thanks to @toilaluan!

    New context-parallelism (CP) backends

    • Unified Sequence Parallel attention — thanks to @Bissmella!
    • Ulysses Anything Attention — thanks to @DefTruth!

    Misc

    • Mambo-G Guidance: New guider implementation (#12862)
    • Laplace Scheduler for DDPM (#11320)
    • Custom Sigmas in UniPCMultistepScheduler (#12109)
    • MultiControlNet support for SD3 Inpainting (#11251)
    • Context parallel in native flash attention (#12829)
    • NPU Ulysses Attention Support (#12919)
    • Fix Wan 2.1 I2V Context Parallel Inference (#12909)
    • Fix Qwen-Image Context Parallel Inference (#12970)
• Introduction of the apply_lora_scale decorator for simplifying model definitions (#12994)
    • Introduction of pipeline-level “cpu” device_map (#12811)
    • Enable CP for kernels-based attention backends (#12812)
    • Diffusers is fully functional with Transformers V5 (#12976)
    • A lot of the above features/improvements came as part of the MVP program we have been running. Immense thanks to the contributors!

    Bug Fixes

    • Fix QwenImageEditPlus on NPU (#13017)
    • Fix MT5Tokenizer → use T5Tokenizer for Transformers v5.0+ compatibility (#12877)
    • Fix Wan/WanI2V patchification (#13038)
    • Fix LTX-2 inference with num_videos_per_prompt > 1 and CFG (#13121)
    • Fix Flux2 img2img prediction (#12855)
    • Fix QwenImage txt_seq_lens handling (#12702)
    • Fix prefix_token_len bug (#12845)
    • Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
    • Fix is_fsdp determination (#12960)
    • Fix GLM-Image get_image_features API (#13052)
    • Fix Wan 2.2 when either transformer isn't present (#13055)
    • Fix guider issue (#13147)
    • Fix torchao quantizer for new versions (#12901)
    • Fix GGUF for unquantized types with unquantize kernels (#12498)
    • Make Qwen hidden states contiguous for torchao (#13081)
    • Make Flux hidden states contiguous (#13068)
    • Fix Kandinsky 5 hardcoded CUDA autocast (#12814)
    • Fix aiter availability check (#13059)
    • Fix attention mask check for unsupported backends (#12892)
    • Allow prompt and prior_token_ids simultaneously in GlmImagePipeline (#13092)
    • GLM-Image batch support (#13007)
    • Cosmos 2.5 Video2World frame extraction fix (#13018)
    • ResNet: only use contiguous in training mode (#12977)

    All commits

    • [PRX] Improve model compilation by @WaterKnight1998 in #12787
    • Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py by @delmalih in #12798
    • [Modular]z-image by @yiyixuxu in #12808
    • Fix Qwen Edit Plus modular for multi-image input by @sayakpaul in #12601
    • [WIP] Add Flux2 modular by @DN6 in #12763
    • [docs] improve distributed inference cp docs. by @sayakpaul in #12810
    • post release 0.36.0 by @sayakpaul in #12804
    • Update distributed_inference.md to correct syntax by @sayakpaul in #12827
    • [lora] Remove lora docs unneeded and add " # Copied from ..." by @sayakpaul in #12824
    • support CP in native flash attention by @sywangyi in #12829
    • [qwen-image] edit 2511 support by @naykun in #12839
    • fix pytest tests/pipelines/pixart_sigma/test_pixart.py::PixArtSigmaPi… by @sywangyi in #12842
    • Support for control-lora by @lavinal712 in #10686
    • Add support for LongCat-Image by @junqiangwu in #12828
    • fix the prefix_token_len bug by @junqiangwu in #12845
    • extend TorchAoTest::test_model_memory_usage to other platform by @sywangyi in #12768
    • Qwen Image Layered Support by @naykun in #12853
    • Z-Image-Turbo ControlNet by @hlky in #12792
    • Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion by @miguelmartin75 in #12852
    • more update in modular by @yiyixuxu in #12560
    • Feature: Add Mambo-G Guidance as Guider by @MatrixTeam-AI in #12862
    • Add OvisImagePipeline in AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @alvarobartt in #12876
    • Cosmos Predict2.5 14b Conversion by @miguelmartin75 in #12863
    • Use T5Tokenizer instead of MT5Tokenizer (removed in Transformers v5.0+) by @alvarobartt in #12877
    • Add z-image-omni-base implementation by @RuoyiDu in #12857
    • fix torchao quantizer for new torchao versions by @vkuzo in #12901
    • fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs by @mbalabanski in #12894
    • Z-Image-Turbo from_single_file fix by @hlky in #12888
    • chore: fix dev version in setup.py by @DefTruth in #12904
    • Community Pipeline: Add z-image differential img2img by @r4inm4ker in #12882
    • Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py by @miguelmartin75 in #12914
    • Fix wan 2.1 i2v context parallel by @DefTruth in #12909
    • fix the use of device_map in CP docs by @sayakpaul in #12902
    • [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin by @sayakpaul in #12873
    • Detect 2.0 vs 2.1 ZImageControlNetModel by @hlky in #12861
    • Refactor environment variable assignments in workflow by @paulinebm in #12916
    • Add codeQL workflow by @paulinebm in #12917
    • Delete .github/workflows/codeql.yml by @paulinebm (direct commit on v0.37.0-release)
    • CodeQL workflow for security analysis by @paulinebm (direct commit on v0.37.0-release)
    • Check for attention mask in backends that don't support it by @dxqb in #12892
    • [Flux.1] improve pos embed for ascend npu by computing on npu by @zhangtao0408 in #12897
    • LTX Video 0.9.8 long multi prompt by @yaoqih in #12614
    • Add FSDP option for Flux2 by @leisuzz in #12860
    • Add transformer cache context for SkyReels-V2 pipelines & Update docs by @tolgacangoz in #12837
    • [docs] fix torchao typo. by @sayakpaul in #12883
    • Update wan.md to remove unneeded hfoptions by @sayakpaul in #12890
    • Improve docstrings and type hints in scheduling_edm_euler.py by @delmalih in #12871
    • [Modular] Video for Mellon by @asomoza in #12924
    • Add LTX 2.0 Video Pipelines by @dg845 in #12915
    • Add environment variables to checkout step by @paulinebm in #12927
    • Improve docstrings and type hints in scheduling_consistency_decoder.py by @delmalih in #12928
    • Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning by @adi776borate in #12814
    • Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #12865
    • fix the warning torch_dtype is deprecated by @msdsm in #12841
    • [NPU] npu attention enable ulysses by @TmacAaron in #12919
    • Torchao floatx version guard by @howardzhang-cv in #12923
    • Bugfix for dreambooth flux2 img2img2 by @leisuzz in #12825
    • [Modular] qwen refactor by @yiyixuxu in #12872
    • [modular] Tests for custom blocks in modular diffusers by @sayakpaul in #12557
    • [chore] remove controlnet implementations outside controlnet module. by @sayakpaul in #12152
    • [core] Handle progress bar and logging in distributed environments by @sayakpaul in #12806
    • Improve docstrings and type hints in scheduling_consistency_models.py by @delmalih in #12931
    • [Feature] MultiControlNet support for SD3Impainting by @ishan-modi in #11251
    • Laplace Scheduler for DDPM by @gapatron in #11320
    • Store vae.config.scaling_factor to prevent missing attr reference (sdxl advanced dreambooth training script) by @Teriks in #12346
    • Add thread-safe wrappers for components in pipeline (examples/server-async/utils/requestscopedpipeline.py) by @FredyRivera-dev in #12515
    • [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL by @kashif in #11573
    • Change timestep device to cpu for xla by @bhavya01 in #11501
    • [LoRA] add lora_alpha to sana README by @linoytsaban in #11780
    • Fix wrong param types, docs, and handles noise=None in scale_noise of FlowMatching schedulers by @Promisery in #11669
    • [docs] Remote inference by @stevhliu in #12372
    • Align HunyuanVideoConditionEmbedding with CombinedTimestepGuidanceTextProjEmbeddings by @samutamm in #12316
    • [Fix] syntax in QwenImageEditPlusPipeline by @SahilCarterr in #12371
    • Fix ftfy name error in Wan pipeline by @dsocek in #12314
    • [modular] error early in enable_auto_cpu_offload by @sayakpaul in #12578
    • [ChronoEdit] support multiple loras by @zhangjiewu in #12679
    • fix how is_fsdp is determined by @sayakpaul in #12960
    • [LoRA] add LoRA support to LTX-2 by @sayakpaul in #12933
    • Fix: typo in autoencoder_dc.py by @tvelovraf in #12687
    • [Modular] better docstring by @yiyixuxu in #12932
    • [docs] polish caching docs. by @sayakpaul in #12684
    • Fix typos by @omahs in #12705
    • Fix link to diffedit implementation reference by @JuanFKurucz in #12708
    • Fix QwenImage txt_seq_lens handling by @kashif in #12702
    • Bugfix for flux2 img2img2 prediction by @leisuzz in #12855
    • Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests by @dg845 in #12962
    • Add Unified Sequence Parallel attention by @Bissmella in #12693
    • [Modular] Changes for using WAN I2V by @asomoza in #12959
    • Z rz rz rz rz rz rz r cogview by @sayakpaul in #12973
    • Update distributed_inference.md to reposition sections by @sayakpaul in #12971
    • [chore] make transformers version check stricter for glm image. by @sayakpaul in #12974
    • Remove 8bit device restriction by @SunMarc in #12972
    • disable_mmap in pipeline from_pretrained by @hlky in #12854
    • [Modular] mellon utils by @yiyixuxu in #12978
    • LongCat Image pipeline: Allow offloading/quantization of text_encoder component by @Yahweasel in #12963
    • Add ChromaInpaintPipeline by @hameerabbasi in #12848
    • fix Qwen-Image series context parallel by @DefTruth in #12970
    • Flux2 klein by @yiyixuxu in #12982
    • [modular] fix a bug in mellon param & improve docstrings by @yiyixuxu in #12980
    • add klein docs. by @sayakpaul in #12984
    • LTX 2 Single File Support by @dg845 in #12983
    • [core] gracefully error out when attn-backend x cp combo isn't supported. by @sayakpaul in #12832
    • Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py by @delmalih in #12936
    • [Docs] Replace root CONTRIBUTING.md with symlink to source docs by @delmalih in #12986
    • make style && make quality by @sayakpaul (direct commit on v0.37.0-release)
    • Revert "make style && make quality" by @sayakpaul (direct commit on v0.37.0-release)
    • [chore] make style to push new changes. by @sayakpaul in #12998
    • Fibo edit pipeline by @galbria in #12930
    • Fix variable name in docstring for PeftAdapterMixin.set_adapters by @geekuillaume in #13003
    • Improve docstrings and type hints in scheduling_ddim_cogvideox.py by @delmalih in #12992
    • [scheduler] Support custom sigmas in UniPCMultistepScheduler by @a-r-r-o-w in #12109
    • feat: accelerate longcat-image with regional compile by @lgyStoic in #13019
    • Improve docstrings and type hints in scheduling_ddim_flax.py by @delmalih in #13010
    • Improve docstrings and type hints in scheduling_ddim_inverse.py by @delmalih in #13020
    • fix Dockerfiles for cuda and xformers. by @sayakpaul in #13022
    • Resnet only use contiguous in training mode. by @jiqing-feng in #12977
    • feat: add qkv projection fuse for longcat transformers by @lgyStoic in #13021
    • Improve docstrings and type hints in scheduling_ddim_parallel.py by @delmalih in #13023
    • Improve docstrings and type hints in scheduling_ddpm_flax.py by @delmalih in #13024
    • Improve docstrings and type hints in scheduling_ddpm_parallel.py by @delmalih in #13027
    • Remove pooled_ mentions from Chroma inpaint by @hameerabbasi in #13026
    • Flag Flax schedulers as deprecated by @delmalih in #13031
    • [modular] add auto_docstring & more doc related refactors by @yiyixuxu in #12958
    • Upgrade GitHub Actions to latest versions by @salmanmkc in #12866
    • [From Single File] support from_single_file method for WanAnimateTransformer3DModel by @samadwar in #12691
    • Fix: Cosmos2.5 Video2World frame extraction and add default negative prompt by @adi776borate in #13018
    • [GLM-Image] Add batch support for GlmImagePipeline by @JaredforReal in #13007
    • [Qwen] avoid creating attention masks when there is no padding by @kashif in #12987
    • [modular]support klein by @yiyixuxu in #13002
    • [QwenImage] fix prompt isolation tests by @sayakpaul in #13042
    • fast tok update by @itazap in #13036
    • change to CUDA 12.9. by @sayakpaul in #13045
    • remove torchao autoquant from diffusers docs by @vkuzo in #13048
    • docs: improve docstring scheduling_dpm_cogvideox.py by @delmalih in #13044
    • Fix Wan/WanI2V patchification by @Jayce-Ping in #13038
    • LTX2 distilled checkpoint support by @rootonchair in #12934
    • [wan] fix layerwise upcasting tests on CPU by @sayakpaul in #13039
    • [ci] uniform run times and wheels for pytorch cuda. by @sayakpaul in #13047
    • docs: fix grammar in fp16_safetensors CLI warning by @Olexandr88 in #13040
    • [wan] fix wan 2.2 when either of the transformers isn't present. by @sayakpaul in #13055
    • [bug fix] GLM-Image fit new get_image_features API by @JaredforReal in #13052
    • Fix aiter availability check by @lauri9 in #13059
    • [Modular]add a real quick start guide by @yiyixuxu in #13029
    • feat: support Ulysses Anything Attention by @DefTruth in #12996
    • Refactor Model Tests by @DN6 in #12822
    • [Flux2] Fix LoRA loading for Flux2 Klein by adaptively enumerating transformer blocks by @songkey in #13030
    • [Modular] loader related by @yiyixuxu in #13025
    • [Modular] mellon doc etc by @yiyixuxu in #13051
    • [modular] change the template modular pipeline card by @sayakpaul in #13072
    • Add support for Magcache by @AlanPonnachan in #12744
    • [docs] Fix syntax error in quantization configuration by @sayakpaul in #13076
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13083
    • [core] make flux hidden states contiguous by @sayakpaul in #13068
    • [core] make qwen hidden states contiguous to make torchao happy. by @sayakpaul in #13081
    • Feature/zimage inpaint pipeline by @CalamitousFelicitousness in #13006
    • GGUF fix for unquantized types when using unquantize kernels by @dxqb in #12498
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13085
    • [modular]simplify components manager doc by @yiyixuxu in #13088
    • ZImageControlNet cfg by @hlky in #13080
    • [Modular] refactor Wan: modular pipelines by task etc by @yiyixuxu in #13063
    • [Modular] guard ModularPipeline.blocks attribute by @yiyixuxu in #13014
    • LTX 2 Improve encode_video by Accepting More Input Types by @dg845 in #13057
    • Z image lora training by @linoytsaban in #13056
    • [modular] add modular tests for Z-Image and Wan by @sayakpaul in #13078
    • [Docs] Add guide for AutoModel with custom code by @DN6 in #13099
    • [SkyReelsV2] Fix ftfy import by @asomoza in #13113
    • [lora] fix non-diffusers lora key handling for flux2 by @sayakpaul in #13119
    • [CI] Refactor Wan Model Tests by @DN6 in #13082
    • docs: improve docstring scheduling_edm_dpmsolver_multistep.py by @delmalih in #13122
    • [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline by @JaredforReal in #13092
    • docs: improve docstring scheduling_flow_match_euler_discrete.py by @delmalih in #13127
    • Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} by @miguelmartin75 in #13066
    • [modular] add tests for robust model loading. by @sayakpaul in #13120
    • Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled by @dg845 in #13121
    • [CI] Fix setuptools pkg_resources Errors by @dg845 in #13129
    • docs: improve docstring scheduling_flow_match_heun_discrete.py by @delmalih in #13130
    • [CI] Fix setuptools pkg_resources Bug for PR GPU Tests by @dg845 in #13132
    • fix cosmos transformer typing. by @sayakpaul in #13134
    • Sunset Python 3.8 & get rid of explicit typing exports where possible by @sayakpaul in #12524
    • feat: implement apply_lora_scale to remove boilerplate. by @sayakpaul in #12994
    • [docs] fix ltx2 i2v docstring. by @sayakpaul in #13135
    • [Modular] add different pipeine blocks to init by @yiyixuxu in #13145
    • fix MT5Tokenizer by @yiyixuxu in #13146
    • fix guider by @yiyixuxu in #13147
    • [Modular] update doc for ModularPipeline by @yiyixuxu in #13100
    • [Modular] add explicit workflow support by @yiyixuxu in #13028
    • [LTX2] Fix wrong lora mixin by @asomoza in #13144
    • [Pipelines] Remove k-diffusion by @DN6 in #13152
    • [tests] accept recompile_limit from the user in tests by @sayakpaul in #13150
    • [core] support device type device_maps to work with offloading. by @sayakpaul in #12811
    • [Bug] Fix QwenImageEditPlus Series on NPU by @zhangtao0408 in #13017
    • [CI] Add ftfy as a test dependency by @DN6 in #13155
    • docs: improve docstring scheduling_flow_match_lcm.py by @delmalih in #13160
    • [docs] add docs for qwenimagelayered by @stevhliu in #13158
    • Flux2: Tensor tuples can cause issues for checkpointing by @dxqb in #12777
    • [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated by @dg845 in #13149
    • Fix ftfy import for PRX Pipeline by @dg845 in #13154
    • [core] Enable CP for kernels-based attention backends by @sayakpaul in #12812
    • remove deps related to test from ci by @sayakpaul in #13164
    • [CI] Fix new LoRAHotswap tests by @DN6 in #13163
    • [gguf][torch.compile time] Convert to plain tensor earlier in dequantize_gguf_tensor by @anijain2305 in #13166
    • Support Flux Klein peft (fal) lora format by @asomoza in #13169
    • Fix T5GemmaEncoder loading for transformers 5.x composite T5GemmaConfig by @DavidBert in #13143
    • Allow Automodel to use from_config with custom code. by @DN6 in #13123
    • Fix AutoModel typing Import Error by @dg845 in #13178
    • migrate to transformers v5 by @sayakpaul in #12976
    • fix: graceful fallback when attention backends fail to import by @sym-bot in #13060
    • [docs] Fix torchrun command argument order in docs by @sayakpaul in #13181
    • [attention backends] use dedicated wrappers from fa3 for cp. by @sayakpaul in #13165
    • Cosmos Transfer2.5 Auto-Regressive Inference Pipeline by @miguelmartin75 in #13114
    • Fix wrong do_classifier_free_guidance threshold in ZImagePipeline by @kirillsst in #13183
    • Fix Flash Attention 3 interface for new FA3 return format by @veeceey in #13173
    • Fix LTX-2 image-to-video generation failure in two stages generation by @Songrui625 in #13187
    • Fixing Kohya loras loading: Flux.1-dev loras with TE ("lora_te1_" prefix) by @christopher5106 in #13188
    • [Modular] update the auto pipeline blocks doc by @yiyixuxu in #13148
    • [tests] consistency tests for modular index by @sayakpaul in #13192
    • [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline by @yiyixuxu in #13193
    • [chore] updates in the pypi publication workflow. by @sayakpaul in #12805
    • [tests] enable cpu offload test in torchao without compilation. by @sayakpaul in #12704
    • remove db utils from benchmarking by @sayakpaul in #13199
    • [AutoModel] Fix bug with subfolders and local model paths when loading custom code by @DN6 in #13197
    • [AutoModel] Allow registering auto_map to model config by @DN6 in #13186
    • [Modular] Save Modular Pipeline weights to Hub by @DN6 in #13168
    • docs: improve docstring scheduling_ipndm.py by @delmalih in #13198
    • Clean up accidental files by @DN6 in #13202
    • [modular]Update model card to include workflow by @yiyixuxu in #13195
    • [modular] not pass trust_remote_code to external repos by @yiyixuxu in #13204
    • [Modular] implement requirements validation for custom blocks by @sayakpaul in #12196
    • cogvideo example: Distribute VAE video encoding across processes in CogVideoX LoRA training by @jiqing-feng in #13207
    • Fix group-offloading bug by @SHYuanBest in #13211
    • Add Helios-14B Video Generation Pipelines by @dg845 in #13208
    • [Z-Image] Fix more do_classifier_free_guidance thresholds by @asomoza in #13212
    • [lora] fix zimage lora conversion to support for more lora. by @sayakpaul in #13209
    • adding lora support to z-image controlnet pipelines by @christopher5106 in #13200
    • Add LTX2 Condition Pipeline by @dg845 in #13058
    • Fix Helios paper link in documentation by @SHYuanBest in #13213
    • [attention backends] change to updated repo and version. by @sayakpaul in #13161
    • feat: implement rae autoencoder. by @Ando233 in #13046
    • Release: v0.37.0-release by @sayakpaul (direct commit on v0.37.0-release)

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @delmalih

    • Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
    • Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
    • Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
    • Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
    • Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
    • [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
    • Improve docstrings and type hints in scheduling_ddim_cogvideox.py (#12992)
    • Improve docstrings and type hints in scheduling_ddim_flax.py (#13010)
    • Improve docstrings and type hints in scheduling_ddim_inverse.py (#13020)
    • Improve docstrings and type hints in scheduling_ddim_parallel.py (#13023)
    • Improve docstrings and type hints in scheduling_ddpm_flax.py (#13024)
    • Improve docstrings and type hints in scheduling_ddpm_parallel.py (#13027)
    • Flag Flax schedulers as deprecated (#13031)
    • docs: improve docstring scheduling_dpm_cogvideox.py (#13044)
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13083)
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13085)
    • docs: improve docstring scheduling_edm_dpmsolver_multistep.py (#13122)
    • docs: improve docstring scheduling_flow_match_euler_discrete.py (#13127)
    • docs: improve docstring scheduling_flow_match_heun_discrete.py (#13130)
    • docs: improve docstring scheduling_flow_match_lcm.py (#13160)
    • docs: improve docstring scheduling_ipndm.py (#13198)

    @yiyixuxu

    • [Modular]z-image (#12808)
    • more update in modular (#12560)
    • [Modular] qwen refactor (#12872)
    • [Modular] better docstring (#12932)
    • [Modular] mellon utils (#12978)
    • Flux2 klein (#12982)
    • [modular] fix a bug in mellon param & improve docstrings (#12980)
    • [modular] add auto_docstring & more doc related refactors (#12958)
    • [modular]support klein (#13002)
    • [Modular]add a real quick start guide (#13029)
    • [Modular] loader related (#13025)
    • [Modular] mellon doc etc (#13051)
    • [modular]simplify components manager doc (#13088)
    • [Modular] refactor Wan: modular pipelines by task etc (#13063)
    • [Modular] guard ModularPipeline.blocks attribute (#13014)
    • [Modular] add different pipeine blocks to init (#13145)
    • fix MT5Tokenizer (#13146)
    • fix guider (#13147)
    • [Modular] update doc for ModularPipeline (#13100)
    • [Modular] add explicit workflow support (#13028)
    • [Modular] update the auto pipeline blocks doc (#13148)
    • [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline (#13193)
    • [modular]Update model card to include workflow (#13195)
    • [modular] not pass trust_remote_code to external repos (#13204)

    @sayakpaul

    • Fix Qwen Edit Plus modular for multi-image input (#12601)
    • [docs] improve distributed inference cp docs. (#12810)
    • post release 0.36.0 (#12804)
    • Update distributed_inference.md to correct syntax (#12827)
    • [lora] Remove lora docs unneeded and add " # Copied from ..." (#12824)
    • fix the use of device_map in CP docs (#12902)
    • [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin (#12873)
    • [docs] fix torchao typo. (#12883)
    • Update wan.md to remove unneeded hfoptions (#12890)
    • [modular] Tests for custom blocks in modular diffusers (#12557)
    • [chore] remove controlnet implementations outside controlnet module. (#12152)
    • [core] Handle progress bar and logging in distributed environments (#12806)
    • [modular] error early in enable_auto_cpu_offload (#12578)
    • fix how is_fsdp is determined (#12960)
    • [LoRA] add LoRA support to LTX-2 (#12933)
    • [docs] polish caching docs. (#12684)
    • Z rz rz rz rz rz rz r cogview (#12973)
    • Update distributed_inference.md to reposition sections (#12971)
    • [chore] make transformers version check stricter for glm image. (#12974)
    • add klein docs. (#12984)
    • [core] gracefully error out when attn-backend x cp combo isn't supported. (#12832)
    • make style && make quality
    • Revert "make style && make quality"
    • [chore] make style to push new changes. (#12998)
    • fix Dockerfiles for cuda and xformers. (#13022)
    • [QwenImage] fix prompt isolation tests (#13042)
    • change to CUDA 12.9. (#13045)
    • [wan] fix layerwise upcasting tests on CPU (#13039)
    • [ci] uniform run times and wheels for pytorch cuda. (#13047)
    • [wan] fix wan 2.2 when either of the transformers isn't present. (#13055)
    • [modular] change the template modular pipeline card (#13072)
    • [docs] Fix syntax error in quantization configuration (#13076)
    • [core] make flux hidden states contiguous (#13068)
    • [core] make qwen hidden states contiguous to make torchao happy. (#13081)
    • [modular] add modular tests for Z-Image and Wan (#13078)
    • [lora] fix non-diffusers lora key handling for flux2 (#13119)
    • [modular] add tests for robust model loading. (#13120)
    • fix cosmos transformer typing. (#13134)
    • Sunset Python 3.8 & get rid of explicit typing exports where possible (#12524)
    • feat: implement apply_lora_scale to remove boilerplate. (#12994)
    • [docs] fix ltx2 i2v docstring. (#13135)
    • [tests] accept recompile_limit from the user in tests (#13150)
    • [core] support device type device_maps to work with offloading. (#12811)
    • [core] Enable CP for kernels-based attention backends (#12812)
    • remove deps related to test from ci (#13164)
    • migrate to transformers v5 (#12976)
    • [docs] Fix torchrun command argument order in docs (#13181)
    • [attention backends] use dedicated wrappers from fa3 for cp. (#13165)
    • [tests] consistency tests for modular index (#13192)
    • [chore] updates in the pypi publication workflow. (#12805)
    • [tests] enable cpu offload test in torchao without compilation. (#12704)
    • remove db utils from benchmarking (#13199)
    • [Modular] implement requirements validation for custom blocks (#12196)
    • [lora] fix zimage lora conversion to support for more lora. (#13209)
    • [attention backends] change to updated repo and version. (#13161)

    Release: v0.37.0-release

    @DN6

    • [WIP] Add Flux2 modular (#12763)
    • Refactor Model Tests (#12822)
    • [Docs] Add guide for AutoModel with custom code (#13099)
    • [CI] Refactor Wan Model Tests (#13082)
    • [Pipelines] Remove k-diffusion (#13152)
    • [CI] Add ftfy as a test dependency (#13155)
    • [CI] Fix new LoRAHotswap tests (#13163)
    • Allow Automodel to use from_config with custom code. (#13123)
    • [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)
    • [AutoModel] Allow registering auto_map to model config (#13186)
    • [Modular] Save Modular Pipeline weights to Hub (#13168)
    • Clean up accidental files (#13202)

    @naykun

    • [qwen-image] edit 2511 support (#12839)
    • Qwen Image Layered Support (#12853)

    @junqiangwu

    • Add support for LongCat-Image (#12828)
    • fix the prefix_token_len bug (#12845)

    @hlky

    • Z-Image-Turbo ControlNet (#12792)
    • Z-Image-Turbo from_single_file fix (#12888)
    • Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)
    • disable_mmap in pipeline from_pretrained (#12854)
    • ZImageControlNet cfg (#13080)

    @miguelmartin75

    • Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)
    • Cosmos Predict2.5 14b Conversion (#12863)
    • Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914)
    • Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)
    • Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)

    @RuoyiDu

    • Add z-image-omni-base implementation (#12857)

    @r4inm4ker

    • Community Pipeline: Add z-image differential img2img (#12882)

    @yaoqih

    • LTX Video 0.9.8 long multi prompt (#12614)

    @dg845

    • Add LTX 2.0 Video Pipelines (#12915)
    • Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests (#12962)
    • LTX 2 Single File Support (#12983)
    • LTX 2 Improve encode_video by Accepting More Input Types (#13057)
    • Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled (#13121)
    • [CI] Fix setuptools pkg_resources Errors (#13129)
    • [CI] Fix setuptools pkg_resources Bug for PR GPU Tests (#13132)
    • [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated (#13149)
    • Fix ftfy import for PRX Pipeline (#13154)
    • Fix AutoModel typing Import Error (#13178)
    • Add Helios-14B Video Generation Pipelines (#13208)
    • Add LTX2 Condition Pipeline (#13058)

    @kashif

    • [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)
    • Fix QwenImage txt_seq_lens handling (#12702)
    • [Qwen] avoid creating attention masks when there is no padding (#12987)

    @bhavya01

    • Change timestep device to cpu for xla (#11501)

    @linoytsaban

    • [LoRA] add lora_alpha to sana README (#11780)
    • Z image lora training (#13056)

    @stevhliu

    • [docs] Remote inference (#12372)
    • [docs] add docs for qwenimagelayered (#13158)

    @hameerabbasi

    • Add ChromaInpaintPipeline (#12848)
    • Remove pooled_ mentions from Chroma inpaint (#13026)

    @galbria

    • Fibo edit pipeline (#12930)

    @JaredforReal

    • [GLM-Image] Add batch support for GlmImagePipeline (#13007)
    • [bug fix] GLM-Image fit new get_image_features API (#13052)
    • [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline (#13092)

    @rootonchair

    • LTX2 distilled checkpoint support (#12934)

    @AlanPonnachan

    • Add support for Magcache (#12744)

    @CalamitousFelicitousness

    • Feature/zimage inpaint pipeline (#13006)

    @Ando233

    • feat: implement rae autoencoder. (#13046)
  • Mar 4, 2026

    transformers by Hugging Face

    v5.3.0: EuroBERT, VibeVoice ASR, TimesFM2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2

    transformers adds new multilingual, audio, time-series, and document models, including EuroBERT, VibeVoice ASR, TimesFM 2.5, PP-DocLayoutV2, OLMo Hybrid, ModernVBERT, and Higgs Audio V2, alongside breaking changes, quantization updates, and broad fixes.

    New Model additions

    EuroBERT

    EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.

    Links: Documentation | Paper | Blog Post

    Add eurobert (#39455) by @ArthurZucker in #39455
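The "Llama-like but bidirectional" distinction above comes down to the attention mask: a decoder only looks left, while an encoder like EuroBERT lets every token see the whole sequence. A minimal illustrative sketch (not EuroBERT's actual implementation):

```python
def causal_mask(n: int) -> list[list[bool]]:
    """Decoder-style mask: position i attends only to positions <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> list[list[bool]]:
    """Encoder-style mask (EuroBERT-like): every position attends to every
    position, so context flows both left-to-right and right-to-left."""
    return [[True] * n for _ in range(n)]

visible_causal = sum(sum(row) for row in causal_mask(4))       # lower triangle: 10 pairs
visible_bidi = sum(sum(row) for row in bidirectional_mask(4))  # full grid: 16 pairs
```

The extra visibility is what makes bidirectional encoders a better fit for retrieval and classification than left-to-right decoders.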

    VibeVoice ASR

    VibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.

    Links: Documentation | Paper

    Add VibeVoice ASR (#43625) by @ebezzam in #43625

    TimesFM2.5

    TimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities without requiring dataset-specific training. It builds on the original TimesFM architecture with enhancements including rotary attention, QK normalization, per-dimension attention scaling, and continuous quantile prediction.

    Links: Documentation | Paper

    Timesfm 2.5 (#41763) by @kashif in #41763
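Input patching, mentioned above, turns a raw series into fixed-length patches that play the role of tokens. A toy sketch of the idea (the patch length and zero-padding scheme here are illustrative choices, not the model's actual configuration):

```python
def patch_series(series, patch_len):
    """Split a 1-D series into non-overlapping patches, left-padding the
    front with zeros so the length divides evenly (a common convention;
    the real model's padding scheme may differ)."""
    pad = (-len(series)) % patch_len
    padded = [0.0] * pad + list(series)
    return [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]

patches = patch_series([1, 2, 3, 4, 5, 6, 7], patch_len=4)
# 7 values -> left-padded to 8 -> 2 patches of length 4
```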

    PP-DocLayoutV2

    PP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. The model is composed of two sequentially connected networks: an RT-DETR-based detection model that performs layout element detection and classification, followed by a pointer network that orders these layout elements. It is designed to analyze document layouts by identifying and organizing various layout components in their proper reading sequence.

    Links: Documentation

    [Model] Add PP-DocLayoutV2 Model Support (#43018) by @zhang-prog in #43018
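The pointer network learns the reading order of detected layout elements. As a point of reference for what it replaces, here is a naive top-to-bottom, left-to-right baseline (a hand-written heuristic, not the learned pointer network):

```python
def naive_reading_order(boxes):
    """boxes: list of (x0, y0, x1, y1) layout elements. Returns indices
    sorted top-to-bottom, then left-to-right -- a simple stand-in for
    the learned reading-order prediction described above."""
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))

boxes = [(50, 300, 200, 340),   # body paragraph
         (50, 10, 400, 60),     # title
         (220, 100, 400, 260),  # right column
         (50, 100, 200, 260)]   # left column
order = naive_reading_order(boxes)
# title first, then left column, right column, body: [1, 3, 2, 0]
```

A learned model is needed precisely because multi-column and nested layouts break simple geometric heuristics like this one.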

    OlmoHybrid

    OLMo Hybrid is a hybrid architecture model from Ai2 that combines standard transformer attention layers with linear attention layers using the Gated Deltanet. This hybrid approach aims to improve efficiency while maintaining model quality by interleaving full attention layers with linear attention layers. The model uses a custom cache system that handles both KV cache for attention layers and recurrent state for linear attention layers.

    Links: Documentation

    Add OLMo Hybrid model (#43358) by @yanhong-lbh in #43358
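The custom cache described above has to hold two different kinds of state: a growing KV cache for full-attention layers and a constant-size recurrent state for linear-attention layers. A schematic sketch of that split (not the actual transformers implementation):

```python
class HybridLayerCache:
    """Toy per-layer cache: attention layers append key/value entries,
    linear-attention layers overwrite a single recurrent state."""
    def __init__(self, layer_types):
        self.layer_types = layer_types  # e.g. ["attn", "linear", "attn"]
        self.kv = {i: [] for i, t in enumerate(layer_types) if t == "attn"}
        self.state = {i: None for i, t in enumerate(layer_types) if t == "linear"}

    def update(self, layer_idx, value):
        if self.layer_types[layer_idx] == "attn":
            self.kv[layer_idx].append(value)  # KV cache grows with the sequence
        else:
            self.state[layer_idx] = value     # recurrent state stays constant-size

cache = HybridLayerCache(["attn", "linear", "attn"])
for step in range(3):
    cache.update(0, ("kv", step))
    cache.update(1, ("state", step))
```

The memory win of the hybrid design is visible here: the attention layer's cache grows with every step, while the linear layer keeps a single state regardless of sequence length.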

    ModernVBert

    ModernVBert is a Vision-Language encoder that combines ModernBert with a SigLIP vision encoder. It is optimized for visual document understanding and retrieval tasks, making it suitable for processing documents that contain both text and visual elements.

    Links: Documentation | Paper

    Add ModernVBERT models (#42504) by @paultltc in #42504

    ColModernVBert

    ColModernVBert is a model for efficient visual document retrieval that leverages ModernVBert to construct multi-vector embeddings directly from document images, following the ColPali approach. The model enables retrieval and scoring of visual documents by processing both text queries and document images to generate embeddings that can be compared for relevance scoring.

    Links: Documentation | Paper

    Add ModernVBERT models (#42504) by @paultltc in #42504
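The ColPali-style "multi-vector embeddings" scoring mentioned above is late interaction: each query vector takes its best match over all document vectors, and the maxima are summed. A minimal sketch of that scoring rule (toy 2-D vectors for illustration):

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColPali-style late interaction: for each query vector, take its
    best dot product against all document vectors, then sum."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]  # strong match for the first query vector
doc_b = [[0.0, 1.0], [0.0, 0.9]]  # strong match for the second query vector
score_a = maxsim_score(query, doc_a)  # 1.0 + 0.5 = 1.5
score_b = maxsim_score(query, doc_b)  # 0.0 + 1.0 = 1.0
```

Because documents are pre-encoded into vector sets, only this cheap max-and-sum runs at query time, which is what makes the retrieval efficient.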

    Higgs Audio V2

    Higgs Audio V2 is a powerful audio foundation model developed by Boson AI that was pretrained on over 10 million hours of audio data and diverse text data. Despite having no post-training or fine-tuning, the model excels in expressive audio generation thanks to its deep language and acoustic understanding. The model supports various audio generation tasks including single-speaker and multi-speaker smart voice, zero-shot voice cloning, and multi-speaker voice cloning.

    Links: Documentation

    Add Higgs Audio V2 Model (#40294) by @szhengac in #40294

    Higgs Audio V2 Tokenizer

    The Higgs Audio V2 Tokenizer is an audio tokenization model that operates at a low frame rate of 25 fps while maintaining high audio quality, effectively halving the frame rate of many baseline models. It uses unified 24 kHz training that mixes speech, music, and sound-event clips in one model to capture both semantic and acoustic details, facilitating the training of audio language models. The model enables fast inference by avoiding diffusion steps, with an encoder/decoder architecture that processes batches quickly for real-time or large-scale tasks.

    Links: Documentation

    Add Higgs Audio V2 Model (#40294) by @szhengac in #40294
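The 25 fps / 24 kHz figures above imply a fixed hop of 960 samples per audio token, and halving the frame rate of a 50 fps baseline doubles that hop. A quick sanity check of the arithmetic:

```python
SAMPLE_RATE = 24_000  # Hz, per the description above
FRAME_RATE = 25       # audio tokens per second

hop = SAMPLE_RATE // FRAME_RATE     # samples represented by one token
tokens_per_minute = FRAME_RATE * 60

# hop == 960 samples; a 50 fps baseline at the same sample rate would use
# a 480-sample hop, i.e. twice as many tokens for the same audio.
```

Fewer tokens per second of audio directly shortens the sequences an audio language model has to process.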

    Breaking changes

    Tensor parallelism (TP) support for dense and MoE decoder-only models has been fixed and stabilized, requiring users to update their TP configurations and conversion mappings accordingly.

    🚨 fix + tests dense & MoE TP all reduce (decoder only) (#43722) by @3outeille

    The Ernie4.5 VL MoE model class and configuration names have been renamed to align with vLLM/SGLang conventions, requiring users to update any references to the old model names in their code.

    🚨 [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299) by @vasqu

    Several pipeline tasks have been removed or updated in the V5 cleanup (including question-answering, visual-question-answering, and image-to-image), requiring users to migrate to the replacement pipelines or updated task names.

    🚨 More V5 pipeline cleanup (#43325) by @Rocketknight1

    3D position IDs for vision-language models have been unified under a common interface (sourced from qwen2-vl), requiring users of affected VLMs (e.g., Ernie, GLM4V) to update their processors and any code that manually constructs position IDs.

    🚨 Unify 3D position ids (#43972) by @zucchini-nlp

    🚨 Tokenizer x vLLM fixes 🚨 :

    Unigram tokenizers were missing support for the SentencePiece (spm) precompiled charsmap. We ran an overall v4 vs v5 regression test and fixed what we had missed.

    This was done in:

    [vllm + v5 fix] handle TokenizersBackend fallback properly for v5 (#44255) by @itazap

    Generation

    Generation input preparation was significantly refactored to stop relying on cache_position and instead pass pre-sliced input_ids/inputs_embeds directly to prepare_inputs_for_generation, simplifying the generation loop and laying groundwork for broader cache_position removal. Several bug fixes were also applied, including correct sampling for HiggsAudioV2, flaky cache-equality test stabilization for Idefics, and restored generation integration tests.

    [higgs-audio-v2] fix sampling (#44386) by @eustlb in [#44386]

    fix(flaky): idefics generate cache flake (#44180) by @tarekziade in [#44180]

    Fix generation integration tests (#44225) by @zucchini-nlp in [#44225]

    [generate] Always pass full input_ids in prepare_inputs_for_generation (#44226) by @Cyrilvallez in [#44226]

    fix: HiggsAudioV2 cached decode inputs in compiled generation (#44201) by @tarekziade in [#44201]

    [generate] Completely stop relying on cache_position to prepare inputs (#44130) by @Cyrilvallez in [#44130]

    Simplify input preparation in generate (#44126) by @Cyrilvallez in [#44126]

    Tokenization

    Several tokenization bugs were fixed in this release, including resolving an AttributeError in MLukeTokenizer caused by the v5 rename of additional_special_tokens, correcting the Fuyu tokenizer class mapping, fixing LayoutXLM tokenization test failures from the slow tokenizer removal refactor, and adding olmo_hybrid to the auto-tokenizer mapping. The tokenizer documentation was also updated to reflect the new unified v5 backend architecture and reorganized for clarity.

    [tiny] Add olmo_hybrid to tokenizer auto-mapping (#44416) by @tyler-romero in [#44416]

    fix(tokenizer): Fix MLukeTokenizer AttributeError post-v5 refactor (#44362) by @harshaljanjani in [#44362]

    update fuyu tokenizer class (#44235) by @itazap in [#44235]

    fix(testing): Fix LayoutXLM tokenization test and LightOnOCR SDPA flash test failures on main CI (#43988) by @harshaljanjani in [#43988]

    [docs] tokenizer summary (#43965) by @stevhliu in [#43965]

    [docs] refactor tokenizer docs (#43900) by @stevhliu in [#43900]

    Kernels

    Fixed several kernel-related issues including a security vulnerability, corrected Mamba kernel loading to handle incompatible import structures, ensured Liger Kernel is properly enabled during hyperparameter search, and expanded Flash Attention to support multiple compatible implementations.

    Fix kernels security issue (#44395) by @Cyrilvallez in [#44395]

    Enable Liger Kernel when doing hyperparameter search. (#44329) by @linfeng-du in [#44329]

    [Mamba] Fix kernel loading (#44176) by @vasqu in [#44176]

    [Flash Attn] Enable compatible implementations (#44177) by @vasqu in [#44177]

    Fix percentage formatting in help messages for gradient checkpointing, Liger Kernel, and empty cache steps (#44100) by @qgallouedec in [#44100]

    Quantization

    This release adds several new quantization backends and fixes, including MLX quantization support for MPS devices, Four Over Six (4/6) NVFP4 quantization integration for NVIDIA Blackwell GPUs, and CPU support for MXFP4 models, alongside a bug fix for MXFP4 model saving using reverse_op.

    [Quantization] Fixing mxfp4 saving using reverse_op (#43148) by @MekkCyber in [#43148]

    [Quantization] Add metal quantization for MPS devices! (#43934) by @MekkCyber in [#43934]

    Enable mxfp4 model on CPU (#43512) by @jiqing-feng in [#43512]

    Add Four Over Six quantization integration (#43970) by @jackcook in [#43970]

    Vision

    Fixed backward compatibility for image processors loaded from older remote code that lack valid_kwargs definitions, and resolved test failures in AMD ROCm CI by adding the missing timm dependency to the Docker image.

    [AMD CI] Add missing timm dependency to ROCm Docker image (#44389) by @Abdennacer-Badaoui in [#44389]

    update glm image model expected out for tests (#43907) by @kaixuanliu in [#43907]

    Fix image processors from_dict backward compatibility with old remote code (#44245) by @yonigozlan in [#44245]

    Bugfixes and improvements

    Update PR template (#44415) by @SunMarc in [#44415]

    Add Qwen3.5 support for sequence classification (#44406) by @medhakimbedhief in [#44406]

    update the expected output for qwen2_5_vl w/ pytorch 2.10 XPU (#44426) by @kaixuanliu in [#44426]

    add support for nemotron_3 (#44390) by @liding-nv in [#44390]

    [ Dynamic weight loader] fix remote code when format matches (#44396) by @ArthurZucker in [#44396]

    [timesfm2_5] fix timesfm2.5 loss (#44331) by @kashif in [#44331]

    Fix peft conversion mappings (#44413) by @Cyrilvallez in [#44413]

    Reduce tqdm verbosity during model loading (#44414) by @Cyrilvallez in [#44414]

    docs: Add NeMo Automodel community integration docs (#44304) by @adil-a in [#44304]

    [CB] Small fixes (#44227) by @remi-or in [#44227]

    Support non-gated experts (#44319) by @IlyasMoutawwakil in [#44319]

    [Bugfix] fix qwen3.5 no split module (#44382) by @JJJYmmm in [#44382]

    Fix mutable default arguments and resource leaks (#44287) by @jashshah999 in [#44287]

    skip 2 invalid test cases for voxtral_realtime model (#44321) by @kaixuanliu in [#44321]

    Mamba-1/-2 init weights in mixer class (#43778) by @kevinli573 in [#43778]

    add expectations for xpu for olmo_hybrid model (#44353) by @kaixuanliu in [#44353]

    [VITS] Add speaking_rate as an optionl forward argument (#43283) by @gau-nernst in [#43283]

    Strict export cleanup (#44293) by @IlyasMoutawwakil in [#44293]

    [docs] kernelconfig fix (#44337) by @stevhliu in [#44337]

    Add ProcessingKwargs ImagesKwargs etc. to docs (#44269) by @yonigozlan in [#44269]

    Fix typos in comments and docstrings (#44332) by @tysoncung in [#44332]

    Add testing guide for agents for trainer tests (#44328) by @SunMarc in [#44328]

    Update common tests Trainer (#44260) by @SunMarc in [#44260]

    [timesfm2_5] fix timesfm mlp bias (#44325) by @kashif in [#44325]

    fix zero3 init config (#44236) by @SunMarc in [#44236]

    Update expected output for Jais2 model tests (#43910) by @kaixuanliu in [#43910]

    Improve has_similar_generate_outputs assertions (#44166) by @tarekziade in [#44166]

    Fix failed test case for exaone_moe model (#43938) by @kaixuanliu in [#43938]

    fix(modeling_attn_mask_utils): remove FutureWarning from logger.warning_once() (#44307) by @imstevenpmwork in [#44307]

    Remove remaining vestiges of the TranslationPipeline (#43869) by @Rocketknight1 in [#43869]

    XPU now supports backward for the FA2 fixed path (#43905) by @YangKai0616 in [#43905]

    Fix: use TokenizersBackend for Olmo3 to preserve custom pre_tokenizer (#44294) by @mario-sanz in [#44294]

    Fix special token maps BC (#44281) by @ArthurZucker in [#44281]

    [Modular] Fix file type regression (#44283) by @vasqu in [#44283]

    [auto_docstring] Improve typing parsing and add tests (#43748) by @yonigozlan in [#43748]

    Restore response_schema saving-loading (#44282) by @Rocketknight1 in [#44282]

    Use associative scan HOP mamba recurrentgemma (#43737) by @riccardofelluga in [#43737]

    chore: fixes in Trainer class docs (compute_loss & hyperparameter_search) (#44268) by @ethanknights in [#44268]

    fix(trainer): pass optim_args to SGD, Adagrad, and RMSprop optimizers (#44203) by @nightcityblade in [#44203]

    fix(utils): Make torch_compilable_check compatible with torch.export strict mode (#44266) by @harshaljanjani in [#44266]

    Fix TypeError in convert_rope_params_to_dict when ignore_keys is a list (#44272) by @hangjun-ezra in [#44272]

    [docs] callbacks and collators (#44239) by @stevhliu in [#44239]

    [docs] trainer part 1 (#44185) by @stevhliu in [#44185]

    Remove refs to grouped_entities (#44182) by @Rocketknight1 in [#44182]

    [mimi] nit (#44237) by @eustlb in [#44237]

    Fix local dataset loading priority in run_image_classification_no_tra… (#44199) by @gowthamr-tech in [#44199]

    chore: added CLAUDE.md alias (#44232) by @tarekziade in [#44232]

    fix: add missing return type annotations to type-checking utilities in generic.py (#44241) by @yushiran in [#44241]

    Fix return value - fixes #44238 (#44240) by @tarekziade in [#44240]

    fix regression report_to "all" (#44250) by @SunMarc in [#44250]

    [fix] Set input_modalities on various architectures that aren't just text (#44078) by @tomaarsen in [#44078]

    Add processing tests for phi4 multimodal (#44234) by @yonigozlan in [#44234]

    fix: VersionComparison.from_string return type mismatch (#43709) by @tarekziade in [#43709]

    refactor _inner_training_loop to smaller methods (#44041) by @winglian in [#44041]

    [docs] fix broken chat_templating links in tasks docs (#44115) by @Deep-unlearning in [#44115]

    Add missing backtick in AnyToAnyPipeline.call docstring (#44229) by @alvarobartt in [#44229]

    Docs(it): fix typo in sentencepiece install command (#44218) by @matisgagneux21 in [#44218]

    Docs(it): fix typo in docstring wording (#44219) by @matisgagneux21 in [#44219]

    fix bug with position_ids on qwen3-vl models, such that position_ids include text position (#44158) by @leopold-tzafon in [#44158]

    Update 404ing BillSum dataset URL on Summarization Task guide (#44212) by @alexandercarruthers in [#44212]

    fix(models): Fix LayoutLMv2 NER crash and broken batched truncation/padding (#44187) by @harshaljanjani in [#44187]

    [CB] [Major] Asynchronous batching (#43960) by @remi-or in [#43960]

    Fix LASR feature extractor regression from invalid center argument (#44207) by @ainergiz in [#44207]

    Models with incorrect tokenizer_class in tokenization_config.json tha… (#44179) by @itazap in [#44179]

    chore(typing): initial ty integration (#44167) by @tarekziade in [#44167]

    fix(flaky): test_generate_with_and_without_position_ids in GLM ORC (#44173) by @tarekziade in [#44173]

    [docs] Add Chinese translations for common NLP task tutorials (#44144) by @TinderZ in [#44144]

    [Mimi] Calibrate to ensure encoder streaming performs correctly (#43971) by @caffeinism in [#43971]

    ESM2 attention_mask and token_dropout fix (#44163) by @lhallee in [#44163]

    bring back our demons: clean_up_tokenization_spaces (#44035) by @ArthurZucker in [#44035]

    Fix Seq2SeqTrainingArguments documentation (#35258) by @qgallouedec in [#35258]

    AutoGrad support for grouped_mm fallback (#44152) by @IlyasMoutawwakil in [#44152]

    Patch setitem on ModelOutput even if the parameter was previously None (#44080) by @tomaarsen in [#44080]

    [simple] Fix up repr whitespace/brackets (#44048) by @tomaarsen in [#44048]

    [chore] Fix incorrect forward type hint for Gemma3n (#44051) by @tomaarsen in [#44051]

    Raise informative error when loading video processors (#44125) by @zucchini-nlp in [#44125]

    fix(flaky): Different approach to make sure loss exists (#43804) by @tarekziade in [#43804]

    [voxtral] fix voxtral proc (#44132) by @eustlb in [#44132]

    [docs] Fix typos in GenerationConfig docstring (#44143) by @nightcityblade in [#44143]

    Fix gemma3n get_audio_features (#44040) by @zucchini-nlp in [#44040]

    Fix UMT5EncoderModel embedding weights not being tied after loading (#43880) by @jiqing-feng in [#43880]

    fix(testing): Update stale device override test in GraniteSpeech (#44113) by @harshaljanjani in [#44113]

    [Misc][vlms] Use text_config when initializing the fine-grained FP8Expert (#44032) by @JJJYmmm in [#44032]

    docs: fix typo 'AuoQuant' → 'AutoQuant' and clarify FINEGRAINED_FP8 library column (#44131) by @cluster2600 in [#44131]

    Update post proc (#44090) by @itazap in [#44090]

    Fix: flaky Kosmos2ModelTest test (#44061) by @tarekziade in [#44061]

    AutoTokenizer ignores config when model_type is None (#44127) by @itazap in [#44127]

    Migrate GPT2 to standardized output capture decorators (#43983) by @Aki-07 in [#43983]

    grouped_mm fallback (#44043) by @IlyasMoutawwakil in [#44043]

    Bump dev version (#44099) by @qgallouedec in [#44099]

    Fix loading logic issue (#44095) by @Cyrilvallez in [#44095]

    [docs] customizing tokenizers (#43929) by @stevhliu in [#43929]

    Merge test_keep_in_fp32_modules and test_keep_in_fp32_modules_strict (#44097) by @Rocketknight1 in [#44097]

    [voxtral-realtime] update runner expected values (#44096) by @eustlb in [#44096]

    Use torch.isfinite (#44069) by @cyyever in [#44069]

    add default flash impl (#44081) by @ArthurZucker in [#44081]

    Remove unused dependencies (#43904) by @cyyever in [#43904]

    Fix patchtsmixer call to post_init (#44082) by @Cyrilvallez in [#44082]

    Fix false positive right-padding warning for decoder-only models in pipeline (#44021) by @ in [#44021]

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @ArthurZucker

    Add eurobert (#39455)

    [ Dynamic weight loader] fix remote code when format matches (#44396)

    Fix special token maps BC (#44281)

    bring back our demons: clean_up_tokenization_spaces (#44035)

    add default flash impl (#44081)

    @liding-nv

    add support for nemotron_3 (#44390)

    @kashif

    [timesfm2_5] fix timesfm2.5 loss (#44331)

    [timesfm2_5] fix timesfm mlp bias (#44325)

    Timesfm 2.5 (#41763)

    @remi-or

    [CB] Small fixes (#44227)

    [CB] [Major] Asynchronous batching (#43960)

    @ebezzam

    [VibeVoice ASR] Use updated padding cache for ASR model. (#44392)

    Add VibeVoice ASR (#43625)

    @MekkCyber

    [Quantization] Fixing mxfp4 saving using reverse_op (#43148)

    [Quantization] Add metal quantization for MPS devices! (#43934)

    @tarekziade

    perf: Optimize SynthID logits processor batch index construction (#44172)

    Improve has_similar_generate_outputs assertions (#44166)

    fix(flaky): idefics generate cache flake (#44180)

    chore: added CLAUDE.md alias (#44232)

    Fix return value - fixes #44238 (#44240)

    fix: VersionComparison.from_string return type mismatch (#43709)

    fix: HiggsAudioV2 cached decode inputs in compiled generation (#44201)

    chore(typing): initial ty integration (#44167)

    fix(flaky): test_generate_with_and_without_position_ids in GLM ORC (#44173)

    fix(flaky): Different approach to make sure loss exists (#43804)

    Fix: flaky Kosmos2ModelTest test (#44061)

    @zhang-prog

    [Model] Add PP-DocLayoutV2 Model Support (#43018)

    @yanhong-lbh

    Add OLMo Hybrid model (#43358)

    @vasqu

    🚨 [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299)

    [Modular] Fix file type regression (#44283)

    [Mamba] Fix kernel loading (#44176)

    [Flash Attn] Enable compatible implementations (#44177)

    @jackcook

    Add Four Over Six quantization integration (#43970)

    @winglian

    refactor _inner_training_loop to smaller methods (#44041)

    @paultltc

    Add ModernVBERT models (#42504)

    @TinderZ

    [docs] Add Chinese translations for common NLP task tutorials (#44144)

    @szhengac

    Add Higgs Audio V2 Model (#40294)

  • Feb 26, 2026

    Hugging Face

    Hugging Face adds Public Storage add-ons with Xet deduplication and flexible billing for 1 TB to 50 TB plans.

    Public Storage add-ons are available starting at $12 per TB / month. Storage is powered by Xet deduplication to optimize uploads, downloads, and space usage.

    You can purchase, upgrade, or cancel storage plans (1 TB to 50 TB) from your billing settings.
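At $12 per TB per month, plan cost scales linearly across the 1-50 TB range. A trivial sketch of the arithmetic (illustrative only; actual billing, proration, and plan granularity are defined by the billing settings page):

```python
PRICE_PER_TB_MONTH = 12  # USD, per the announcement above

def monthly_cost(tb: int) -> int:
    """Monthly cost in USD for a public-storage plan of `tb` terabytes."""
    if not 1 <= tb <= 50:
        raise ValueError("plans range from 1 TB to 50 TB")
    return PRICE_PER_TB_MONTH * tb

# monthly_cost(1) == 12; monthly_cost(50) == 600
```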

  • Feb 17, 2026

    transformers by Hugging Face

    v5.2.0: GLM-5, Qwen3.5, Voxtral Realtime, VibeVoice Acoustic Tokenizer

    transformers releases VoxtralRealtime, GLM-5, Qwen3.5 and VibeVoice support, bringing new streaming speech, multimodal and large-scale model additions plus a breaking new attention mask interface and broad bug fixes.

    New Model additions

    VoxtralRealtime

    VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model, which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription, processing audio in chunks as they arrive.

    The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.

    Add Voxtral Realtime (#43769) by @eustlb
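The "causal convolutions with padding caches" mentioned above are what let chunked streaming reproduce offline results: each chunk carries over the last few inputs as left-padding for the next. A schematic 1-D sketch of that mechanism (not the model's actual convolution stack):

```python
def causal_conv(xs, kernel, state=None):
    """1-D causal convolution with an explicit left-padding cache.
    `state` holds the last len(kernel)-1 inputs from the previous chunk,
    so chunked processing matches full-sequence processing exactly."""
    k = len(kernel)
    if state is None:
        state = [0.0] * (k - 1)          # zero-pad the very first chunk
    buf = list(state) + list(xs)
    out = [sum(kernel[j] * buf[i + j] for j in range(k)) for i in range(len(xs))]
    return out, buf[-(k - 1):]           # new cache: trailing inputs for next chunk

kernel = [0.5, 0.3, 0.2]
audio = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

full, _ = causal_conv(audio, kernel)          # offline: whole sequence at once
a, st = causal_conv(audio[:3], kernel)        # streaming chunk 1
b, _ = causal_conv(audio[3:], kernel, st)     # streaming chunk 2 reuses the cache
# a + b matches full: chunked output equals offline output
```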

    GLM-5 - GlmMoeDsa

    The Z.ai team launches GLM-5 and introduces it as follows:

    GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

    Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.

    Add GlmMoeDsa (#43858) by @Cyrilvallez
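The cost reduction from sparse attention comes from each query attending to only a small subset of keys instead of the full context. A toy top-k selection sketch of that idea (schematic; the actual DSA design involves a learned indexer and considerably more machinery):

```python
def sparse_topk_attention_indices(scores, k):
    """For each query row of relevance scores, keep only the k
    highest-scoring key positions -- the basic cost-reduction idea
    behind sparse attention on long contexts."""
    return [sorted(sorted(range(len(row)), key=lambda j: -row[j])[:k])
            for row in scores]

scores = [[0.9, 0.1, 0.4, 0.7],   # query 0's relevance to 4 keys
          [0.2, 0.8, 0.6, 0.1]]   # query 1's relevance to 4 keys
kept = sparse_topk_attention_indices(scores, k=2)
# kept == [[0, 3], [1, 2]]: each query computes attention over 2 keys, not 4
```

With k fixed, per-query attention cost stops growing with context length, which is how long-context capacity is preserved at lower deployment cost.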

    Qwen3.5, Qwen3.5 Moe

    The Qwen team launches Qwen3.5 and introduces it as follows:

    We are delighted to announce the official release of Qwen3.5, introducing the open-weight of the first model in the Qwen3.5 series, namely Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B demonstrates outstanding results across a full range of benchmark evaluations, including reasoning, coding, agent capabilities, and multimodal understanding, empowering developers and enterprises to achieve significantly greater productivity. Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability. We have also expanded our language and dialect support from 119 to 201, providing broader accessibility and enhanced support to users around the world.

    Adding Support for Qwen3.5 (#43830) by @bozheng-hit

    VibeVoice Acoustic Tokenizer

    VibeVoice is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational "vibe" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.

    One key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.

    Add VibeVoice Acoustic Tokenizer (#43400) by @ebezzam

    Breaking changes

    🚨 [Attn] New attn mask interface everywhere (#42848)

    🚨 Modify ModernBERT's default attention implementation to stop using FA (#43764)

    🚨 This one is quite breaking for super super super old models: 🚨 🚨

    fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791)

    If the config does not have a model_type field, we no longer infer the model type from the folder name, as was previously done for checkpoints like https://huggingface.co/prajjwal1/bert-tiny/blob/main/config.json
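The behavior can be illustrated with a plain sketch. The helper below is hypothetical, not the actual AutoTokenizer code: after this fix, only an explicit model_type field in the config determines the model type, with no fallback to folder-name matching.

```python
import json

def resolve_model_type(config_json, folder_name):
    """Hypothetical helper, not the actual AutoTokenizer code: only an
    explicit model_type field in the config determines the model type."""
    config = json.loads(config_json)
    # Previously, a folder name such as "bert-tiny" could be matched by
    # substring against known model types; after this fix it is ignored.
    return config.get("model_type")

# Config that names its architecture explicitly:
print(resolve_model_type('{"model_type": "bert"}', "bert-tiny"))  # bert
# Config without model_type: no fallback to the folder name anymore:
print(resolve_model_type('{"hidden_size": 128}', "bert-tiny"))    # None
```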

    Bugfixes and improvements

    • [docs] deploying (#43241) by @stevhliu
    • [Trainer] Move NEFTune impl to standalone functions (#43714) by @SunMarc
    • Fix convert_rope_params_to_dict so it uses rope_theta from the config (#43766) by @hmellor
    • Bump dev version (#43777) by @qgallouedec
    • Improved AGENTS.md (#43763) by @tarekziade
    • Fix-release-ubild (#43773) by @ArthurZucker
    • unpin torch for CircleCI (#43790) by @ydshieh
    • [Modular Dependencies] Fixup qwen rms norms (#43772) by @vasqu
    • fix(testing): Fix BLOOM tokenizer, CLAP audio features, and CLVP text tester usage in tests (#43798) by @harshaljanjani
    • Remove unconditional train_batch_size assignment (#43770) by @lordaarush
    • [Repo Consistency] Fix rms norm (#43803) by @vasqu
    • fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791) by @tarekziade
    • Refactor trainer data_collator and callbacks tests (#43776) by @SunMarc
    • [core] Faster and thread-safe check_model_inputs implementation (#43765) by @Cyrilvallez
    • [Trainer] use deepspeed SP process group when Accelerate doesn’t build a mesh (#43799) by @kashif
    • fix(flaky): enforce manual seed to reduce flakiness (#43794) by @tarekziade
    • Add TRL CI bot workflow to trigger tests on PR comments (#43809) by @qgallouedec
    • Fix DeepSpeed model preparation logic in Trainer class (#43780) by @qgallouedec
    • [docs] reveal more in toctree (#43808) by @stevhliu
    • Fix markdown documentation (#43076) by @cyyever
    • Fix slack-report workflow file (#43851) by @ydshieh
    • add do_sample=False to qwen2_5_vl model tests to stablize the output (#43728) by @kaixuanliu
    • Fix incorrect timestamp calculation in Qwen3VL Processor (#43659) by @jonathan-fulton
    • Remove GPU tracking from TrackioCallback and remove env var support (#43371) by @qgallouedec
    • Add id and resume support to SwanLab integration (#43719) by @i-pj
    • fix gptoss crash in tp (#43853) by @sywangyi
    • Delete batch_split from EncoderDecoderCache (#43814) by @cyyever
    • delete unnecessary code to make moe compatible to full graph compile (#43855) by @kaixuanliu
    • Update ModelType for Unigram tokenizer (#43860) by @pavel-esir
    • [docs] Remove pipeline() examples from summarization/translation tasks (#43831) by @Mr-Neutr0n
    • Fix video interpolation in pe_audio_video (#43811) by @Rocketknight1
    • Look for the pad_token_id in the right place for Llama4 (#43539) by @Rocketknight1
    • Fix cardinality error for DETR models without explicit background class (#43513) by @heathdutton
    • docs: Add Switch Transformers docstring notes and update spectrogram comment (#43336) by @harshaljanjani
    • [xLSTM] Fix bugs preventing small model training (#43209) by @Anri-Lombard
    • docs: correct typo 'neccessary' to 'necessary' (#43868) by @thecaptain789
    • Improve PR comment CI feedback (#43852) by @ydshieh
    • Fix init weights in remote code (#43768) by @zucchini-nlp
    • Fix GlmMoeDsaConfig default mlp_layer_types in modular conversion (#43876) by @OiPunk
    • [MistralCommonBackend] fix loading proc (#43887) by @eustlb
    • [Jamba] Fallback to slow path and warn instead of error out (#43889) by @vasqu
    • Fix SwanLab callback to forward resume init args (#43848) by @OiPunk
    • Fix old tech stack in doc (#43879) by @cyyever
    • Update TrainingArguments (#43806) by @SunMarc
    • Remove unnecessary code or checks for PT 2.4+ (#43787) by @cyyever
    • Make it possible to evaluate when using sequence parallel in HF Trainer (#43517) by @jp1924
    • [Trainer] Move optimizer cls init to trainer_optimizer.py (#43738) by @SunMarc
    • fix the error of tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py::Fb… (#43547) by @sywangyi
    • fix fbgemm fp8 multi-device load failure. (#43581) by @sywangyi
    • Refactor trainer init (#43807) by @SunMarc
    • [fix] Use last_hidden_state key from get_image_features for llama4 (#43882) by @tomaarsen
    • [Docs] Add docs for GLM-OCR and fix EomT-DINOv3 (#43710) by @NielsRogge
    • Update hub metadata (#43892) by @zucchini-nlp
    • [fix] DAC model: Apply STE in Dac.from_latents to match the forward pass (#43820) by @harshaljanjani
    • Separate check_model_inputs into capture_outputs and merge_with_config_defaults + ensure correctness (#43862) by @Cyrilvallez
    • Remove mask slicing in all eager attentions (#42186) by @Cyrilvallez
    • Fix expected DAC outputs due to (old) change in CI settings. (#43896) by @ebezzam
    • Minor changes trainer (#43744) by @SunMarc
    • adding BC for custom toks accessing slow tok attrs deprecated in v5 (#43898) by @itazap
    • Fix typo in quantization_operations in PEFT integrations (#43821) by @redpanda1995
    • Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753) by @cyyever
    • Decorate cache updates with no_grad, just in case (#43897) by @Rocketknight1
    • revert place_model_on_device to property (#43895) by @SunMarc
    • Train sampler unification (#43138) by @jiosephlee
    • fix(moe): Handle dtype mismatch in torch._grouped_mm with autocast (#43839) by @Mr-Neutr0n
    • Fix missing fast image patch counter in Glm46V (#43877) by @OiPunk
    • Fix old tech stack in doc (#43902) by @cyyever
    • Move _keys_to_ignore_on_load_missing for now (#43893) by @ArthurZucker
    • Changes to cache_utils should trigger all tests all the time (#43920) by @Cyrilvallez
    • Ernie4 5 vl moe (#43755) by @kaixuanliu
    • Harmonize input_embeds to inputs_embeds everywhere (#43916) by @Cyrilvallez
    • fix: TextClassificationPipeline docs mentioning deprecated return_all_scores (#43903) by @math-hiyoko
    • Revert #43897 (#43923) by @Rocketknight1
    • Fix AttributeError in OwlViT conversion script for Python 3.10+ (#43922) by @DimiChatzipavlis
    • add openAI style image_url content support in apply_chat_template (#43786) by @kaixuanliu
    • Prepare and keep track of position ids in generate (#43734) by @zucchini-nlp
    • Fix lifted_tensor in Gemma3n export which dynamo can't reason about (#43801) by @robell
    • Fix bark test (#43942) by @Cyrilvallez
    • Fix docker files (#43946) by @ydshieh
    • Fix flaky test for multimodal LLMs (#43944) by @Rocketknight1
    • Add explicit utf-8 encoding to CircleCI scripts for Windows compatibility (#43925) by @
    • Modernize string formatting (f-strings) in conversion scripts (#43943) by @
    • Fix weight decay exclusions in run_*_no-trainer.py examples (#42769) by @casinca
    • fix: Better weight decay exclusion in run_*_no-trainer.py examples (#43947) by @casinca
    • Timm backbone saves and loads out_features (#43886) by @zucchini-nlp
    • Fix qwen-vl position ids when generating several times (#43952) by @zucchini-nlp
    • Fix get_number_of_image_tokens (#43948) by @zucchini-nlp
    • Fix typos in docstrings, comments, and error messages (#43949) by @
    • Fix LASR test layerdrop issue (#43954) by @Rocketknight1
    • [kernels] fix kernel versions (#43955) by @MekkCyber
    • [Doc tests] Fix bug (#43729) by @NielsRogge
    • fix(models): Preserve custom token IDs through DiaConfig save and load (#43928) by @harshaljanjani
    • update somes audio models (#43865) by @Deep-unlearning
    • Improve memory allocator during loading (#43945) by @Cyrilvallez
    • Inclusion of process_group in the gather_full_tensor function in tensor_parallel.py (#43932) by @quic-meetkuma
    • Fix sync gradient (#43919) by @SunMarc
    • Reorder Trainer methods (#43914) by @SunMarc
    • Fix TypeError in dot_natural_key when state_dict keys have mixed types at same position (#43966) by @shtse8
    • Enhance JSON schema generation to support instance, static, and class methods (#43968) by @qgallouedec
    • Remove unused squeeze from VJEPA2 embeddings rotation (#43984) by @materight
    • Improve new failing test analysis for PR comment CI (#44033) by @ydshieh
    • Remove other_workflow_run_ids for issue_comment in utils/notification_service.py (#44036) by @ydshieh
    • stable grouped_mm API (#43977) by @IlyasMoutawwakil
    • create .git-blame-ignore-revs file (#43982) by @SunMarc
    • docs: fix typos across documentation files (#43993) by @saurav0369
    • update python requirement to 3.10+ to match codebase (#44009) by @mariam851
    • Improve use of torch.is_autocast_enabled (#43930) by @cyyever
    • Use torch.xlogy (#44006) by @cyyever
    • [Deespeed] fix WeightConverter.convert() use (#43926) by @kashif
    • Reduce reduce CUDA sync (#44005) by @cyyever
    • split out accelerator args builder method (#43987) by @winglian
    • SINQ quantization strategy integration (adapted for Transformers V5) (#43112) by @ChiaraBoretti
    • fix(models): Unpack BitNet packed weights to fix CI failure (#43721) by @harshaljanjani

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @ChiaraBoretti
      • SINQ quantization strategy integration (adapted for Transformers V5) (#43112)
    • @cyyever
      • Reduce reduce CUDA sync (#44005)
      • Use torch.xlogy (#44006)
      • Improve use of torch.is_autocast_enabled (#43930)
      • Fix old tech stack in doc (#43902)
      • Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753)
      • Remove unnecessary code or checks for PT 2.4+ (#43787)
      • Fix old tech stack in doc (#43879)
      • Delete batch_split from EncoderDecoderCache (#43814)
      • Fix markdown documentation (#43076)
    • @eustlb
      • Add Voxtral Realtime (#43769)
      • [MistralCommonBackend] fix loading proc (#43887)
    • @ebezzam
      • Fix expected DAC outputs due to (old) change in CI settings. (#43896)
      • Add VibeVoice Acoustic Tokenizer (#43400)
    • @vasqu
      • [Jamba] Fallback to slow path and warn instead of error out (#43889)
      • 🚨 [Attn] New attn mask interface everywhere (#42848)
      • [Repo Consistency] Fix rms norm (#43803)
      • [Modular Dependencies] Fixup qwen rms norms (#43772)
    • @bozheng-hit
      • Adding Support for Qwen3.5 (#43830)
    Original source Report a problem
  • Feb 10, 2026
    • Date parsed from source:
      Feb 10, 2026
    • First seen by Releasebot:
      Mar 20, 2026
    Hugging Face logo

    Hugging Face

    Feb 10, 26

    Hugging Face expands full-text search with organization, user, and repository filters in the UI and API.

    Full-text search now supports filtering by organizations, users, or specific repositories, available both in the UI and through the API. Combine multiple filters using OR logic to refine your results, and share searches easily with filters persisted directly in the URL.

    Original source Report a problem
  • Feb 5, 2026
    • Date parsed from source:
      Feb 5, 2026
    • First seen by Releasebot:
      Mar 20, 2026
    Hugging Face logo

    transformers by Hugging Face

    v5.1.0: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, GLM-OCR

    transformers adds new model support for EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, and GLM-OCR, while also shipping broad bug fixes, generation cache updates, MoE and XPU improvements, and multiple breaking refactors to keep the library evolving.

    New Model additions

    EXAONE-MoE

    K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.

    Add EXAONE-MoE implementations (#43080) by @nuxlear

    PP-DocLayoutV3

    PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.

    [Model] Add PP-DocLayoutV3 Model Support (#43098) by @zhang-prog

    Youtu-LLM

    Youtu-LLM is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports 128k long context, and has native agentic talents. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in terms of Commonsense, STEM, Coding and Long Context capabilities; in agent-related testing, Youtu-LLM surpasses larger-sized leaders and is truly capable of completing multiple end2end agent tasks.

    Add Youtu-LLM model (#43166) by @LuJunru

    GlmOcr

    GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.

    [GLM-OCR] GLM-OCR Support (#43391) by @zRzRzRzRzRzRzR

    Breaking changes

    🚨 T5Gemma2 model structure (#43633) - Ensures the attention implementation is set on all sub-configs. The config.encoder.text_config was not getting its attention implementation set because it isn't passed to PreTrainedModel.__init__. Since the model structure can't change without breaking, a call to self.adjust_attn_implementation was manually re-added in the modeling code.

    🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation so that sliding window configurations are properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding window limits to be ignored. This is breaking because models with sliding window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code.
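The effect of respecting a sliding window can be sketched independently of the library's actual cache classes (the class below is illustrative only, not the transformers Cache API): a window-aware cache keeps at most `sliding_window` past positions, while a cache built without the model config would silently keep everything.

```python
from collections import deque

class SlidingWindowCache:
    """Illustrative sketch, not the transformers Cache API: keep only
    the last `sliding_window` key/value positions."""
    def __init__(self, sliding_window: int):
        self.window = deque(maxlen=sliding_window)

    def update(self, position: int) -> None:
        self.window.append(position)

    def cached_positions(self) -> list:
        return list(self.window)

cache = SlidingWindowCache(sliding_window=4)
for pos in range(10):
    cache.update(pos)

# Only the last 4 positions remain; a cache created without the model
# config would have kept all 10, ignoring the window limit.
print(cache.cached_positions())  # [6, 7, 8, 9]
```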

    🚨 Delete duplicate code in backbone utils (#43323) - This PR cleans up backbone utilities. Specifically, we currently have 5 different config attributes to decide which backbone to load, most of which are redundant and can be merged into one.

    After this PR, we'll have only one config.backbone_config as a single source of truth. The models will load the backbone from_config and load pretrained weights only if the checkpoint has any weights saved. The overall idea is the same as in other composite models. A few config arguments are removed as a result.

    🚨 Refactor DETR to updated standards (#41549) - standardizes the DETR model to be closer to other vision models in the library.

    🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - replaces an int() with round(); expect slight numerical differences.
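The difference is easy to see in isolation: int() truncates toward zero, while round() goes to the nearest integer, so a resized dimension computed from a scale factor can shift by one pixel. The numbers below are illustrative, not taken from JanusImageProcessor.

```python
# Truncation vs rounding when computing a resized dimension
# (illustrative values, not the processor's actual defaults):
scale = 384 / 1000
height = 999

print(int(height * scale))    # truncates: 383
print(round(height * scale))  # rounds to nearest: 384
```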

    🚨 Remove deprecated AnnotionFormat (#42983) - removes a misnamed class in favour of AnnotationFormat.

    Bugfixes and improvements

    fix(models): Migrate legacy segmentation_indices to out_indices in BeitConfig (#43505) by @harshaljanjani

    [docs] Update torch version (#42135) by @stevhliu

    Remove SDPA workarounds for torch 2.4+ (#43754) by @cyyever

    add use_deterministic to guarantee the consistency for youtu-llm model (#43759) by @kaixuanliu

    fix: add compatible_model_types to suppress model type mismatch warnings (#43495) by @leoneperdigao

    Fix T5 v1.1 detection (#43681) by @githubnemo

    Add moonshine streaming (#43702) by @eustlb

    Allow bi-directional attention for all models (#43705) by @Cyrilvallez

    Docs: fix Training step by removing tokenizer from trainer initialization (#43733) by @nesjett

    Fix scheduler initialization order (#43711) by @SunMarc

    Fix accelerate integration import (#43732) by @SunMarc

    Update torch minimum version to 2.4 (#41307) by @cyyever

    Fix dtype in image-text-to-text pipe (#43731) by @zucchini-nlp

    Preventing initialization of siglip's lecun_normal_, default_flax_embed_init in ZeRO3 (#43574) by @jp1924

    fix: AttributeError for Qwen3_omni_moe (#43593) by @Vallabh-1504

    Improve typing/explanations for general model properties (#43712) by @Cyrilvallez

    [Kernels] kernel migration updates for activation kernels (#43518) by @ariG23498

    [feat] Allow loading T5Gemma2Encoder with AutoModel (#43559) by @tomaarsen

    Added S110 - try-except-pass rule (#43687) by @tarekziade

    [docs] benchmarks (#43694) by @stevhliu

    fix norm_eps dtype (#43669) by @fschlatt

    Llava onevision: output align for tests and add image_sizes input param (#43678) by @kaixuanliu

    Fix CLIPOutput attentions not being returned (#43657) by @jonathan-fulton

    [Attn] Fixup interface usage after refactor (#43706) by @vasqu

    Fix model/processor mismatch in SigLIP2 quantization example (#43652) by @jonathan-fulton

    Fix crash of custom models in Notebook or Repl (#43690) by @Cyrilvallez

    Simplify TrainingArguments docstring (#43568) by @SunMarc

    Composite model inherit automatically all important properties from their children (#43691) by @Cyrilvallez

    Update configuration_qwen3.py (#43703) by @francesco-bertolotti

    fix gptoss tp crash (#43695) by @sywangyi

    [CB] Keep order of incoming requests (#43626) by @remi-or

    Fix Apertus model loading (NotImplementedError: Cannot copy out of meta tensor; no data!) (#43473) by @xenova

    Remove num_frames in ASR pipeline (#43546) by @jiqing-feng

    remove ipex and ccl for xpu and cpu (#42852) by @yao-matrix

    update guide with new attr name for toks (#43689) by @itazap

    Docs: fix typos in Get started (index, quicktour) (#43666) by @CodeByKodi

    the cache class is deprecated by @vasqu (direct commit on main)

    custom tok init fix (#43591) by @itazap

    More export friendly rewrites and skipping the failing ones (#43436) by @IlyasMoutawwakil

    Cast byte_count to int in caching_allocator_warmup for MPS compatibility (#43608) by @tobyliu2004

    [Docs] Complete missing Llama4 configuration docs (#43460) by @udaymehta

    Fix t5 failures (#43374) by @Abdennacer-Badaoui

    Add EoMT with DINOv3 backbone (#41212) by @NielsRogge

    Update DBRX docs to reference re-uploaded checkpoint (#43196) by @qgallouedec

    [loading] Fix forced upcasting to fp32 (#43683) by @Cyrilvallez

    Fix FP8Expert for Qwen (#43670) by @yiliu30

    Simplify loading structure (#43589) by @Cyrilvallez

    [CB] Refactor logic for inputs and outputs outside of the main API (#43569) by @remi-or

    Make sure hub errors are surfaced in PreTrainedTokenizerBase (#43675) by @tarekziade

    Fix FP8Expert for DeepSeek R1 (#43616) by @yiliu30

    Use correct sampling rate in chat template (#43674) by @zucchini-nlp

    [HunYuan] Fix RoPE init (#43411) by @vasqu

    XPU now supports MoE kernel(MegaBlocks) implementation (#43435) by @YangKai0616

    [Sam] Fixup training flags (#43567) by @vasqu

    remove torchao.autoquant from transformers (#43561) by @vkuzo

    [DeepSpeed] properly handle MoE weight conversion (#43524) by @kashif

    Tie zamba weights correctly (#43623) by @zucchini-nlp

    [kernels] Centralize kernels tests (#42819) by @MekkCyber

    Fix process_bad_commit_report.py: avoid items to appear in null author in the report (#43662) by @ydshieh

    Fix KeyError in check_bad_commit.py (#43655) by @ydshieh

    [Benchmark] Minor fix for benchmark: kernel is not correctly called (#43428) by @sywangyi

    Add explicit commit info to PR comment CI feedback (#43635) by @ydshieh

    Better new failures reporting for PR comment CI (#43629) by @ydshieh

    [docs] serving (#42853) by @stevhliu

    add XPU expected output for MixedInt8GPT2Test (#43615) by @kaixuanliu

    Don't modify mappings in tests (#43634) by @Rocketknight1

    Allow Attention and Experts to be used as standalone modules (#43622) by @Cyrilvallez

    Don't modify tied_weight_keys in-place (#43619) by @zucchini-nlp

    [Rope] Revert #43410 and make inheritance implicit again (#43620) by @vasqu

    [vllm compat] Separate renaming from conversion ops (#43621) by @Cyrilvallez

    refactor + robusts tests for Tensor Parallel (#42809) by @3outeille

    add contiguous operation for diffllama model for xpu to enable compile mode. (#43614) by @kaixuanliu

    add xpu expectation for lw_detr model (#43339) by @kaixuanliu

    minimax_m2: fix failed test case for XPU (#43324) by @kaixuanliu

    Improve new failures reporting (#43628) by @ydshieh

    Fix extras on all supported Python versions (#43490) by @tarekziade

    fix(models): Fix suno/bark-small CPU offload device mismatch causing CI failures (#43607) by @harshaljanjani

    [CB] [Serve] Fix broken serve tests (#43594) by @remi-or

    Docs: fix typo in weight converter guide (#43610) by @KOKOSde

    [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583) by @YangKai0616

    Fixes configuration default values (#43592) by @zucchini-nlp

    Fix make_batched_video with 5D arrays (#43486) by @zucchini-nlp

    Operation Green CI II (#43537) by @Rocketknight1

    enable cpu paged cache (#42869) by @jiqing-feng

    Qwen3 omni - fix get video features (#43588) by @zucchini-nlp

    [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342) by @JaredforReal

    [Model] Refactor modernbert with the attention interface (#43030) by @YangKai0616

    Regex post processing in loading (#43585) by @Cyrilvallez

    simplify extra tokens logic in base (#43230) by @itazap

    Add XPU support to the tests for solar_open (#43579) by @YangKai0616

    remove FbgemmFp8LinearTest (#43545) by @sywangyi

    Increase default ReadTimeout in tests (#43586) by @Wauplin

    Fix mistral checkpoint loading in utils/fetch_hub_objects_for_ci.py: avoid too many requests and/or timeout (#43584) by @ydshieh

    [CI][AMD] Fix Pipeline CI (#43178) by @Abdennacer-Badaoui

    fix(converter): speed up MistralConverter.extract_vocab_merges_from_model (#43557) by @tarekziade

    Improve GPU monitoring: switch to multiprocessing and use amdsmi for AMD GPUs (#43552) by @Abdennacer-Badaoui

    Update test of Youtu-LLM to pr-aligned repos (#43578) by @LuJunru

    Rework dependencies and extras + Remove outdated templates folder (#43536) by @Cyrilvallez

    Fix repo. consistency bot (push permission issue) (#43570) by @ydshieh

    Fix Wav2vec and a few others (#43566) by @Cyrilvallez

    [Modular] Allow to add new bases that are not present in the inherited class (#43556) by @vasqu

    add an option to disable Sam3VideoModel progress bar (#43564) by @ndeybach

    check/fix repo. check bot workflow (#43565) by @ydshieh

    Increase timeout when preparing CI (#43560) by @Rocketknight1

    43054: Add Siglip2Tokenizer to enforce training-time text preprocessing defaults (#43101) by @vaibhav-research

    check PR bot permission - part 3 (try content attribute) (#43555) by @ydshieh

    check PR bot permission - part 2 (style only) (#43554) by @ydshieh

    check PR bot permission - part 1 (#43553) by @ydshieh

    Fix failing tests due to no attribute pad_token_id (#43453) by @Sai-Suraj-27

    fix: GPT OSS Conversion Script Enhancements (#42901) by @KyleMylonakisProtopia

    [Quantization] Fix triton_kernels name after being renamed to gpt-oss-triton-kernels (#43528) by @MekkCyber

    [Quantization] Add cutlass kernel for FP8 (#43304) by @MekkCyber

    [CB] Minor perf improvements and ty compatibility (#43521) by @remi-or

    Fix tiles mixing for batched input, add tie_word_embeddings to LFM2VL config (#43379) by @ankke

    fix: return labels instead of label in reduce_label method in BeitImageProcessorFast (#43527) by @sbucaille

    [RoPE] Make explicit inheritance (#43410) by @vasqu

    Fix for #43530 (#43535) by @Rocketknight1

    Operation Green CI (#43530) by @Rocketknight1

    Tie the weights even if initializing from a config on meta device (#43523) by @Cyrilvallez

    [kernels] Update cv_utils name (#43529) by @MekkCyber

    add trackio to training notebooks (#43442) by @merveenoyan

    Mark test_prompt_lookup_decoding as flaky (#42184) by @Rocketknight1

    Fix some MoE routers (#43445) by @IlyasMoutawwakil

    batched_mm is slow on cpu (#43438) by @IlyasMoutawwakil

    fix: initialize BatchNorm2d buffers only when needed (#43520) by @tarekziade

    Fix loading of Qwen3 FP8 (#43494) by @githubnemo

    fix ShieldGemma2IntegrationTest::test_model (#43343) by @sywangyi

    Update SamHQModelIntegrationTest::test_inference_mask_generation_batched_points_batched_images for XPU (#43511) by @sywangyi

    Revert utils files changes from PR #42845 (#43507) by @ydshieh

    Move hardcoded time_step params to config for Bamba, FalconH1, GraniteMoeHybrid (#43461) by @raimbekovm

    Prepare inputs for generation is called from super() (#43280) by @zucchini-nlp

    Enhance repo. consistency bot (#43503) by @ydshieh

    Add pytest-random-order for reproducible test randomization (#43483) by @tarekziade

    Add missing GPURawMetrics.from_dict() method in benchmark_v2 (#43499) by @Abdennacer-Badaoui

    push dev version 5.0.1.dev0 by @ArthurZucker (direct commit on main)

    Fix failing markuplm & perception_lm integration tests (#43464) by @Sai-Suraj-27

    fix(Phi4Multimodal): Fix incorrect default vision/audio config initialization in Phi4MultimodalConfig (#43480) by @charlieJ107

    handle 1D position_ids for modeling_flash_attention_utils as well (#43403) by @kaixuanliu

    Remove stale TODO comments in UDOP tied weights (#43477) by @raimbekovm

    Fix Mxfp4 dequantize (#43326) by @Cyrilvallez

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @cyyever

    Remove SDPA workarounds for torch 2.4+ (#43754)

    Update torch minimum version to 2.4 (#41307)

    🚨 Remove deprecated AnnotionFormat (#42983)

    @eustlb

    Add moonshine streaming (#43702)

    @tarekziade

    Added S110 - try-except-pass rule (#43687)

    Make sure hub errors are surfaced in PreTrainedTokenizerBase (#43675)

    Fix extras on all supported Python versions (#43490)

    fix(converter): speed up MistralConverter.extract_vocab_merges_from_model (#43557)

    fix: initialize BatchNorm2d buffers only when needed (#43520)

    Add pytest-random-order for reproducible test randomization (#43483)

    @nuxlear

    Add EXAONE-MoE implementations (#43080)

    @vasqu

    [Attn] Fixup interface usage after refactor (#43706)

    the cache class is deprecated

    [HunYuan] Fix RoPE init (#43411)

    [Sam] Fixup training flags (#43567)

    [Rope] Revert #43410 and make inheritance implicit again (#43620)

    [Modular] Allow to add new bases that are not present in the inherited class (#43556)

    [RoPE] Make explicit inheritance (#43410)

    @remi-or

    [CB] Keep order of incoming requests (#43626)

    [CB] Refactor logic for inputs and outputs outside of the main API (#43569)

    [CB] [Serve] Fix broken serve tests (#43594)

    [CB] Minor perf improvements and ty compatibility (#43521)

    @NielsRogge

    Add EoMT with DINOv3 backbone (#41212)

    @YangKai0616

    XPU now supports MoE kernel(MegaBlocks) implementation (#43435)

    [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583)

    [Model] Refactor modernbert with the attention interface (#43030)

    Add XPU support to the tests for solar_open (#43579)

    @ydshieh

    Fix process_bad_commit_report.py: avoid items to appear in null author in the report (#43662)

    Fix KeyError in check_bad_commit.py (#43655)

    Add explicit commit info to PR comment CI feedback (#43635)

    Better new failures reporting for PR comment CI (#43629)

    Improve new failures reporting (#43628)

    Fix mistral checkpoint loading in utils/fetch_hub_objects_for_ci.py: avoid too many requests and/or timeout (#43584)

    Fix repo. consistency bot (push permission issue) (#43570)

    check/fix repo. check bot workflow (#43565)

    check PR bot permission - part 3 (try content attribute) (#43555)

    check PR bot permission - part 2 (style only) (#43554)

    check PR bot permission - part 1 (#43553)

    Revert utils files changes from PR #42845 (#43507)

    Enhance repo. consistency bot (#43503)

    @JaredforReal

    [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342)

    @zhang-prog

    [Model] Add PP-DocLayoutV3 Model Support (#43098)

    @LuJunru

    Update test of Youtu-LLM to pr-aligned repos (#43578)

    Add Youtu-LLM model (#43166)

    @zRzRzRzRzRzRzR

    [GLM-OCR] GLM-OCR Support (#43391)

    Original source Report a problem
  • Jan 26, 2026
    • Date parsed from source:
      Jan 26, 2026
    • First seen by Releasebot:
      Mar 20, 2026
    Hugging Face logo

    transformers by Hugging Face

    Transformers v5

    transformers releases its first major v5 update, bringing major API simplification, dynamic weight loading, tokenizer and config refactors, faster model loading, weekly minor releases, and broad bug fixes plus new model support across vision, audio, and language tasks.

    Transformers v5 release notes

    Highlights

    Significant API changes: dynamic weight loading, tokenization

    Backwards Incompatible Changes

    Bugfixes and improvements

    A continuously updated migration guide is available on the main branch; please check it out if you run into issues: migration guide.

    Highlights

    We are excited to announce the initial release of Transformers v5. This is the first major release in five years, and it is significant: 1200 commits have been pushed to main since the latest minor release. This release removes many long-overdue deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.

    We give an overview of our focus for this release in the following blogpost. In these release notes, we'll focus directly on the refactors and new APIs coming with v5.

    This is the full v5 release, and it sets something bigger in motion: starting with v5, we'll ship minor releases every week rather than every 5 weeks. Expect v5.1 to follow next week, then v5.2 the week after, and so on.

    We're moving forward with this change to ensure you have access to models as soon as they're supported in the library, rather than a few weeks after.

    To install this release:

    pip install transformers
    

For us to deliver the best package possible, it is imperative that we get feedback on how the toolkit is working for you. Please try it out, and open an issue if you face anything inconsistent or a bug.

    Transformers version 5 is a community endeavor, and we couldn't have shipped such a massive release without the help of the entire community.

    Significant API changes

    Dynamic weight loading

    We introduce a new weight loading API in transformers, which significantly improves on the previous API. This
    weight loading API is designed to apply operations to the checkpoints loaded by transformers.
    Instead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge,
    and split the layers according to how they're defined in this new API. These operations are often a necessity when
    working with quantization or parallelism algorithms.

    This new API is centered around the new WeightConverter class:

class WeightConverter(WeightTransform):
    operations: list[ConversionOps]
    source_keys: Union[str, list[str]]
    target_keys: Union[str, list[str]]

The weight converter applies a list of operations to the source keys, producing the target keys. A common
operation on attention layers is fusing the query, key, and value layers. Doing so with this API amounts
to defining the following conversion:

conversion = WeightConverter(
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],  # The input layers
    "self_attn.qkv_proj",  # The single layer as output
    operations=[Concatenate(dim=0)],
)
    

    In this situation, we apply the Concatenate operation, which accepts a list of layers as input and returns a single
    layer.

This allows us to define, per architecture, a list of weight conversions that apply arbitrary transformations
to the layers themselves. This significantly simplified the from_pretrained method
and helped us remove a lot of technical debt accumulated over the past few years.

    This results in several improvements:

    • Much cleaner definition of transformations applied to the checkpoint
    • Reversible transformations, so loading and saving a checkpoint should result in the same checkpoint
    • Faster model loading thanks to scheduling of tensor materialization
    • Enables complex mix of transformations that wouldn't otherwise be possible (such as quantization + MoEs, or TP + MoEs)

    Linked PR: #41580
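As a mental model for what such a conversion does, here is a toy, pure-Python stand-in (not the real transformers machinery; weights are nested lists of rows, and concatenation along dim 0 becomes row concatenation):

```python
# Toy sketch of a fused-QKV weight conversion (illustrative, not the transformers API).
def concatenate_dim0(tensors):
    """Stack 2D weights along dim 0 (i.e., concatenate their rows)."""
    out = []
    for t in tensors:
        out.extend(t)
    return out

def convert(state_dict, source_keys, target_key, op):
    """Apply `op` to the tensors at `source_keys`, storing the result at `target_key`."""
    tensors = [state_dict.pop(k) for k in source_keys]
    state_dict[target_key] = op(tensors)
    return state_dict

state = {
    "self_attn.q_proj": [[1, 0], [0, 1]],
    "self_attn.k_proj": [[2, 0], [0, 2]],
    "self_attn.v_proj": [[3, 0], [0, 3]],
}
fused = convert(
    state,
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
    "self_attn.qkv_proj",
    concatenate_dim0,
)
print(len(fused["self_attn.qkv_proj"]))  # 6 rows: 2 per projection
```

The real API additionally schedules tensor materialization and keeps the transformation reversible so saving reproduces the original checkpoint layout.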

    Tokenization

Just as we moved towards a single backend library for model definition, we want our tokenizers and the Tokenizer object to be much more intuitive. With v5, tokenizer definition is much simpler: you can now initialize an empty LlamaTokenizer and train it directly on your corpus.

    Defining a new tokenizer object should be as simple as this:

from transformers import TokenizersBackend, generate_merges
from tokenizers import pre_tokenizers, Tokenizer
from tokenizers.models import BPE


class Llama5Tokenizer(TokenizersBackend):
    def __init__(self, unk_token="<unk>", bos_token="<s>", eos_token="</s>", vocab=None, merges=None):
        if vocab is None:
            self._vocab = {
                str(unk_token): 0,
                str(bos_token): 1,
                str(eos_token): 2,
            }
        else:
            self._vocab = vocab
        self._merges = merges
        self._tokenizer = Tokenizer(
            BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)
        )
        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(
            replacement="▁", prepend_scheme=_get_prepend_scheme(self.add_prefix_space, self), split=False
        )
        super().__init__(
            tokenizer_object=self._tokenizer,
            unk_token=unk_token,
            bos_token=bos_token,
            eos_token=eos_token,
        )
    

Once the tokenizer is defined as above, you can instantiate it with Llama5Tokenizer(). Doing so returns an empty, trainable tokenizer that follows the definition of the authors of Llama5 (it does not exist yet 😉).

    The above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.

    Backend Architecture Changes: moving away from the slow/fast tokenizer separation

    Up to now, transformers maintained two parallel implementations for many tokenizers:

    • "Slow" tokenizers (tokenization_<model>.py) - Python-based implementations, often using SentencePiece as the backend.
    • "Fast" tokenizers (tokenization_<model>_fast.py) - Rust-based implementations using the 🤗 tokenizers library.

    In v5, we consolidate to a single tokenizer file per model: tokenization_<model>.py. This file will use the most appropriate backend available:

• TokenizersBackend (preferred): Rust-based tokenizers from the 🤗 tokenizers library. It generally provides optimal performance and also offers many features commonly adopted across the ecosystem:
  • handling additional tokens
  • a full Python API for setting and updating
  • automatic parallelization
  • automatic offsets
  • customization
  • training
• SentencePieceBackend: for tokenizers requiring the sentencepiece library. It inherits from PythonBackend.
• PythonBackend: a Python implementation of the features provided by tokenizers; it mainly enables adding tokens.
• MistralCommonBackend: relies on MistralCommon's tokenization library. (Previously known as the MistralCommonTokenizer.)

The AutoTokenizer automatically selects the appropriate backend based on available files and dependencies. This is transparent: you continue to use AutoTokenizer.from_pretrained() as before. This keeps transformers future-proof and modular, making it easy to support future backends.

Defining a tokenizer outside of the existing backends

    We enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as tokenizers, sentencepiece or mistral-common, but we offer the possibility to design the tokenizer at a higher-level, without relying on those backends.

    To do so, you can import the PythonBackend (which was previously known as PreTrainedTokenizer). This class encapsulates all the logic related to added tokens, encoding, and decoding.

    If you want something even higher up the stack, then PreTrainedTokenizerBase is what PythonBackend inherits from. It contains the very basic tokenizer API features:

    • encode
    • decode
    • vocab_size
    • get_vocab
    • convert_tokens_to_ids
    • convert_ids_to_tokens
    • from_pretrained
    • save_pretrained

among a few others.

    API Changes

    1. Direct tokenizer initialization with vocab and merges

    Starting with v5, we now enable initializing blank, untrained tokenizers-backed tokenizers:

    from transformers import LlamaTokenizer
    tokenizer = LlamaTokenizer()
    

    This tokenizer will therefore follow the definition of the LlamaTokenizer as defined in its class definition. It can then be trained on a corpus as can be seen in the tokenizers documentation.

    These tokenizers can also be initialized from vocab and merges (if necessary), like the previous "slow" tokenizers:

    from transformers import LlamaTokenizer
vocab = {"<unk>": 0, "<s>": 1, "</s>": 2, "hello": 3, "world": 4}
    merges = [("h", "e"), ("l", "l"), ("o", " ")]
    tokenizer = LlamaTokenizer(vocab=vocab, merges=merges)
    

    This tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.

⚠️ The vocab_file (as in, a path towards a file containing the vocabulary) cannot be used to initialize the LlamaTokenizer, as loading from files is reserved for the from_pretrained method.

    2. Simplified decoding API

The batch_decode and decode methods have been unified to mirror the behavior of the encode method. Both single and batch decoding now use the same decode method. See an example of the new behavior below:

    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    inputs = ["hey how are you?", "fine"]
    tokenizer.decode(tokenizer.encode(inputs))
    

    Gives:

    • 'hey how are you?</s> fine</s>'
    • ['hey how are you?</s>', 'fine</s>']

We expect encode and decode to behave as two sides of the same coin: encode, process, decode should just work.

    Note

A common use-case is: encode, model.generate, decode. generate returns list[list[int]], which was previously incompatible with the single-sequence decode; the unified decode now handles it directly.
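The single/batch dispatch can be sketched with a toy decode that branches on input nesting (a simplification; the real method also handles tensors, special tokens, and more, and the vocabulary here is invented):

```python
# Toy sketch of a decode that accepts both a single sequence and a batch.
ID_TO_TOKEN = {0: "hey", 1: "how", 2: "are", 3: "you", 4: "fine", 5: "</s>"}

def decode(ids):
    # A batch is a list of lists; a single sequence is a flat list of ints.
    if ids and isinstance(ids[0], list):
        return [decode(seq) for seq in ids]
    return " ".join(ID_TO_TOKEN[i] for i in ids)

single = decode([0, 1, 2, 3, 5])
batch = decode([[0, 1, 2, 3, 5], [4, 5]])
print(single)  # hey how are you </s>
print(batch)   # ['hey how are you </s>', 'fine </s>']
```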

    3. Unified encoding API

The encode_plus method is deprecated in favor of the single __call__ method.

    4. apply_chat_template returns BatchEncoding

    Previously, apply_chat_template returned input_ids for backward compatibility. Starting with v5, it now consistently returns a BatchEncoding dict like other tokenizer methods.

# v5
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]
# Now returns BatchEncoding with input_ids, attention_mask, etc.
outputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
print(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])
    

    5. Removed legacy configuration file saving:

    We simplify the serialization of tokenization attributes:

    • special_tokens_map.json - special tokens are now stored in tokenizer_config.json.
    • added_tokens.json - added tokens are now stored in tokenizer.json.
    • added_tokens_decoder is only stored when there is no tokenizer.json.

    When loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.
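Conceptually, the consolidation is a dict merge; here is a minimal sketch (the file names come from the list above, while the merge logic and field values are illustrative):

```python
import json

# Legacy split layout (v4): special tokens lived in their own file,
# special_tokens_map.json, next to tokenizer_config.json.
special_tokens_map = {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>"}
tokenizer_config = {"model_max_length": 4096}

# v5-style consolidated layout: fold the special tokens into tokenizer_config.json.
tokenizer_config.update(special_tokens_map)
print(json.dumps(tokenizer_config, indent=2))
```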

    6. Model-Specific Changes

    Several models that had identical tokenizers now import from their base implementation:

    • LayoutLM → uses BertTokenizer
    • LED → uses BartTokenizer
    • Longformer → uses RobertaTokenizer
    • LXMert → uses BertTokenizer
    • MT5 → uses T5Tokenizer
    • MVP → uses BartTokenizer

    These modules will eventually be removed altogether.

    Removed T5-specific workarounds

    The internal _eventually_correct_t5_max_length method has been removed. T5 tokenizers now handle max length consistently with other models.

    Testing Changes

    A few testing changes specific to tokenizers have been applied:

    • Model-specific tokenization test files now focus on integration tests.
• Common tokenization API tests (e.g., add_tokens, encode, decode) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior.

    For legacy implementations, the original BERT Python tokenizer code (including WhitespaceTokenizer, BasicTokenizer, etc.) is preserved in bert_legacy.py for reference purposes.

    7. Deprecated / Modified Features

    Special Tokens Structure:

    SpecialTokensMixin: Merged into PreTrainedTokenizerBase to simplify the tokenizer architecture.

    special_tokens_map: Now only stores named special token attributes (e.g., bos_token, eos_token). Use extra_special_tokens for additional special tokens (formerly additional_special_tokens). all_special_tokens includes both named and extra tokens.

    # v4
    tokenizer.special_tokens_map # Included 'additional_special_tokens'
    # v5
    tokenizer.special_tokens_map # Only named tokens
    tokenizer.extra_special_tokens # Additional tokens
    

    special_tokens_map_extended and all_special_tokens_extended: Removed. Access AddedToken objects directly from _special_tokens_map or _extra_special_tokens if needed.

    additional_special_tokens: Still accepted for backward compatibility but is automatically converted to extra_special_tokens.
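For code that still constructs tokenizers with the old kwarg, the rename can be pictured with a tiny shim (illustrative only, not a transformers API; transformers itself performs this conversion automatically):

```python
def migrate_special_tokens(kwargs):
    """Rename the v4 `additional_special_tokens` kwarg to v5 `extra_special_tokens`."""
    if "additional_special_tokens" in kwargs:
        kwargs.setdefault("extra_special_tokens", kwargs.pop("additional_special_tokens"))
    return kwargs

v4_kwargs = {"bos_token": "<s>", "additional_special_tokens": ["<extra_0>", "<extra_1>"]}
migrated = migrate_special_tokens(v4_kwargs)
print(migrated)
```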

    Deprecated Methods:

    sanitize_special_tokens(): Already deprecated in v4, removed in v5.

prepare_seq2seq_batch(): Deprecated; use __call__() with the text_target parameter instead.

    # v4
    model_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)
    # v5
    model_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors="pt")
    model_inputs["labels"] = model_inputs.pop("input_ids_target")
    

    BatchEncoding.words(): Deprecated; use word_ids() instead.

    Removed Methods:

    create_token_type_ids_from_sequences(): Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.

    prepare_for_model(), build_inputs_with_special_tokens(), truncate_sequences(): Moved from tokenization_utils_base.py to tokenization_python.py for PythonBackend tokenizers. TokenizersBackend provides model-ready input via tokenize() and encode(), so these methods are no longer needed in the base class.

_switch_to_input_mode(), _switch_to_target_mode(), as_target_tokenizer(): Removed from base class. Use __call__() with the text_target parameter instead.

# v4
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
# v5
labels = tokenizer(text_target=tgt_texts, ...)
    

    parse_response(): Removed from base class.

    Performance

    MoE Performance

The v5 release significantly improves the performance of MoE models. We improve and optimize MoE performance through batched and grouped expert implementations, optimized for decoding using batched_mm.

    Core performance

We focused on improving the performance of loading weights onto device (which gives speedups of up to 6x in tensor-parallel situations); this is preliminary work that we'll continue in the coming weeks. Some notable improvements:

    • [saving] Simplify general logic by @Cyrilvallez in #42766
    • Do not rely on config for inferring model dtype by @Cyrilvallez in #42838
    • Improve BatchFeature: stack list and lists of torch tensors by @yonigozlan in #42750
    • Remove tied weights from internal attribute if they are not tied by @Cyrilvallez in #42871
    • Enforce call to post_init and fix all of them by @Cyrilvallez in #42873
    • Simplify tie weights logic by @Cyrilvallez in #42895
    • Add buffers to _init_weights for ALL models by @Cyrilvallez in #42309
    • [loading] Really initialize on meta device for huge perf gains by @Cyrilvallez in #42941
    • Do not use accelerate hooks if the device_map has only 1 device by @Cyrilvallez in #43019
    • Move missing weights and non-persistent buffers to correct device earlier by @Cyrilvallez in #43021

    Library-wide changes with lesser impact

    Default dtype update

We have updated the default dtype for all models loaded with from_pretrained to auto. Model instantiation now respects the dtype in which the model was saved, rather than forcing float32.

    You can, of course, still specify the dtype in which you want to load your model by specifying it as an argument to the from_pretrained method.

    Shard size

The Hugging Face Hub infrastructure has gradually moved to the Xet backend. This significantly simplifies uploads and downloads, bringing higher speeds, partial uploads, and, most notably, a higher threshold for accepted file sizes on the Hugging Face Hub.

    To reflect this, we're increasing the default shard size of models serialized on the Hub to 50GB (up from 5GB).
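To get a feel for the impact, here is the shard-count arithmetic for a hypothetical 140 GB checkpoint (the checkpoint size is invented for illustration):

```python
import math

def num_shards(checkpoint_gb, shard_gb):
    """Number of shards needed to serialize a checkpoint at a given max shard size."""
    return math.ceil(checkpoint_gb / shard_gb)

old = num_shards(140, 5)   # old 5GB default
new = num_shards(140, 50)  # new 50GB default
print(old, new)  # 28 3
```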

    use_auth_token

The use_auth_token argument is deprecated in favor of token everywhere.

    You should be able to search and replace use_auth_token with token and get the same logic.
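If you maintain wrappers that forward kwargs to from_pretrained, the rename can be covered by a small shim (illustrative only, not part of transformers):

```python
def migrate_auth_kwargs(kwargs):
    """Rename the deprecated `use_auth_token` kwarg to `token`."""
    if "use_auth_token" in kwargs and "token" not in kwargs:
        kwargs["token"] = kwargs.pop("use_auth_token")
    return kwargs

migrated = migrate_auth_kwargs({"use_auth_token": "hf_xxx", "revision": "main"})
print(migrated)  # {'revision': 'main', 'token': 'hf_xxx'}
```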

    Linked PR: #41666

    Attention-related features

We decided to remove some features for v5, as they are currently only supported in a few older models and are no longer integrated in new model additions. It's recommended to stick with v4.x if you need them. The following features are affected:

• No more head masking, see #41076. This feature allowed turning off certain heads during the attention calculation and only worked with eager attention.
• No more relative positional biases in Bert-like models, see #41170. This feature was introduced to allow relative position scores within attention calculations (similar to T5). However, it is barely used in official models and added a lot of complexity. It also only worked with eager attention.
• No more head pruning, see #41417 by @gante. As the name suggests, it allowed pruning heads within your attention layers.

    Updates to supported torch APIs

    We dropped support for two torch APIs:

    • torchscript in #41688
    • torch.fx in #41683

    Those APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs dynamo and export.

    Quantization changes

    We clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted
    above.

    We drop support for two quantization arguments that have been deprecated for some time:

    • load_in_4bit
    • load_in_8bit

    We remove them in favor of the quantization_config argument which is much more complete. As an example, here is how
    you would load a 4-bit bitsandbytes model using this argument:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    quantization_config=quantization_config,
)
    

    Configuration

Methods to initialize a nested config, such as from_xxx_config, are deleted. Nested configs can be initialized through the init method in the same way. See #41314.

    It is no longer possible to load a config class from a URL file. Configs must be loaded from either a local path or a repo on the Hub. See #42383.

All parameters configuring a model's rotary embedding are now stored under config.rope_parameters, including rope_theta and rope_type. config.rope_parameters is a simple dictionary in most cases, but can also be a nested dict in special cases (i.e. Gemma3 and ModernBert) with a different RoPE parameterization for each layer type. Trying to get config.rope_theta will throw an attribute error from now on. See #39847 and #42255.
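Code that used to read config.rope_theta can migrate with a small helper; a sketch assuming the flat and nested dict layouts described above (real configs are objects, modeled here as plain dicts, and the layer-type names and values are illustrative):

```python
def get_rope_theta(rope_parameters, layer_type=None):
    """Read rope_theta from a flat rope_parameters dict, or from a
    per-layer-type nested dict when a layer_type is given."""
    if layer_type is not None and layer_type in rope_parameters:
        return rope_parameters[layer_type]["rope_theta"]
    return rope_parameters["rope_theta"]

flat = {"rope_type": "default", "rope_theta": 10000.0}
nested = {
    "full_attention": {"rope_type": "default", "rope_theta": 1000000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10000.0},
}
flat_theta = get_rope_theta(flat)
full_theta = get_rope_theta(nested, "full_attention")
print(flat_theta, full_theta)  # 10000.0 1000000.0
```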

    Qwen-VL family configuration is in a nested format and trying to access keys directly will throw an error (e.g. config.vocab_size). Users are expected to access keys from their respective sub-configs (config.text_config.vocab_size).

    Configurations of non-generative models (any model that doesn't call model.generate()) will no longer have a generation_config and model.config.generation_config will throw an attribute error.

    Processing

    Tokenization

Slow tokenizer files (aka tokenization_<model>.py) are removed in favor of the fast tokenizer files tokenization_<model>_fast.py, which are renamed to tokenization_<model>.py. As fast tokenizers are backed by 🤗 tokenizers, they include a wider range of features that are maintainable and reliable.

Other backends (sentencepiece, etc.) are supported with a light layer if loading a fast tokenizer fails.

• Remove legacy files like special_tokens_map.json and added_tokens.json
• Remove _eventually_correct_t5_max_length
• encode_plus --> __call__
• batch_decode --> decode
• apply_chat_template used to return naked input_ids rather than a BatchEncoding dict. This was inconvenient - it should return a BatchEncoding like tokenizer.__call__(), but we were stuck with it for backward compatibility. The method now returns a BatchEncoding.

    Linked PRs:

    • #40938
    • #40936
    • #41626

    Processing classes

In processing classes, each attribute will be serialized under processor_config.json as a nested dict, instead of serializing attributes in their own config files. Loading will be supported for all old-format processors (#41474)

    XXXFeatureExtractors classes are completely removed in favor of XXXImageProcessor class for all vision models (#41174)

    Minor change: XXXFastImageProcessorKwargs is removed in favor of XXXImageProcessorKwargs which will be shared between fast and slow processors (#40931)

    Modeling

    Some RotaryEmbeddings layers will start returning a dict of tuples, in case the model uses several RoPE configurations (Gemma2, ModernBert). Each value will be a tuple of "cos, sin" per RoPE type.

The config attribute for RotaryEmbeddings layers is unified and accessed via config.rope_parameters. The rope_theta config attribute might not be accessible anymore for some models, and instead lives in config.rope_parameters['rope_theta']. Backward compatibility will be maintained as much as possible for a while, and in the near future we'll gradually move to the new RoPE format (#39847)

Vision-language models no longer have shortcut access to their language and vision components from the generative model via model.language_model. It is recommended to access the module with model.model.language_model or model.get_decoder(). See #42156

All models now accept kwargs in their forward methods.

    Generate

    Old, deprecated output type aliases were removed (e.g. GreedySearchEncoderDecoderOutput). We now only have 4 output classes built from the following matrix: decoder-only vs encoder-decoder, uses beams vs doesn't use beams (#40998)

    Removed deprecated classes regarding decoding methods that were moved to the Hub due to low usage (constraints and beam scores) (#41223)

    If generate doesn't receive any KV Cache argument, the default cache class used is now defined by the model (as opposed to always being DynamicCache) (#41505)

Generation parameters are no longer accessible via the model's config. If generation parameters are serialized in config.json for an old model, they will be loaded back into the model's generation config. Users are expected to access or modify generation parameters only via model.generation_config (e.g. model.generation_config.do_sample = True).
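The load-time migration can be pictured as splitting one raw dict in two; a toy sketch (the key set is an illustrative subset, and real config classes are not plain dicts):

```python
# Illustrative subset of generation-related keys.
GENERATION_KEYS = {"do_sample", "temperature", "top_p", "max_length"}

def split_config(raw_config):
    """Move serialized generation parameters out of the model config
    into a separate generation config."""
    generation_config = {
        k: raw_config.pop(k) for k in list(raw_config) if k in GENERATION_KEYS
    }
    return raw_config, generation_config

raw = {"vocab_size": 32000, "do_sample": True, "temperature": 0.7}
model_config, generation_config = split_config(raw)
print(model_config)       # {'vocab_size': 32000}
print(generation_config)  # {'do_sample': True, 'temperature': 0.7}
```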

    Trainer

    New Features

    ALST/Ulysses Sequence Parallelism Integration

Added sequence parallelism support via HF Accelerate for training with longer sequences. Enables splitting sequences across devices using the ALST (Arctic Long Sequence Training) and Ulysses algorithms with DeepSpeed.

    Improved compute_loss_func Handling

    compute_loss_func now always takes priority over the model's built-in loss computation, giving users consistent control over custom loss functions.

    num_items_in_batch in Prediction Step

    The num_items_in_batch argument is now passed to compute_loss during prediction_step, enabling proper loss scaling during evaluation.

    Breaking Changes

    report_to now defaults to "none"

    Logging integrations are no longer auto-detected by default; users must explicitly specify which reporting backends to use.

Removing arguments without a deprecation cycle in TrainingArguments due to low usage

• mp_parameters -> legacy parameter that was later added to the SageMaker trainer
• _n_gpu -> not intended to be set by users; we will initialize it correctly instead of exposing it in TrainingArguments
• overwrite_output_dir -> replaced by resume_from_checkpoint; it was only used in the example scripts, with no impact on Trainer
• logging_dir -> only used for tensorboard; set the TENSORBOARD_LOGGING_DIR env var instead
• jit_mode_eval -> use use_torch_compile instead, as torchscript is no longer recommended
• tpu_num_cores -> setting the number of cores is not recommended; by default, all TPU cores are used. Set the TPU_NUM_CORES env var instead
• past_index -> only used for a very small number of models with special architectures, like Transformer-XL, and training those models was never documented
• ray_scope -> a minor argument for the Ray integration; set the RAY_SCOPE env var instead
• warmup_ratio -> use warmup_steps instead. We combined both arguments by allowing float values in warmup_steps
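The combined warmup behavior described in the last bullet can be sketched as follows, assuming a float below 1 is treated as a ratio of total steps (the helper and its exact semantics are illustrative, not the Trainer's code):

```python
def resolve_warmup_steps(warmup, total_steps):
    """A float < 1 acts like the old warmup_ratio; otherwise an absolute step count."""
    if isinstance(warmup, float) and warmup < 1:
        return int(total_steps * warmup)
    return int(warmup)

ratio_style = resolve_warmup_steps(0.1, 1000)  # old warmup_ratio=0.1
absolute = resolve_warmup_steps(500, 1000)     # plain step count
print(ratio_style, absolute)  # 100 500
```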

    Removing deprecated arguments in TrainingArguments

    • fsdp_min_num_params and fsdp_transformer_layer_cls_to_wrap -> use fsdp_config
    • tpu_metrics_debug -> debug
    • push_to_hub_token -> hub_token
    • push_to_hub_model_id and push_to_hub_organization -> hub_model_id
    • include_inputs_for_metrics -> include_for_metrics
    • per_gpu_train_batch_size -> per_device_train_batch_size
    • per_gpu_eval_batch_size -> per_device_eval_batch_size
    • use_mps_device -> mps will be used by default if detected
    • fp16_backend and half_precision_backend -> we will only rely on torch.amp as everything has been upstreamed to torch
    • no_cuda -> use_cpu
    • include_tokens_per_second -> include_num_input_tokens_seen
    • use_legacy_prediction_loop -> we only use evaluation_loop function from now on

    Removing deprecated arguments in Trainer

    • tokenizer in initialization -> processing_class
    • model_path in train() -> resume_from_checkpoint

    Removed features for Trainer

• SigOpt integration for hyperparameter search was removed, as the library was archived and the API stopped working
• drop support for SageMaker API <1.10
    • bump accelerate minimum version to 1.1.0
    • bump peft minimum version to 0.18.0
    • bump bitsandbytes minimum version to 0.46.1

    New defaults for Trainer

use_cache in the model config will be set to False. You can still change the cache value through the TrainingArguments use_cache argument if needed.

    Pipeline

    Image text to text pipelines will no longer accept images as a separate argument along with conversation chats. Image data has to be embedded in the chat's "content" field. See #42359

    PushToHubMixin

• Removed deprecated organization and repo_url from PushToHubMixin. You must pass a repo_id instead.

• Removed ignore_metadata_errors from PushToHubMixin. In practice, if we ignore errors while loading the model card, we can't push the card back to the Hub, so it's better to fail early than to provide an option that fails later.

• push_to_hub no longer accepts **kwargs. All accepted parameters are explicitly documented.

• Arguments of push_to_hub are now keyword-only to avoid confusion. Only repo_id can be positional, since it's the main argument.

• Removed the use_temp_dir argument from push_to_hub. We now use a temporary directory in all cases.

    Linked PR: #42391.

    CLI

The deprecated transformers-cli ... command has been removed; transformers ... is now the only CLI entry point.

The transformers CLI has been migrated to Typer, making it easier to maintain and adding some nice features out of
the box (improved --help sections, autocompletion).

    Biggest breaking change is in transformers chat. This command starts a terminal UI to interact with a chat model.

    It used to also be able to start a Chat Completion server powered by transformers and chat with it. In this revamped
    version, this feature has been removed in favor of transformers serve. The goal of splitting transformers chat
    and transformers serve is to define clear boundaries between client and server code. It helps with maintenance
    but also makes the commands less bloated. The new signature of transformers chat is:

    Usage: transformers chat [OPTIONS] BASE_URL MODEL_ID [GENERATE_FLAGS]...
    Chat with a model from the command line.

    It works hand in hand with transformers serve, which means that if transformers serve is running on its default endpoint, transformers chat can be launched as follows:

    transformers chat HuggingFaceTB/SmolLM3-3B

    It can however use any OpenAI API compatible HTTP endpoint:

    transformers chat HuggingFaceTB/SmolLM3-3B https://router.huggingface.co/v1

    Linked PRs:

    • #40997
    • #41487

    Removal of the run method

The transformers run command (previously transformers-cli run) is an artefact of the past: it was neither documented
nor tested. We're removing it for now; please let us know if it's a command you use, in which case we'll
bring it back with better support.

    Linked PR: #42447

    Environment variables

    Legacy environment variables like TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, and PYTORCH_PRETRAINED_BERT_CACHE have been removed. Please use HF_HOME instead.
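A migration sketch for scripts that exported the legacy variable, assuming the old TRANSFORMERS_CACHE pointed at a subfolder of what is now the HF_HOME root (this layout assumption is illustrative; check your actual paths):

```python
import os

def resolve_hf_home(env):
    """Map a legacy TRANSFORMERS_CACHE value to an HF_HOME root.
    Illustrative mapping: TRANSFORMERS_CACHE pointed at <root>/hub,
    while HF_HOME is the <root> itself."""
    if "HF_HOME" in env:
        return env["HF_HOME"]
    legacy = env.get("TRANSFORMERS_CACHE")
    if legacy:
        return os.path.dirname(legacy.rstrip("/"))
    return os.path.expanduser("~/.cache/huggingface")

hf_home = resolve_hf_home({"TRANSFORMERS_CACHE": "/data/hf/hub"})
print(hf_home)  # /data/hf
```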

Constants HUGGINGFACE_CO_EXAMPLES_TELEMETRY, HUGGINGFACE_CO_PREFIX, and HUGGINGFACE_CO_RESOLVE_ENDPOINT have been removed. Please use huggingface_hub.constants.ENDPOINT instead.

    Linked PR: #42391.

    Requirements update

transformers v5 pins the huggingface_hub version to >=1.0.0. See this migration guide to learn more about that major release. Here are the main aspects to know about:

• We switched the HTTP backend from requests to httpx. This change improves performance and supports synchronous and asynchronous requests the same way. If you currently catch requests.HTTPError in your codebase, you'll need to switch to httpx.HTTPError.
• Related to the previous point, it is no longer possible to set proxies from your script. To handle proxies, set the HTTP_PROXY / HTTPS_PROXY environment variables.
• hf_transfer, and therefore HF_HUB_ENABLE_HF_TRANSFER, has been completely dropped in favor of hf_xet. This should be transparent for most users. Please let us know if you notice any downside!
• typer-slim has been added as a required dependency, used to implement both the hf and transformers CLIs.

    New model additions in v5

    CWM

    The Code World Model (CWM) model was proposed in CWM: An Open-Weights LLM for Research on Code Generation with World Models by Meta FAIR CodeGen Team. CWM is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python execution traces and agentic interactions in containerized environments. We post-trained with extensive multi-task RL in verifiable coding, math, and multi-turn software engineering environments.

    Add Code World Model (CWM) by @jacobkahn in #41199

    SAM3

    SAM3 (Segment Anything Model 3) was introduced in SAM 3: Segment Anything with Concepts.
The SAM3 addition introduces four new architectures:

    • Sam3
    • Sam3Tracker
    • Sam3TrackerVideo
    • Sam3Video

    SAM3 performs Promptable Concept Segmentation (PCS) on images. PCS takes text and/or image exemplars as input (e.g., "yellow school bus"), and predicts instance and semantic masks for every single object matching the concept.

    Sam3Tracker and Sam3TrackerVideo perform Promptable Visual Segmentation (PVS) on images. PVS takes interactive visual prompts (points, boxes, masks) or text inputs to segment a specific object instance per prompt. This is the task that SAM 1 and SAM 2 focused on, and SAM 3 improves upon it. Sam3Tracker and Sam3TrackerVideo are updated versions of SAM2 Video that maintain the same API while providing improved performance and capabilities.

    SAM3 Video performs Promptable Concept Segmentation (PCS) on videos. PCS takes text as input (e.g., "yellow school bus"), and predicts instance and semantic masks for every single object matching the concept, while preserving object identities across video frames. The model combines a detection module (SAM3) with a tracking module (SAM2-style tracker) to enable robust object tracking across video frames using text prompts.

    Add SAM3 to 🤗 Transformers by @yonigozlan in #42285

    LFM2 MoE

    LFM2-MoE is a Mixture-of-Experts (MoE) variant of LFM2. The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA) in a layout tuned to maximize quality under strict speed and memory constraints.

    LFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path. The first LFM2-MoE release is LFM2-8B-A1B, with 8.3B total parameters and 1.5B active parameters. The model excels in quality (comparable to 3-4B dense models) and speed (faster than other 1.5B class models).

    [Model] Lfm2Moe by @paulpak58 in #41401

    VideoLlama 3

    The VideoLLaMA3 model is a major update to VideoLLaMA2 from Alibaba DAMO Academy.

    [model] Add VideoLLaMA3 implementation by @lkhl in #40499

    AudioFlamingo 3

    Audio Flamingo 3 (AF3) is a fully open large audio–language model designed for robust understanding and reasoning over speech, environmental sounds, and music. AF3 pairs a Whisper-style audio encoder with a causal language model and performs replace-in-place audio–text fusion: the processor aligns post-pool audio frames to a dedicated placeholder token and the model replaces those token slots with projected audio embeddings during the forward pass.

    The model checkpoint is available at: nvidia/audio-flamingo-3-hf

    Highlights:

    • Unified audio encoder across speech, sound, and music.
    • Long-audio support via windowing and post-pool alignment: the model processes audio in 30-second windows with a hard limit of 20 windows (10 minutes total); audio longer than 10 minutes is truncated.
    • Deterministic fusion that preserves sequence length by replacing audio placeholder tokens with audio embeddings.
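    The windowing arithmetic above is easy to check. The helper below is a hypothetical sketch using only the constants from this description (30-second windows, 20-window cap); it is not part of the AudioFlamingo3 API:

    ```python
    WINDOW_SECONDS = 30
    MAX_WINDOWS = 20  # hard cap: 20 * 30 s = 10 minutes

    def plan_windows(duration_seconds):
        """Return (num_windows, processed_seconds, truncated_seconds)."""
        full = -(-duration_seconds // WINDOW_SECONDS)  # ceiling division
        num_windows = min(full, MAX_WINDOWS)
        processed = min(duration_seconds, WINDOW_SECONDS * MAX_WINDOWS)
        return num_windows, processed, max(0, duration_seconds - processed)
    ```

    A 7-minute clip fits in 14 windows untouched, while a 12-minute clip is cut back to the first 10 minutes.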

    [models] Add AudioFlamingo3 integration by @lashahub in #40290

    Nanochat

    NanoChat is a compact decoder-only transformer model designed for educational purposes and efficient training. It features several architectural components that are common in modern transformer models, which makes it a good starting point for understanding the principles behind them. NanoChat is a variant of the Llama architecture with a simplified attention mechanism and simplified normalization layers.

    [MODEL] Nanochat implementation by @burtenshaw in #41634

    FastVLM

    FastVLM is an open-source vision-language model featuring a novel hybrid vision encoder, FastViTHD. Leveraging reparameterizable convolutional layers, scaled input resolution, and a reduced number of visual tokens, FastVLM delivers high accuracy with exceptional efficiency. Its optimized architecture enables deployment even on edge devices, achieving ultra-low TTFT (time to first token) without sacrificing performance.
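    Reparameterizable convolutional layers fold train-time branches such as BatchNorm into a single convolution at inference time. The scalar sketch below shows the standard BatchNorm-folding identity; it is a generic illustration of the technique, not FastVLM's actual code:

    ```python
    import math

    def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
        """Fold y = gamma * ((w*x + b) - mean) / sqrt(var + eps) + beta
        into a single affine map y = w_f * x + b_f (scalar case)."""
        scale = gamma / math.sqrt(var + eps)
        return w * scale, (b - mean) * scale + beta

    # Fold a toy conv weight/bias with toy BatchNorm statistics.
    w_f, b_f = fold_batchnorm(w=2.0, b=0.5, gamma=1.5, beta=0.1, mean=0.4, var=0.25)

    x = 3.0
    direct = 1.5 * ((2.0 * x + 0.5) - 0.4) / math.sqrt(0.25 + 1e-5) + 0.1
    folded = w_f * x + b_f
    ```

    After folding, the two-branch train-time graph and the single folded layer produce the same output, which is what lets reparameterized models run faster at inference.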

    Add FastVLM by @camilla-deckard in #41112

    PaddleOCR-VL

    PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.

    [Model] Add PaddleOCR-VL Model Support by @zhang-prog in #42178

    Perception Encoder Audiovisual (PE Audio)

    PE Audio (Perception Encoder Audio) is a state-of-the-art multimodal model that embeds audio and text into a shared (joint) embedding space.
    The model enables cross-modal retrieval and understanding between audio and text.

    Text input

    Produces a single embedding representing the full text.

    Audio input

    PeAudioFrameLevelModel

    Produces a sequence of embeddings, one every 40 ms of audio.
    Suitable for audio event localization and fine-grained temporal analysis.

    PeAudioModel

    Produces a single embedding for the entire audio clip.
    Suitable for global audio-text retrieval tasks.

    The resulting embeddings can be used for:

    • Audio event localization
    • Cross-modal (audio–text) retrieval and matching
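    One embedding every 40 ms implies a simple relation between clip length and frame-level output length. The helper below is an illustrative sketch (the exact rounding in the real model may differ):

    ```python
    FRAME_MS = 40  # one frame-level embedding per 40 ms of audio

    def num_frame_embeddings(duration_seconds):
        """Approximate PeAudioFrameLevelModel-style output length for a clip."""
        return int(duration_seconds * 1000) // FRAME_MS
    ```

    A 10-second clip therefore yields on the order of 250 frame embeddings, while PeAudioModel collapses the whole clip into a single vector.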

    Sam: Perception Encoder Audiovisual by @eustlb in #42905

    Jais2

    Jais2 is a next-generation Arabic open-weight LLM trained on the richest Arabic-first dataset to date. Built from the ground up with 8B and 70B parameters, Jais2 understands Arabic the way it's truly spoken across dialects, culture, and modern expression. It is developed by MBZUAI, Inception, and Cerebras Systems and is based on the transformer architecture with modifications including:

    • LayerNorm instead of RMSNorm
    • ReLU² activation function
    • Rotary Position Embeddings (RoPE)
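    The ReLU² activation in the list above is simple to state: the square of a standard ReLU. A minimal sketch, applied elementwise:

    ```python
    def relu_squared(x):
        """ReLU² activation: square of max(x, 0)."""
        r = max(x, 0.0)
        return r * r

    activations = [relu_squared(v) for v in [-2.0, 0.0, 3.0]]
    ```

    Negative inputs are zeroed exactly as in ReLU, while positive inputs grow quadratically rather than linearly.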

    adds jais2 model support by @sarathc-cerebras in #42684

    Pixio

    Pixio is a vision foundation model that uses ViT as a feature extractor for multiple downstream tasks like depth estimation, semantic segmentation, feed-forward 3D reconstruction, robotics, and image classification. It is built on the Masked Autoencoder (MAE) pre-training framework, with four minimal yet critical updates: 1) deeper decoder, 2) larger masking granularity, 3) more class tokens, and 4) web-scale curated training data.

    Add Pixio pre-trained models by @LiheYoung in #42795

    Ernie 4.5 VL MoE

    The Ernie 4.5 VL MoE model was released in the Ernie 4.5 Model Family release by Baidu. This family of models contains multiple different architectures and model sizes. The Vision-Language series in particular is composed of a novel multimodal heterogeneous structure, sharing parameters across modalities while dedicating other parameters to specific modalities. This becomes especially apparent in the Mixture of Experts (MoE) layer, which is composed of

    • Dedicated Text Experts
    • Dedicated Vision Experts
    • Shared Experts

    This architecture has the advantage of enhancing multimodal understanding without compromising performance on text-related tasks (and can even improve it). A more detailed breakdown is given in the Technical Report.
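    The modality-partitioned expert layout can be pictured as a toy router. Everything below is an illustrative sketch (names and routing logic are hypothetical, not ERNIE's actual code): each token draws on its modality's dedicated experts plus the shared experts.

    ```python
    def route_token(modality, text_experts, vision_experts, shared_experts):
        """Select the expert pool for one token in a modality-partitioned MoE."""
        if modality == "text":
            dedicated = text_experts
        elif modality == "vision":
            dedicated = vision_experts
        else:
            raise ValueError(f"unknown modality: {modality}")
        # Shared experts serve every token regardless of modality.
        return dedicated + shared_experts

    vision_pool = route_token("vision", ["t0", "t1"], ["v0", "v1"], ["s0"])
    text_pool = route_token("text", ["t0", "t1"], ["v0", "v1"], ["s0"])
    ```

    Because text tokens never touch vision experts (and vice versa), each modality gains capacity without interfering with the other; the shared experts carry cross-modal signal.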

    [Ernie 4.5] Ernie VL models by @vasqu in #39585

    GLM-ASR

    GLM-ASR-Nano-2512 is a robust, open-source speech recognition model with 1.5B parameters. Designed for
    real-world complexity, it outperforms OpenAI Whisper V3 on multiple benchmarks while maintaining a compact size.

    Key capabilities include:

    • Exceptional Dialect Support: beyond standard Mandarin and English, the model is highly optimized for Cantonese (粤语) and other dialects, effectively bridging the gap in dialectal speech recognition.
    • Low-Volume Speech Robustness: specifically trained for "Whisper/Quiet Speech" scenarios, it captures and accurately transcribes extremely low-volume audio that traditional models often miss.
    • SOTA Performance: achieves the lowest average error rate (4.10) among comparable open-source models, showing significant advantages on Chinese benchmarks (Wenet Meeting, Aishell-1, etc.).

    This model was contributed by Eustache Le Bihan and Yuxuan Zhang.
    You can check the model card and the GitHub repo for more details.

    GLM-ASR Support by @zRzRzRzRzRzRzR in #42875

    GLM 4.7 Flash

    GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

    [GLM-4.7] GLM-Lite Support by @zRzRzRzRzRzRzR in #43031

    GLM Image

    We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation.

    In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks.

    We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. Code, models, and more information are released at https://github.com/zai-org/GLM-V

    [GLM-Image] AR Model Support for GLM-Image by @zRzRzRzRzRzRzR in #43100

    LWDetr

    LW-DETR proposes a light-weight Detection Transformer (DETR) architecture designed to compete with and surpass the dominant YOLO series for real-time object detection. It achieves a new state-of-the-art balance between speed (latency) and accuracy (mAP) by combining recent transformer advances with efficient design choices.

    The LW-DETR architecture is characterized by its simple and efficient structure: a plain ViT Encoder, a Projector, and a shallow DETR Decoder. It enhances the DETR architecture for efficiency and speed using the following core modifications:

    • Efficient ViT Encoder: Uses a plain ViT with interleaved window/global attention and a window-major organization to drastically reduce attention complexity and latency.
    • Richer Input: Aggregates multi-level features from the encoder and uses a C2f Projector (from YOLOv8) to pass two-scale features (1/8 and 1/32).
    • Faster Decoder: Employs a shallow 3-layer DETR decoder with deformable cross-attention for lower latency and faster convergence.
    • Optimized Queries: Uses a mixed-query scheme combining learnable content queries and generated spatial queries.
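    The interleaved window/global attention from the first bullet can be written down as a simple layer schedule. The sketch below is a toy illustration: the `global_every` ratio is a hypothetical parameter for demonstration, not LW-DETR's actual configuration.

    ```python
    def attention_schedule(num_layers, global_every=3):
        """Label each encoder layer 'window' or 'global', interleaved so that
        every `global_every`-th layer uses full global attention."""
        return ["global" if (i + 1) % global_every == 0 else "window"
                for i in range(num_layers)]

    schedule = attention_schedule(6)
    ```

    Keeping most layers on cheap window attention while a few global layers mix information across windows is what drives the latency reduction described above.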

    Add LWDetr model by @sbucaille in #40991

    LightOnOCR

    LightOnOcr combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.

    Add LightOnOCR model implementation by @baptiste-aubertin in #41621

    Bugfixes and improvements

    • JetMoe Fix jetmoe after #40132 by @ArthurZucker in #41324
    • Fixed tiny incorrect import in gemma3 by @Sai-Suraj-27 in #41354
    • Rope for Qwen2.5-VL by @zucchini-nlp in #41173
    • 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence by @Cyrilvallez in #41268
    • Standardize PretrainedConfig to PreTrainedConfig by @Cyrilvallez in #41300
    • Fix trainer for py3.9 by @SunMarc in #41359
    • Check model inputs - hidden states by @zucchini-nlp in #40994
    • [ModularChecker] QOL for the modular checker by @ArthurZucker in #41361
    • Fixing a typo for BLT model by @Narsil in #41325
    • 🚨 [v5] Remove relative position embeddings (for bert like models) by @vasqu in #41170
    • Fix typo in model proposal template by @Ombucha in #41352
    • Better typehints for apply_chat_template by @Samoed in #41355
    • 🚨 Remove BetterTransformer by @Cyrilvallez in #41367
    • [testing] update test_longcat_generation_cpu by @ydshieh in #41368
    • Fix flash_attention.py: wrong argument passing for attn_implementation by @TKONIY in #41347
    • Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939 by @sonianuj287 in #41284
    • Fixes in check_model_inputs, GPTBigCodeModel and ImageGPTModel by @IlyasMoutawwakil in #40811
    • Remove unnecessary list comprehension by @cyyever in #41305
    • make some ut cases pass on xpu w/ latest torch by @yao-matrix in #41337
    • Remove unused function parameters by @cyyever in #41358
    • [CB] Refactors the way we access paged by @ArthurZucker in #41370
    • serve: add non-streaming mode to /v1/responses; stream event parity; remove placeholder logprobs by @antznette1 in #41353
    • Update from pretrained error when loading by @ArthurZucker in #33380
    • [v5] Sync Bert and Bart eager attention by @vasqu in #41248
    • fix asr ut failures by @yao-matrix in #41332
    • fix resample in asr pipeline by @yhzx233 in #41298
    • Correct numerical regression in vision embeddings by @i3hz in #41374
    • [kernels] Kernel Config by @MekkCyber in #41232
    • [Cache] lfm2 cache: allocate empty kv layers during init by @paulpak58 in #41396
    • Fix test for model with dotted name and relative imports by @st81 in #41343
    • Prefer raising TypeError exception for invalid type by @Sai-Suraj-27 in #41346
    • [v5] Bump accelerate to 1.1.0 by @SunMarc in #41234
    • Fix incorrect assignment in update_device_map for GPTQ quantizer by @Sai-Suraj-27 in #41328
    • [v5] Delete left traces of feature extractor by @zucchini-nlp in #41321
    • Remove deprecation warning by @Cyrilvallez in #41425
    • Fix overriding common_kwargs defaults in processor calls by @yonigozlan in #41381
    • v5 dev version by @LysandreJik in #41436
    • Tiny Cleanup - Removed duplicate class field definition's by @Sai-Suraj-27 in #41293
    • 🚨🚨 Remove all traces of legacy cache format by @Cyrilvallez in #41378
    • 🚨 [v5] Prune prune_heads by @gante in #41417
    • [v5] Bump min version of bitsandbytes to 0.46.1 by @SunMarc in #41283
    • Fixing comments in init file by @MekkCyber in #41414
    • Use accelerator API to free device memory by @cyyever in #41195
    • enable new model uts to xpu and fix some failures on xpu by @yao-matrix in #41386
    • [torchao] Add regex support for ModuleFqnToConfig by @jerryzh168 in #41242
    • 🤦 CB nit! by @ArthurZucker in #41413
    • Remove Python 3.9 classifier by @cyyever in #41410
    • [JetMoe] Fix KV head repetition and padding free by @vasqu in #41423
    • [testing] Fix JetMoeIntegrationTest by @ydshieh in #41377
    • Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation by @ErfanBaghaei in #40837
    • Validate processing kwargs with @strict from huggingface_hub by @zucchini-nlp in #40793
    • Update hqq.md by @prathamesh-chavan-22 in #41452
    • enable some falcon-mamba uts on xpu by @yao-matrix in #41428
    • Fix generate outputs and simplify cache tests by @Cyrilvallez in #41440
    • Fix doc by @Cyrilvallez in #41457
    • 🚨 [v5] Rename left traces of past_key_value in BERT-like models by @zucchini-nlp in #41448
    • Subconfig is a class attribute by @zucchini-nlp in #41308
    • [v5] rm utils/tf_ops/ by @gante in #41402
    • Update GLM-4.1V MMRope implementation by @zRzRzRzRzRzRzR in #41182
    • [kernels] Cleanup deta kernel by @MekkCyber in #41470
    • 🚨 [v5] Rendundant code in nested configs by @zucchini-nlp in #41314
    • Remove KERAS_NLP_IMPORT_ERROR by @cyyever in #41468
    • Fix auto model configuration for encoder of perceptionlm by @fschlatt in #41464
    • Fix tests fsdp by @SunMarc in #41422
    • Import Callable from collections.abc by @cyyever in #41130
    • Pickle - part 2 by @ydshieh in #41476
    • Remove infer_device by @cyyever in #41088
    • Change RT-Detr docs to reflect fixed 640x640 input size by @konstantinos-p in #41364
    • Cleaning hub kernels by @MekkCyber in #41477
    • [v5] remove load_in_4bit and load_in_8bit by @SunMarc in #41287
    • 🚨 [Attention Masks] Bidirectional masks for encoder and encoder-decoder models by @vasqu in #41265
    • [Fix] Fix test file error by @YangKai0616 in #40973
    • enhance patched_tearDown to support python 3.11+ by @yao-matrix in #41429
    • RT-Detr correct 2d positional embeddings for non-square images by @konstantinos-p in #41380
    • Fix bnb fsdp loading for pre-quantized checkpoint by @SunMarc in #41415
    • Remove SigOpt by @SunMarc in #41479
    • Remove past_index by @SunMarc in #41384
    • Remove deprecated args in Trainer for v5 by @SunMarc in #41404
    • Update GLM-4.6 doc by @zRzRzRzRzRzRzR in #41471
    • report_to default changed to "none" + cleaning deprecated env var by @SunMarc in #41375
    • deprecate overwrite_output_dir by @SunMarc in #41323
    • [CI] Fix copies on main by @vasqu in #41486
    • [Trainer] deprecate ray scope by @SunMarc in #41403
    • deprecate jit_mode_eval by @SunMarc in #41376
    • Remove local_rank arg from TrainingArguments by @SunMarc in #41382
    • Update philosophy by @molbap in #41438
    • Remove DISABLE_KERNEL_MAPPING flag by @MekkCyber in #41475
    • Streaming should be handled at the request-level rather than at the instance level by @LysandreJik in #41444
    • fix bnb model loading by @jiqing-feng in #41499
    • [kernels] Remove RWKV kernel finally ! by @MekkCyber in #41493
    • [kernels] rm yoso kernel by @MekkCyber in #41495
    • Try to remove pickle - BloomTokenizerFast by @ydshieh in #41466
    • Fixed tiny incorrect imports in glm4v by @Sai-Suraj-27 in #41483
    • [Parakeet] unnecessary warning & auto mapping by @eustlb in #41412
    • [causallm tester] automate pipeline mappings + bloom tests by @gante in #41318
    • Fix some tests by @Cyrilvallez in #41503
    • fix gemma3n case failure by @yao-matrix in #41426
    • [voxtral] language detection + skipping lang:xx by @eustlb in #41225
    • Set truncation to False in Qwen3Omni to avoid default truncation by @BakerBunker in #41473
    • [QoL] modular conversion shows LoC saved by @molbap in #41500
    • More trainer cleaning by @SunMarc in #41489
    • Bump to hfh 1.0.0.rc5 to fix test by @Wauplin in #41508
    • Revert local_rank deletion and some cleaning by @SunMarc in #41504
    • Fix detectron2 import by @Cyrilvallez in #41510
    • add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc by @benkeene in #41484
    • Remove outdated flags by @Cyrilvallez in #41512
    • remove tpu_num_cores by @SunMarc in #41383
    • Allow optuna's catch kwargs passthrough by @nicha-api in #41496
    • Fix Latex typesetting in documentation by @cyyever in #41177
    • [testing] reduce runtime of HunYuanMoEV1IntegrationTest:test_model_generation by @ydshieh in #41373
    • [Qwen3VL] fix: hidden_states in place modification error by @HollowMan6 in #41535
    • Add MLlam