Hugging Face Release Notes

60 release notes curated from 3 sources by the Releasebot Team. Last updated: Jun 16, 2026

Hugging Face Products

  • Jun 15, 2026
    • Date parsed from source:
      Jun 15, 2026
    • First seen by Releasebot:
      Jun 16, 2026

    transformers by Hugging Face

    Patch release v5.12.1

    transformers ships a patch release with a PEFT lower-bound update and a fix for auto tokenizer Mistral resolution.

    This release updates the lower bound for PEFT and fixes the auto tokenizer so that it properly resolves the Mistral tokenizer (when mistral-common is installed). It is similar to v5.10.3, minus the fixes that were already included in the main release. vLLM will first target 5.10.3 🤗

    • Fix peft lower bound by @hmellor (#46605)
    • mistral common backend fix by @itazap (#46667)

    Full Changelog: v5.12.0...v5.12.1

    Original source
  • Jun 15, 2026
    • Date parsed from source:
      Jun 15, 2026
    • First seen by Releasebot:
      Jun 16, 2026

    transformers by Hugging Face

    Patch release v5.10.3

    transformers ships a patch release with vLLM sync fixes and updates for processor, model, and offset handling.

    A few fixes needed for vLLM to sync with transformers 🤗

    • [fix] regression introduced by #45534, by @eustlb (#46456)
    • Fix {image/video/audio}_token_ids in ProcessorMixin by @hmellor (#46500)
    • Fix InternVL models by @hmellor (#46524)
    • Fix the offsets in processing by @zucchini-nlp (#46525)
    • Fix peft lower bound by @hmellor (#46605)
    • mistral common backend fix by @itazap (#46667)

    Full Changelog: v5.10.2...v5.10.3

    Original source
  • Jun 12, 2026
    • Date parsed from source:
      Jun 12, 2026
    • First seen by Releasebot:
      Jun 16, 2026

    transformers by Hugging Face

    Release v5.12.0

    transformers releases v5.12.0 with new model additions, including MiniMax-M3-VL for vision-language tasks, PP-OCRv6 OCR weights, and Parakeet-RNNT speech recognition, plus a broad round of bug fixes, CI improvements, and documentation updates.

    New Model additions

    MiniMax-M3-VL

    MiniMax-M3-VL is the vision-language member of the MiniMax-M3 family. It pairs a CLIP-style vision tower that uses 3D rotary position embeddings with the MiniMax-M3 text backbone. The decoder is a mixed dense/sparse Mixture-of-Experts with SwiGLU-OAI gated experts and a lightning indexer for block-sparse attention. Images are processed through a Conv3d patch embedding, and the model includes specialized components for efficient multimodal understanding and generation.

    Links: Documentation

    Add minimax m3vl (#46600) by @ArthurZucker in #46600

    PP-OCRv6: update documentation and slow tests (#46576)

    The official weights for PP-OCRv6 are out: PP-OCRv6 is a lightweight OCR system that combines architectural innovation with data-centric optimization. It redesigns the backbone, detection neck, and recognition neck around a unified MetaFormer-style building block with structural reparameterization. Three model tiers (medium, small, tiny) share the same block primitives, covering deployment scenarios from server to edge.

    PP-OCRv6: update documentation and slow tests (#46576) by @zhang-prog

    Add Parakeet-RNNT (#46331)

    ParakeetForRNNT: a Fast Conformer Encoder + an RNN-T (RNN Transducer) decoder

    RNN-T decoder (a standard neural transducer):

    • An LSTM prediction network maintains language context across token predictions.
    • A joint network combines encoder and decoder outputs.
    • Greedy transducer decoding is used for inference: a blank emission advances the encoder frame by one, while a non-blank emission stays on the same frame.

    Add Parakeet-RNNT (#46331) by @eustlb
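
    The greedy decoding rule described above (blank advances the frame, non-blank stays on it) can be sketched in plain Python. This is an illustration under stated assumptions, not the ParakeetForRNNT code: the blank id, the `joint_argmax` callback, the per-frame symbol cap, and the toy joint function are all hypothetical stand-ins for the real prediction and joint networks.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumed blank token id for this toy example


def greedy_transducer_decode(
    encoder_frames: Sequence[object],
    joint_argmax: Callable[[object, List[int]], int],
    max_symbols_per_frame: int = 10,
) -> List[int]:
    """Greedy RNN-T decoding: one best token per step, no beam."""
    tokens: List[int] = []
    t = 0
    emitted_on_frame = 0
    while t < len(encoder_frames):
        # The callback stands in for prediction network + joint network + argmax.
        token = joint_argmax(encoder_frames[t], tokens)
        if token == BLANK or emitted_on_frame >= max_symbols_per_frame:
            # Blank (or the safety cap, an assumption here): move to the next frame.
            t += 1
            emitted_on_frame = 0
        else:
            # Non-blank: record the token and stay on the same frame.
            tokens.append(token)
            emitted_on_frame += 1
    return tokens


def toy_joint(frame: int, history: List[int]) -> int:
    """Hypothetical joint: emit the frame's label once, then emit blank."""
    if frame != BLANK and (not history or history[-1] != frame):
        return frame
    return BLANK


decoded = greedy_transducer_decode([0, 3, 0, 5, 5, 0], toy_joint)
```

    The cap on symbols per frame is a common safeguard against a joint network that never emits blank; whether and how the library applies one is not stated in the release note.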

    Bugfixes and improvements

    [CI] don't export OTELs within the tests (#46602) by @tarekziade in [#46602]

    [CI] capture checkers output in OTEL (#46601) by @tarekziade in [#46601]

    Lfm2: thread seq_idx through ShortConv for packed/varlen inputs (#46588) by @ChangyiYang in [#46588]

    put output_hidden_states into filter_output_hidden_states (#46422) by @molbap in [#46422]

    a11 for checkers (#46599) by @tarekziade in [#46599]

    Fix stop string matching for byte-fragment tokens (#46530) by @Incheonkirin in [#46530]

    [DiffusionGemma] better docs and links (#46569) by @gante in [#46569]

    Require trust_remote_code to run a local-directory custom_generate (#46483) by @LinZiyuu in [#46483]

    Fix torchaudio version not tied to torch version in docker file (#46594) by @ydshieh in [#46594]

    [CI] Enable PR CI for all fork PRs via security gate (#46591) by @ydshieh in [#46591]

    [CB] [Minor] Add parameter to tune default compile level (#46533) by @remi-or in [#46533]

    Make DiffusionGemma trainable (#46568) by @kashif in [#46568]

    docs: 🌐 add Turkish translation for README file (#46312) by @onuralpszr in [#46312]

    fix-trainer-tests (#46541) by @SunMarc in [#46541]

    Remove unnecessary expand_as in get_placeholder_mask across VLMs (#44907) by @syncdoth in [#44907]

    [CI] Catch all shell/process execution issues in security gate via Bandit JSON report (#46560) by @ydshieh in [#46560]

    Honor a concrete dtype in AutoModel for composite checkpoints (#46514) by @qflen in [#46514]

    [CI] Implement real security check in PR CI security gate (#46557) by @ydshieh in [#46557]

    [CI] Add 60s delay in security gate for flow observation (#46555) by @ydshieh in [#46555]

    [TBC] [CI] Auto-approve PR CI for fork PRs via security gate (#46553) by @ydshieh in [#46553]

    [CI] fix and make less flaky (#46543) by @zucchini-nlp in [#46543]

    Fix hf_hub_download not placing file in current dir for url_to_local_path (#46545) by @ydshieh in [#46545]

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @ArthurZucker

    Add minimax m3vl (#46600)

    @eustlb

    Add Parakeet-RNNT (#46331)

    Original source
  • Jun 12, 2026
    • Date parsed from source:
      Jun 12, 2026
    • First seen by Releasebot:
      Jun 12, 2026

    Hugging Face

    Jun 12, 2026

    Hugging Face adds service accounts for enterprise organizations, giving teams dedicated organization-owned identities for automation, CI/CD, and backend integrations with fine-grained tokens, resource group access, and no impact on paid seats.

    Enterprise organizations can now create service accounts: dedicated, organization-owned identities for programmatic access such as CI/CD pipelines, automation scripts, and backend integrations. They aren't tied to any individual, so automated workflows keep running even as team members join or leave the organization.

    Admins manage them from the organization settings, where access is granted exclusively through fine-grained tokens scoped to what each workflow needs. Tokens can be generated, updated, rotated, or revoked at any time. Service accounts can also be added to resource groups, making it easy to align automated workflows with your organization's existing access structure. Because service accounts don't consume a paid seat, they can be scoped and isolated without impacting member quotas or billing.

    Read the documentation to learn more.

    Original source
  • Jun 8, 2026
    • Date parsed from source:
      Jun 8, 2026
    • First seen by Releasebot:
      Jun 11, 2026

    Hugging Face

    Jun 8, 2026

    Hugging Face adds secretless trusted publishing from GitHub, GitLab, and CI with gated repo access.

    You can now publish to Hugging Face repositories from GitHub, GitLab, or custom CI systems without adding secrets, using workflow identity federation.

    This mirrors the secretless Trusted Publishing experience available for npm and PyPI. CI jobs can also read from gated repositories that the user has access to. See the documentation: Workflow Identity Federation for Trusted Publishing.

    Original source
  • Jun 10, 2026
    • Date parsed from source:
      Jun 10, 2026
    • First seen by Releasebot:
      Jun 10, 2026

    transformers by Hugging Face

    Release v5.11.0

    transformers releases v5.11.0 with new model additions including DiffusionGemma and DeepSeek-V3.2, plus kernel fusion and parallelization improvements. The update also brings a broad set of bug fixes, testing updates, and documentation refreshes.

    New Model additions

    DiffusionGemma

    DiffusionGemma is engineered to reduce the sequential bottlenecks of standard causal language models by employing an encoder-decoder architecture specifically optimized for inference speed. During inference, DiffusionGemma leverages multi-canvas sampling, where rather than generating one token at a time, the model iteratively denoises a full block of tokens using a diffusion sampler. This block-autoregressive approach facilitates text generation at higher speeds compared to traditional sequential generation methods.

    Links: Documentation

    GPU go brr (#46540) by @gante in #46540

    DeepSeek-V3.2

    DeepSeek-V3.2-Exp is an experimental model from DeepSeek-AI that introduces DeepSeek Sparse Attention (DSA), a trainable, fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios. Built on top of DeepSeek-V3.1-Terminus with a 685B-parameter Mixture-of-Experts backbone, it reduces the quadratic cost of attention over long sequences by attending only to a selected subset of past tokens while maintaining virtually identical benchmark performance. The work was extended in DeepSeek-V3.2 which pairs DSA with scalable reinforcement learning and achieves gold-medal level results on competition math and competitive programming benchmarks.

    Links: Documentation | Paper

    Add deepseek 3.2 exp (#41251) by @ArthurZucker in #41251
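
    The idea of attending only to a selected subset of past tokens can be sketched as top-k selection followed by ordinary softmax attention over the survivors. This is a toy illustration, not DeepSeek's implementation: the function name, the `index_scores` argument (standing in for whatever the model's indexer produces), and the single-query, single-head setup are assumptions.

```python
import math
from typing import List, Sequence


def topk_sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    index_scores: Sequence[float],
    k: int,
) -> List[float]:
    """Attend only to the k past tokens with the highest index scores."""
    # Keep the positions of the k best-scoring past tokens.
    selected = sorted(
        range(len(keys)), key=lambda i: index_scores[i], reverse=True
    )[:k]

    # Scaled dot-product logits, computed only for the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]

    # Numerically stable softmax over the selected subset.
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)

    # Weighted sum of the selected value vectors.
    dim = len(values[0])
    return [
        sum(w / total * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
out = topk_sparse_attention([1.0, 1.0], keys, values, [0.1, 0.9, 0.2], k=1)
```

    With `k` equal to the number of past tokens the function reduces to dense attention, so the per-query cost of the attention step scales with `k` rather than with the sequence length.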

    Kernels

    The KernelConfig API was extended to support n-to-1 module fusion and parameter transformation, simplifying how custom kernels are integrated with Transformers modules. Additional fixes include resolving a dtype mismatch in the Mamba2 CUDA kernel path for NemotronH/Zamba2, adding fine-grained fp8/fp4 Triton kernel support, and correcting the FalconMamba fast-path warning to recommend pip install kernels instead of mamba-ssm.

    Extended & simplified n-to-1 kernel fusion via KernelConfig (#46339) by @michaelbenayoun in [#46339]

    Triton finegrained fp8/fp4 (#46407) by @IlyasMoutawwakil in [#46407]

    Fix dtype mismatch in NemotronH/Zamba2 Mamba2 CUDA-kernel path (out_proj) (#46487) by @yuekaizhang in [#46487]

    fix(falcon_mamba): recommend pip install kernels in fast-path warning (#46343) by @Anai-Guo in [#46343]

    Parallelization

    Fixed model parallel beam search bugs in the Qwen2-VL, Qwen2.5-VL, and Qwen3-VL MoE model families, and added documentation for tensor parallelism support with continuous batching.

    [docs] tp for continuous batching (#46019) by @stevhliu in [#46019]

    revisit history parallel beam search tests to avoid unnecessary fix (#46495) by @kaixuanliu in [#46495]

    fix qwen series VL model's model parallel bug (#46316) by @kaixuanliu in [#46316]

    Bugfixes and improvements

    Fix the offsets in processing (#46525) by @zucchini-nlp in [#46525]

    Fix buggy action sha pin (#46534) by @ydshieh in [#46534]

    Fix trailing comma bug in DataCollatorForLanguageModeling example (#46527) by @JemmaUZH in [#46527]

    Fix missing Gemma4Processor._compute_audio_num_tokens (#46416) by @csantosbh in [#46416]

    Fix InternVL models (#46524) by @hmellor in [#46524]

    fix(afmoe): reduce tokens in test_compile_static_cache to avoid flaky bfloat16 drift (#46521) by @ydshieh in [#46521]

    [CB] Add a "max_requests_per_batch" parameter (#46434) by @remi-or in [#46434]

    revamp cv docs and fix rf-detr (#46219) by @merveenoyan in [#46219]

    Update hub metadata (#46379) by @zucchini-nlp in [#46379]

    extend DeepseekV4FlashIntegrationTest to non-cuda device (#46517) by @sywangyi in [#46517]

    [docs] deepgemm (#46361) by @stevhliu in [#46361]

    [fix] regression introduced by #45534 (#46456) by @eustlb in [#46456]

    Use torchvision's native LANCZOS interpolation instead of PIL fallback (#46496) by @NicolasHug in [#46496]

    Add debugging info in pr-ci-caller.yml (#46505) by @ydshieh in [#46505]

    Fix tests: 'Cohere2MoeModel' object has no attribute 'hf_device_map' (#46337) by @kaixuanliu in [#46337]

    Bump the actions group across 1 directory with 19 updates (#46414) by @dependabot[bot] in [#46414]

    Log some information in .github/workflows/pr-ci-post-dashboard-link.yml (#46499) by @ydshieh in [#46499]

    feat(quantizers): support non-weight param names in TorchAo safetensors loading (#46325) by @agesf in [#46325]

    docs: fix typo in make_list_of_images docstring (#46469) by @ramkumar27072006 in [#46469]

    add XPU expectation for deepseek_ocr2 model tests (#46492) by @kaixuanliu in [#46492]

    Fix sapiens2 tests: add XPU device expectations (#46488) by @kaixuanliu in [#46488]

    Add vLLM smoke test to CI (#46383) by @hmellor in [#46383]

    extend deepseek v4 test to xpu (#46366) by @sywangyi in [#46366]

    Added cosmos3 model (#46146) by @MaciejBalaNV in [#46146]

    fbgemm_fp8:Keep the current device aligned with the input tensor (#46403) by @kaixuanliu in [#46403]

    [Modular] Add no_inherit_decorators and fixup wrong RoPE related inheritances (#46440) by @Bissmella in [#46440]

    skip deepgemm test except cuda (#46090) by @jiqing-feng in [#46090]

    Fix/video classification pipeline video processor (#46256) by @J3r3myPerera in [#46256]

    ci: less flaky test_assisted_decoding_matches_greedy_search_1_same (#46445) by @ydshieh in [#46445]

    Fix flip_back graph break (#46344) by @guarin in [#46344]

    Add the other processors to auto-mappings (#46046) by @zucchini-nlp in [#46046]

    fix: compatibility with torch<=2.7 (#46393) by @andylin-hao in [#46393]

    fix: remove dynamic per-actor Slack ID lookup in ssh-runner workflow (#46327) by @ydshieh in [#46327]

    [docs] Romanian translation of pipeline_tutorial.md, pipeline_gradio.md, pipeline_webserver.md and add_new_pipeline.md. (#46388) by @filipinescu in [#46388]

    [docs] gemma4 typos (#46351) by @stevhliu in [#46351]

    [docs] padding-free training (#46333) by @stevhliu in [#46333]

    fix[vLLM x v5]: Default untied embeddings in AudioFlamingo3 and VibeVoice (#46400) by @harshaljanjani in [#46400]

    Fix deepspeed docker (#46108) by @SunMarc in [#46108]

    Fix conversion for clip models (#46406) by @zucchini-nlp in [#46406]

    ci: mention code quality failure in CI dashboard comment (#46415) by @ydshieh in [#46415]

    Fix noisy logging from image_processing module aliases issue - 46298 (#46350) by @skshmjn in [#46350]

    Raise tqdm minimum to 4.60 to match tqdm.contrib.logging import (#46397) by @n0gu-furiosa in [#46397]

    fix(gemma4_unified): conversion script and config bugs (#46398) by @douglas-reid in [#46398]

    [docs] remove sparsity from compressed-tensors (#46387) by @stevhliu in [#46387]

    [CB] Fix crashes when fork is not possible (#46251) by @remi-or in [#46251]

    Improve CI dashboard comment: rename and deduplicate (#46412) by @ydshieh in [#46412]

    Fix missing f-string prefixes in error messages (#46354) by @joaopedroassad in [#46354]

    Add workflow to post CI Grafana dashboard link to PR (#46410) by @ydshieh in [#46410]

    [docs] Romanian translation of fast_tokenizers.md, custom_tokenizers.md, tokenizer_summary.md, image_processors.md and video_processors.md. (#46356) by @filipinescu in [#46356]

    Clean up new models after release (#46092) by @zucchini-nlp in [#46092]

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @ArthurZucker

    Add deepseek 3.2 exp (#41251)

    @gante

    GPU go brr (#46540)

    @merveenoyan

    revamp cv docs and fix rf-detr (#46219)

    @sgerrard

    Quantization for small models (#46449)

    @MaciejBalaNV

    Added cosmos3 model (#46146)

    @J3r3myPerera

    Fix/video classification pipeline video processor (#46256)

    @filipinescu

    [docs] Romanian translation of pipeline_tutorial.md, pipeline_gradio.md, pipeline_webserver.md and add_new_pipeline.md. (#46388)

    [docs] Romanian translation of fast_tokenizers.md, custom_tokenizers.md, tokenizer_summary.md, image_processors.md and video_processors.md. (#46356)

    Original source
  • Jun 4, 2026
    • Date parsed from source:
      Jun 4, 2026
    • First seen by Releasebot:
      Jun 5, 2026

    transformers by Hugging Face

    Patch release v5.10.2

    transformers fixes clip model conversion bugs in patch release v5.10.2.

    There was a serious bug in the conversion of CLIP-related models, which affected models such as sam3 and others. Please make sure to update 🙏

    Fix conversion for clip models by @zucchini-nlp (#46406)

    Full Changelog: v5.10.1...v5.10.2

    Original source
  • Jun 3, 2026
    • Date parsed from source:
      Jun 3, 2026
    • First seen by Releasebot:
      Jun 3, 2026

    transformers by Hugging Face

    Release v5.10.1

    transformers ships v5.10.1 with new model support for Gemma4 Unified, Sapiens2, DeepSeek-OCR-2, and JetBrains Mellum, plus broad fixes for model parallelism, cache handling, quantization, and Gemma4 stability.

    v5.10.0 was yanked because we published it from a corrupted branch. Sorry everyone, this happens when we rush a release!!!

    New Model additions

    Gemma4 Unified + Gemma4 MTP

    Gemma 4 12B Unified is an encoder-free multimodal model with pretrained and instruction-tuned variants. Unlike standard Gemma 4, which uses dedicated encoder towers, Gemma 4 12B Unified projects raw inputs directly into the language model's embedding space through lightweight linear pipelines. This results in a simpler architecture while maintaining strong multimodal performance.

    Key differences from standard Gemma 4:

    • No Vision Tower: Raw pixel patches are projected directly into LM space via a Dense + LayerNorm pipeline with factorized 2D positional embeddings, replacing the vision encoder.
    • No Audio Tower: Raw 16 kHz waveform samples are chunked into fixed-length frames and projected through a simple RMSNorm → Linear pipeline, replacing the mel spectrogram + Conformer encoder.
    • Shared Multimodal Pipeline: Both vision and audio use the same Gemma4UnifiedMultimodalEmbedder (RMSNorm → Linear) for the final projection to text hidden space.

    You can find the original Gemma 4 12B Unified checkpoints under the Gemma 4 release.

    who needs encoders? (#46385) by @douglas-reid @sgerrard @vasqu @molbap
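
    The encoder-free audio path described above (chunk the waveform into frames, then RMSNorm → Linear) can be sketched in plain Python. This is a minimal illustration, not the Gemma4UnifiedMultimodalEmbedder code: the function names, the frame length, the handling of a trailing partial frame, and the absence of a learned RMSNorm gain are all assumptions.

```python
import math
from typing import List, Sequence


def chunk_waveform(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split raw samples into fixed-length frames (trailing partial frame dropped)."""
    n_frames = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def rms_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Divide by the root mean square of the frame (no learned gain in this sketch)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    """Project x with a weight matrix of shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def embed_audio(
    samples: Sequence[float], weight: Sequence[Sequence[float]], frame_len: int
) -> List[List[float]]:
    """Waveform -> frames -> RMSNorm -> Linear, one embedding per frame."""
    return [linear(rms_norm(frame), weight) for frame in chunk_waveform(samples, frame_len)]


# Toy input: 9 samples, frames of 4 samples, projected to 2 dimensions.
samples = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0, 9.0]
weight = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
embeddings = embed_audio(samples, weight, frame_len=4)
```

    Each frame becomes one embedding vector, so the number of audio tokens grows linearly with the clip length and is set by the chosen frame length.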

    Sapiens2

    Sapiens2 is a family of high-resolution vision transformers pretrained on ~1 billion curated human images, designed for human-centric computer vision tasks including pose estimation, body-part segmentation, surface normal estimation, and pointmap estimation. The models scale from 0.4B to 5B parameters and train at native 1K resolution, with hierarchical 4K variants for extended spatial reasoning. Sapiens2 achieves substantial improvements over its predecessor with +4 mAP in pose estimation, +24.3 mIoU in body-part segmentation, and 45.6% error reduction in normal estimation.

    Links: Documentation | Paper

    Add Sapiens2 Model (#45919) by @guarin in #45919

    DeepSeek-OCR-2

    DeepSeek-OCR-2 is an OCR-specialized vision-language model built on a distinctive architecture that combines a SAM ViT-B vision encoder with a Qwen2 hybrid attention encoder, connected through an MLP projector to a DeepSeek-V2 Mixture-of-Experts (MoE) language model. The model features a hybrid attention mechanism that applies bidirectional attention over image tokens and causal attention over query tokens, enabling efficient and accurate document understanding. It supports both plain OCR tasks and grounding capabilities with coordinate-aware output for document conversion to markdown format.

    Links: Documentation

    Add Deepseek-OCR-2 model (#45075) by @thisisiron in #45075

    Mellum

    Mellum is a code-focused Mixture-of-Experts language model developed by JetBrains. It is derived from the Qwen3-MoE architecture with per-layer-type RoPE and interleaved sliding window attention. The model has 12B total parameters with 2.5B active parameters per token, using 64 routed experts with 8 activated per token across 28 layers.

    Links: Documentation

    feat: Add support for JetBrains' Mellum v2 code generation model (#46112) by @shadeMe in #46112
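
    Routing with 64 experts and 8 active per token can be sketched as a softmax over router logits followed by top-k selection. This is a generic Mixture-of-Experts illustration, not Mellum's code: the function name is hypothetical, and renormalizing the selected weights so they sum to 1 is an assumption the release note does not confirm.

```python
import math
from typing import List, Sequence, Tuple


def route_token(router_logits: Sequence[float], top_k: int = 8) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts of one token."""
    # Numerically stable softmax over all experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize so the selected weights sum to 1 (assumed behaviour).
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Toy router output for one token over 64 experts.
logits = [0.1 * i for i in range(64)]
routed = route_token(logits, top_k=8)
```

    Only the selected experts run for that token, which is how a model with 12B total parameters can use about 2.5B active parameters per token.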

    Breaking changes

    The Gemma4 vision pooler now casts inputs to float32 before scaling to prevent float16 overflow (inf saturation) with large checkpoints, which may cause minor numerical differences in outputs for users running Gemma-4 vision models in float16.

    🚨 Fix float16 overflow in Gemma4 vision pooler (#46277) by @Bluear7878
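
    The overflow that this fix guards against can be reproduced with the standard library alone. The sketch below is not the Gemma4 pooler: it simulates float16 rounding with the `struct` module's `e` format, and the function names, the scale-then-divide computation, and the numbers are illustrative assumptions. A Python float (float64) stands in for the wider float32 type.

```python
import math
import struct
from typing import Callable


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, saturating to inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the float16 range (max 65504).
        return math.copysign(math.inf, x)


def scale_and_normalize(
    value: float, scale: float, divisor: float, round_fn: Callable[[float], float]
) -> float:
    """Compute (value * scale) / divisor, rounding after every operation."""
    product = round_fn(value * scale)  # the intermediate that may overflow
    return round_fn(product / divisor)


# All arithmetic in float16: 300 * 300 = 90000 exceeds 65504 and saturates.
in_half = scale_and_normalize(300.0, 300.0, 100.0, to_float16)

# Upcast first, compute in the wider type, and cast only the final result.
in_wide = to_float16(scale_and_normalize(300.0, 300.0, 100.0, lambda x: x))
```

    The final value fits comfortably in float16; only the intermediate product does not, which is why upcasting before the scaling step changes the result.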

    Audio Language Models (ALMs) now have a dedicated base model class without a language modeling head, aligning them with the design of Vision Language Models (VLMs); users relying on the previous model class structure should update their code to use the new base model class where appropriate.

    🚨 [ALM] Add base model without head (#45534) by @eustlb

    Parallelization

    This release includes numerous bug fixes for model parallelism across multiple models (Gemma4, AltCLIP, ChineseClip, Blip-2, Whisper, Ovis2, Moshi) and parallel execution strategies, including fixes for tensor parallelism (TP), expert parallelism (EP), beam search under model parallel settings, and loss over-counting under TP/EP configurations. The continuous batching manager was also reworked for clearer control flow and improved TP race condition handling, and FSDP initialization via from_pretrained was introduced.

    Fix dsv4 dequant + tp/ep (#46378) by @IlyasMoutawwakil in [#46378]

    [CB] [Major] Rework manager to have clearer control flow + handle TP (#46070) by @remi-or in [#46070]

    fix series of bugs for model parallel beam search (#46280) by @kaixuanliu in [#46280]

    Fix model parallel issue for altclip model and ChineseClip model (#45487) by @kaixuanliu in [#45487]

    Model parallel fix (#46230) by @kaixuanliu in [#46230]

    [Revert] FSDP+Dtensor refactor related changes (#46246) by @vasqu in [#46246]

    Fix model parallel bugs for Gemma4 (#45817) by @kaixuanliu in [#45817]

    init FSDP through from_pretrained (#46102) by @3outeille in [#46102]

    fix model parallel device mismatch issue in create_bidirectional_mask (#46221) by @kaixuanliu in [#46221]

    Trainer.compute_loss: fix loss over-counting under TP and EP-as-TP (#45994) by @AmineDiro in [#45994]

    Fix caching allocator warmup byte estimation for EP model loading (#46149) by @sywangyi in [#46149]

    Cache

    Fixed a regression in encoder-decoder cache initialization where the decoder config was incorrectly applied to the cross-attention cache, and resolved a RuntimeError caused by buffer size limits when warming up the cache on MPS devices. Additional test infrastructure improvements were made to support read-only cache environments used in CI.

    fix: cache warmup RuntimeError on mps (#46239) by @McPatate in [#46239]

    Make more tests work with read-only cache (#46299) by @ydshieh in [#46299]

    Update a test to avoid writing to the default xet cache (#46250) by @ydshieh in [#46250]

    Fix a regression in encoder-decoder generation cache initialization (#46111) by @kaixuanliu in [#46111]

    Quantization

    Added support for DeepGEMM BF16, mixed FP8/FP4, and MegaMoE quantization via a grouped linear refactor, while fixing two bugs: an FP8 MoE reverse substring issue affecting DSv4 initialization, and a BitsAndBytes 4-bit/8-bit quantization bug that silently dropped chunked tensors from one-to-many weight converters.

    DeepGEMM BF16 + mixed FP8/FP4 + MegaMoE + refactor (#45634) by @IlyasMoutawwakil in [#45634]

    Fix fp8 moe reverse substring (#46265) by @ArthurZucker in [#46265]

    Fix bnb 4bit/8bit quantization drop chunked tensors bug (#46210) by @kaixuanliu in [#46210]

    Bugfixes and improvements

    Fix wrong changes produced by style/repo. check bot (#46371) by @ydshieh in [#46371]

    Fix path traversal when saving Bark voice preset embeddings (#46237) by @LinZiyuu in [#46237]

    Pass library_name/version to Hub calls via a shared HfApi (#46318) by @Wauplin in [#46318]

    docs: update ACL Anthology URL in CITATION.cff (#46352) by @irfaan101 in [#46352]

    [docs] contributing (#45465) by @stevhliu in [#45465]

    [docs] Romanian translation of contributing.md, modular_transformers.md, multimodal_processing.md, add_vision_processing_components.md, add_audio_processing_components.md, modeling_rules.md, model_output_tracing.md, auto_docstring.md, testing.md, pr_checks.md and add_new_model.md . (#46345) by @filipinescu in [#46345]

    [docs] xpu continuous batching (#46334) by @stevhliu in [#46334]

    Fix incorrect attribute mapping relationships in GLM MoE DSA Config (#46338) by @Dovis01 in [#46338]

    Fix grammar typos in Whisper documentation (#46336) by @calliec-1223 in [#46336]

    [docs] update num_items_in_batch for causal LMs (#46335) by @stevhliu in [#46335]

    Update compressed tensors minimum version (#46342) by @SunMarc in [#46342]

    Fix _is_package_available reporting available without a version (#46125) by @blipbyte in [#46125]

    remove sec (#46346) by @ydshieh in [#46346]

    fix: include transitive relative imports when loading from local directory (#46022) by @trducng in [#46022]

    perf(feature_extraction_sequence): skip re-splitting already-batched numpy arrays in pad() (#46329) by @Anai-Guo in [#46329]

    [Zamba] Support attn_implementation dispatch (#46317) by @YangKai0616 in [#46317]

    Fix TestAppRoutes test failures caused by deprecated asyncio.get_event_loop() on Python 3.10+ (#46340) by @ydshieh in [#46340]

    [Qwen3VL] Fix video token placeholder: use self.video_token instead of hardcoded "<|placeholder|>" (#46296) by @kpal002 in [#46296]

    chore(linter): fixes for rule 16 (#46023) by @tarekziade in [#46023]

    [docs] Romanian translation of weightconverter.md, models.md, custom_models.md, monkey_patching.md, fusion_mapping.md, how_to_hack_models.md, model_sharing.md and serialization.md. (#46309) by @filipinescu in [#46309]

    Normalize CUDA OOM errors when comparing commit failures in check_bad_commit (#46322) by @ydshieh in [#46322]

    Fix unhandled exception noise from background safetensors conversion thread (#45752) by @dhruv7477 in [#45752]

    Add Expectations for pipeline token classification tests (#46151) by @kaixuanliu in [#46151]

    [docs] fix auto-add release dates (#46283) by @zucchini-nlp in [#46283]

    Separate pip command syntax for notebook and CLI tabs in Quickstart (#46243) by @pvelayudhan in [#46243]

    Romanian translation of README.md, index.md, installation.md, _config.py and quicktour.md. (#46166) by @filipinescu in [#46166]

    Fall back to flat kwarg when modality dict is passed without it (#46195) by @Ace3Z in [#46195]

    Fix load_adapter OOM caused by full-model warmup sizing (#46145) by @Yooniel in [#46145]

    Replace assert with raise ImportError for optuna/ray dependency checks (#46263) by @SebTardif in [#46263]

    chore(linter): respect TRF017 modeling rule (#46260) by @tarekziade in [#46260]

    Delete dead code in qwen-vl series (#45827) by @zucchini-nlp in [#45827]

    qa: fix ty caching and align CI with local run (#46278) by @tarekziade in [#46278]

    Guard DeviceMesh import in continuous batching (#46205) by @danyalahmed1995 in [#46205]

    Processor compatibility with vLLM (#46258) by @zucchini-nlp in [#46258]

    Fix PR CI workflow cancellation condition (#46276) by @ydshieh in [#46276]

    [fix] toctree (#46106) by @stevhliu in [#46106]

    add more generic support for distributed trainer tests (#46109) by @kaixuanliu in [#46109]

    add XPU Expectations for florence2 and lfm2_vl model test (#46275) by @kaixuanliu in [#46275]

    Fix StaticCache building an empty layer list when num_kv_shared_layers == 0 (#46235) by @tengomucho in [#46235]

    Fix inverted assertion in remove_handler (#46227) by @SebTardif in [#46227]

    [ShieldGemma2] Support attn_implementation dispatch (#46069) by @YangKai0616 in [#46069]

    [Gemma4] Replace one-hot matmul with F.embedding in position embeddings (#46176) by @Sriniketh24 in [#46176]

    fix: kosmos2.5: properly expand embeddings table (#45835) by @nunq in [#45835]

    find pytest launch error in torch 2.13.0.dev20260526 (#46252) by @sywangyi in [#46252]

    [Test][Kosmos2.5] Add XPU expectations for integration tests (#46135) by @YangKai0616 in [#46135]

    Support FA2 flash_attn_with_kvcache for XPU continuous batching (#46028) by @YangKai0616 in [#46028]

    [Configs] Fix layer type validation to include its mlp counterpart (#46220) by @vasqu in [#46220]

    Fix num_items_in_batch over-counting for causal LM losses (#46204) by @qgallouedec in [#46204]

    RF-DETR doc fixes (#46244) by @merveenoyan in [#46244]

    Use main instead of commit SHA for now (#46241) by @ydshieh in [#46241]

    Enable push event (to main) for PR CI workflow (#46240) by @ydshieh in [#46240]

    fix(hrm_text): Add XPU Expectations for tests (#46214) by @kaixuanliu in [#46214]

    [deepseek_v4] keep hc_head / sinks / position_bias in fp32 (#46198) by @ArthurZucker in [#46198]

    Fix FSDP2 and distributed checkpointing imports for older PyTorch versions (#46141) by @ryota-komatsu in [#46141]

    Fix Gemma4 Array Mask Indexing (#46203) by @petecao in [#46203]

    utils: handle flash_attn missing from importlib packages_distributions without crashing (#45524) by @SAY-5 in [#45524]

    [AMD CI] revert AMD mi325 hf-workflows ref from SHA back to @main (#46213) by @Abdennacer-Badaoui in [#46213]

    [GLM-4.6V] Update with GLM-GA Processor (#46184) by @zRzRzRzRzRzRzRzR in [#46184]

    update xpu expectation for falcon mamba (#46086) by @sywangyi in [#46086]

    chore: enable Dependabot weekly GitHub Actions bumps (#46157) by @hf-dependantbot-rollout[bot] in [#46157]

    Fix Gemma4 use_bidirectional_attention="all" mask behavior (#46079) by @oliverholworthy in [#46079]

    Fix loading with only 1 device or distributed config (#46197) by @Cyrilvallez in [#46197]

    Fix TypeError on list-typed ignore_keys_at_rope_validation in RoPE config (#46142) by @Charly21r in [#46142]

    Support XPU autocast dtype fallback for FlashAttention (#46199) by @YangKai0616 in [#46199]

    Fix path traversal when saving named chat templates (#46191) by @LinZiyuu in [#46191]

    Fix is_last off-by-one in MaskGenerationPipeline for partial batches (#46136) by @J3r3myPerera in [#46136]

    Fix wrong variable in check_model_type isinstance check (#46080) by @SebTardif in [#46080]

    Enable passing kwargs through RoFormer models (#46171) by @ir2718 in [#46171]

    Update cohere2_moe tp_plan (#46189) by @Cyrilvallez in [#46189]

    Update release tool (#46193) by @Cyrilvallez in [#46193]

    [loading] Fix base_model_prefix issues in conversions (#46067) by @Cyrilvallez in [#46067]

    Bump dev version (#46188) by @Cyrilvallez in [#46188]

    Update self-comment-ci (#46137) by @guarin in [#46137]

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @filipinescu

    [docs] Romanian translation of contributing.md, modular_transformers.md, multimodal_processing.md, add_vision_processing_components.md, add_audio_processing_components.md, modeling_rules.md, model_output_tracing.md, auto_docstring.md, testing.md, pr_checks.md and add_new_model.md . (#46345)

    [docs] Romanian translation of weightconverter.md, models.md, custom_models.md, monkey_patching.md, fusion_mapping.md, how_to_hack_models.md, model_sharing.md and serialization.md. (#46309)

    Romanian translation of README.md, index.md, installation.md, _config.py and quicktour.md. (#46166)

    @remi-or

    [CB] [Major] Rework manager to have clearer control flow + handle TP (#46070)

    @thisisiron

    Add Deepseek-OCR-2 model (#45075)

    @kaixuanliu

    Add Expectations for pipeline token classification tests (#46151)

    fix series of bugs for model parallel beam search (#46280)

    add more generic support for distributed trainer tests (#46109)

    add XPU Expectations for florence2 and lfm2_vl model test (#46275)

    Fix model parallel issue for altclip model and ChineseClip model (#45487)

    Model parallel fix (#46230)

    fix(hrm_text): Add XPU Expectations for tests (#46214)

    Fix model parallel bugs for Gemma4 (#45817)

    Fix bnb 4bit/8bit quantization drop chunked tensors bug (#46210)

    fix model parallel device mismatch issue in create_bidirectional_mask (#46221)

    Fix a regression in encoder-decoder generation cache initialization (#46111)

    @shadeMe

    feat: Add support for JetBrains' Mellum v2 code generation model (#46112)

    @vasqu

    [Revert] FSDP+Dtensor refactor related changes (#46246)

    [Configs] Fix layer type validation to include its mlp counterpart (#46220)

    @zRzRzRzRzRzRzRzR

    [GLM-4.6V] Update with GLM-GA Processor (#46184)

    @eustlb

    🚨 [ALM] Add base model without head (#45534)

    Original source
  • Jun 3, 2026
    • Date parsed from source:
      Jun 3, 2026
    • First seen by Releasebot:
      Jun 3, 2026

    transformers by Hugging Face

    v5.10.0

    transformers ships the v5.10.0 release.

  • May 20, 2026
    • Date parsed from source:
      May 20, 2026
    • First seen by Releasebot:
      Jun 3, 2026

    transformers by Hugging Face

    Release v5.9.0

    transformers releases v5.9.0 with new model support for Cohere2-MoE and HRM-Text, expanded audio capabilities, and a range of generation and bug fixes. The update also includes breaking changes for text embeddings, plus improvements to multimodal, docs, and CI stability.

    Release v5.9.0

    New Model additions

    Cohere2Moe

    Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences.

    Links: Documentation

    Add new cohere2_moe model (#46115) by @Cyrilvallez in #46115
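To make the hybrid attention pattern described above concrete, here is a minimal sketch of how sliding-window and full attention layers differ in which positions a token can see. The layer layout, window size, and the helper names `causal_mask` and `masks_for_layers` are illustrative assumptions, not the actual Cohere2Moe configuration or transformers code.

```python
def causal_mask(seq_len, window=None):
    """Return a seq_len x seq_len boolean mask; True means query i may attend to key j.

    window=None gives full causal attention. An integer window limits each
    query to the `window` most recent positions, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


# Hypothetical layout: three sliding-window layers followed by one full layer.
LAYER_TYPES = ["sliding", "sliding", "sliding", "full"]


def masks_for_layers(seq_len, window, layer_types=LAYER_TYPES):
    """Build one mask per layer according to its (assumed) layer type."""
    return [
        causal_mask(seq_len, window if kind == "sliding" else None)
        for kind in layer_types
    ]


masks = masks_for_layers(seq_len=5, window=2)
last_token_sliding = masks[0][4]  # the last token sees only the 2 most recent positions
last_token_full = masks[3][4]     # the last token sees every earlier position
```

The sliding-window layers keep per-token attention cost bounded by the window, while the occasional full layer lets information travel across the whole context.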

    Parakeet tdt (#44171)

    Parakeet tdt (#44171) by @lmaksym

    HRM-Text

    HRM-Text is an improved autoregressive language-modeling variant of the Hierarchical Reasoning Model (HRM) that uses a hierarchical recurrent forward pass with two transformer stacks - one for slow, abstract planning (H) and one for fast, detailed computation (L) - reused inside a nested recurrence. It features PrefixLM attention where instruction tokens attend bidirectionally while response tokens attend causally, per-head sigmoid output gates, and parameterless RMSNorm. The model is designed as a base language model without instruction tuning or chat templates.

    Links: Documentation | Paper

    Add hrm text (#46025) by @abcd1927 in #46025
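The PrefixLM attention described above can be sketched as a boolean mask: instruction (prefix) tokens see each other in both directions, while response tokens see the full prefix plus earlier response tokens. This is a minimal illustration under that reading; the function name `prefix_lm_mask` is hypothetical and does not come from the HRM-Text implementation.

```python
def prefix_lm_mask(prefix_len, seq_len):
    """Return a seq_len x seq_len mask; True means query i may attend to key j.

    Positions [0, prefix_len) are the instruction prefix (bidirectional).
    Positions [prefix_len, seq_len) are the response (causal).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        # A key is visible if it belongs to the prefix, or if it is not in the future.
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = prefix_lm_mask(prefix_len=2, seq_len=4)
```

With `prefix_len=0` the mask reduces to ordinary causal attention, and with `prefix_len == seq_len` it becomes fully bidirectional.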

    Breaking changes

    The text_embeds input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs, aligning with other models in the library. Users must update their inputs accordingly.

    🚨 Fix memory leaks caused by lru decorators in vision models (#45922) by @yonigozlan

    Audio

    Audio support was expanded with the addition of AudioFlamingoNext model checkpoints and improved compilability of audio/vision encoders via standalone pure functions. Additional improvements include better error messaging when loading audio from video files and new documentation for audio/video processors.

    user friendly error when loading audio from video (#45221) by @eustlb in [#45221]

    [docs] adding audio/video processors (#45795) by @stevhliu in [#45795]

    Support Audio Flamingo Next checkpoints (#44830) by @lashahub in [#44830]

    Extract dynamic vision/audio tensors into standalone pure functions (#45396) by @IlyasMoutawwakil in [#45396]

    Generation

    Fixed generation issues including inputs_embeds and per_layer_inputs handling for Gemma4, an AttributeError in RAG's generate() caused by missing config fields, and flaky VLM generation tests by blocking special image tokens during sampling.

    Fix Gemma4 generation from inputs_embeds and per_layer_inputs (#46049) by @Cyrilvallez in [#46049]

    Fix AttributeError in RAG generate() for missing config fields (#46035) by @Sriniketh24 in [#46035]

    Block image_start/end_token_id in generation test sampling (#45914) by @Rocketknight1 in [#45914]
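Blocking special tokens during sampling, as mentioned for the flaky VLM generation tests, is commonly done by setting the logits of those token ids to negative infinity before drawing a sample. The sketch below shows that general technique in plain Python. The helper names and the token ids are hypothetical, and this is not the code from the PR.

```python
import math
import random


def block_tokens(logits, blocked_ids):
    """Return a copy of logits where every blocked id is set to -inf."""
    blocked = set(blocked_ids)
    return [-math.inf if i in blocked else x for i, x in enumerate(logits)]


def sample(logits, rng):
    """Draw one token id from softmax(logits); -inf entries get zero probability."""
    finite_max = max(x for x in logits if x != -math.inf)
    weights = [math.exp(x - finite_max) for x in logits]  # exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 5.0, 0.2, 4.0]
IMAGE_TOKEN_IDS = [1, 3]  # hypothetical ids standing in for image start/end tokens

masked = block_tokens(logits, IMAGE_TOKEN_IDS)
draws = {sample(masked, rng) for _ in range(200)}
```

Because the blocked ids carry zero weight, they can never be drawn, regardless of the random seed or how large their original logits were.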

    Bugfixes and improvements

    Remove mask visualization tool from masking_utils.py (#46066) by @Cyrilvallez in [#46066]

    fix: owned_by field in GET /v1/models returns list instead of string (#46006) by @nileshpatil6 in [#46006]

    [CB] Remove OpenTelemetry (#45984) by @remi-or in [#45984]

    docs(readme): use canonical huggingface.co domain in prose links (#46042) by @kiwigitops in [#46042]

    Fix remaining RAG doc examples that crash on current transformers (#46044) by @Sriniketh24 in [#46044]

    Init the actual tensor, not a copy (#46030) by @Rocketknight1 in [#46030]

    docs: sync legacy ACL anthology URLs and update metrics across i18n READMEs (#46027) by @irfaan101 in [#46027]

    [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029) by @eustlb in [#46029]

    [HRM Text] Add integration tests (#46033) by @vasqu in [#46033]

    hy_v3: add XPU expectations (#45858) by @kaixuanliu in [#45858]

    exaone4_5: add XPU expectations (#45890) by @kaixuanliu in [#45890]

    hyperclovax: add XPU Expectations for CI test (#45926) by @kaixuanliu in [#45926]

    chore(ci): remove dead env vars from circleci-failure-summary-comment.yml (#45972) by @XciD in [#45972]

    [CB] [Major] Add tensor parallelism (#45821) by @remi-or in [#45821]

    docs: update models architecture count and sync ACL anthology URLs (#46001) by @irfaan101 in [#46001]

    bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983) by @tarekziade in [#45983]

    RFDetr - use correct Roboflow org for release (#45946) by @sbucaille in [#45946]

    docs: Fix formatting issues in weightconverter.md (#45988) by @ArjunSrivastava1 in [#45988]

    Fix colqwen2 test (#45981) by @IlyasMoutawwakil in [#45981]

    Fix M-RoPE device mismatch in Qwen3VL family under FSDP2 CPU offload (#45861) by @jamesbraza in [#45861]

    [docs] chat template prefill (#45947) by @stevhliu in [#45947]

    [docs] decode fast path (#45899) by @stevhliu in [#45899]

    fix: restore _attn_implementation and fix request offset in generate_batch() (#45943) by @sergiopaniego in [#45943]

    Expose per_layer_inputs for every Gemma4 variants (#45927) by @Cyrilvallez in [#45927]

    chore: update benchmark_v2.yml (#45966) by @hf-security-analysis[bot] in [#45966]

    fix(ci): set persist-credentials: false on actions/checkout and close remaining template injection findings (#45964) by @XciD in [#45964]

    chore(ci): set default workflow permissions to contents: read (#45961) by @XciD in [#45961]

    fix(ci): remove template injection on pull_request_target workflows (#45956) by @XciD in [#45956]

    chore(ci): pin all GitHub Actions and reusable workflows by SHA (#45955) by @XciD in [#45955]

    [docs] ALMModelTest (#45900) by @stevhliu in [#45900]

    Enhance apply_chat_template to support custom field prefilling (reasoning_content, thinking, etc.) (#45896) by @Mamiglia in [#45896]

    BUGFIX: Support hubert models that don't have conv_pos_batch_norm configured (#45921) by @igordertigor in [#45921]

    Revert 45777 (#45942) by @Rocketknight1 in [#45942]

    pass the otel secrets (#45933) by @tarekziade in [#45933]

    Add initial torch_tpu backend support (#45918) by @tengomucho in [#45918]

    [CB] Hide activation footprint by using the CUDA graph pool (#45911) by @remi-or in [#45911]

    Require input_ids for repetition penalty (#45389) by @ruben-aghayan in [#45389]

    Fix undefined 'input' variable (#45895) by @fullyz in [#45895]

    Fix post processing RF-DETR (#46041) by @yonigozlan (direct commit on v5.9.0)

    [loading] Free up tensors faster inside ConversionOps (#46110) by @Cyrilvallez (direct commit on v5.9.0)

    Add new cohere2_moe model (#46115) by @Cyrilvallez (direct commit on v5.9.0)

    Fix cohere2 tp_plan for release by @Cyrilvallez (direct commit on v5.9.0)

    Release v5.9.0 by @Cyrilvallez (direct commit on v5.9.0)

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @lmaksym

    Parakeet tdt (#44171)

    @eustlb

    user friendly error when loading audio from video (#45221)

    [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029)

    @remi-or

    [CB] Remove OpenTelemetry (#45984)

    [CB] [Major] Add tensor parallelism (#45821)

    [CB] Hide activation footprint by using the CUDA graph pool (#45911)

    @abcd1927

    Add hrm text (#46025)

    Original source
  • May 28, 2026
    • Date parsed from source:
      May 28, 2026
    • First seen by Releasebot:
      May 28, 2026

    Hugging Face

    May 28, 26

    Hugging Face adds a Base only toggle and Model Tree filter to show just base models or one relation type across the Hub.

    A new Base only toggle on the Models page hides every finetune, adapter, merge, and quantization, leaving just the original base models.

    Need the opposite? The Model Tree filter under Other does the reverse, listing only one relation type across the Hub, such as every adapter or every quantized model.

    Original source
  • May 22, 2026
    • Date parsed from source:
      May 22, 2026
    • First seen by Releasebot:
      May 22, 2026

    Hugging Face

    May 22, 26

    Hugging Face adds Copy to Bucket for instant Hub-to-Bucket transfers of repositories and large files.

    You can now copy the contents of any repository directly from the Hub to a Bucket using the new “Copy to Bucket” button on repository pages. Powered by Xet server-side transfers, large files are copied instantly, making it possible to transfer terabytes in just a few seconds. Read more about copying files between repos and buckets.

    For example, you can now copy massive datasets or model checkpoints from the Hub into your Bucket, then mount the Bucket directly in your training jobs or Spaces to start working immediately. Read more about use cases.

    Original source
  • May 20, 2026
    • Date parsed from source:
      May 20, 2026
    • First seen by Releasebot:
      May 22, 2026

    Hugging Face

    May 20, 26

    Hugging Face adds model size filters to dataset benchmark leaderboards with top models marked by a medal badge.

    You can now filter dataset benchmark leaderboards by the number of model parameters. On any dataset leaderboard, pick a size range and the rankings refresh to that bucket. The top 3 models in each size category are marked with a 🏅.
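As a rough illustration of the behavior described above, the sketch below groups leaderboard entries into parameter-count ranges, ranks each group by score, and flags the top 3. The size ranges, field layout, and function names are assumptions made for the example; the Hub's actual buckets and ranking logic are not stated in the note.

```python
from collections import defaultdict

# Hypothetical size ranges: lower bound inclusive, upper bound exclusive.
SIZE_BUCKETS = [
    ("<1B", 0, 1_000_000_000),
    ("1B-10B", 1_000_000_000, 10_000_000_000),
    (">=10B", 10_000_000_000, float("inf")),
]


def bucket_for(num_params):
    """Return the label of the size range that contains num_params."""
    for label, low, high in SIZE_BUCKETS:
        if low <= num_params < high:
            return label
    raise ValueError(f"no bucket for {num_params} parameters")


def rank_by_bucket(entries, top_k=3):
    """Rank (name, num_params, score) entries per size bucket, highest score first.

    Returns {bucket_label: [(name, score, has_medal), ...]}.
    """
    grouped = defaultdict(list)
    for name, num_params, score in entries:
        grouped[bucket_for(num_params)].append((name, score))

    ranked = {}
    for label, rows in grouped.items():
        rows.sort(key=lambda row: row[1], reverse=True)
        ranked[label] = [
            (name, score, index < top_k) for index, (name, score) in enumerate(rows)
        ]
    return ranked


# Made-up entries, for illustration only.
entries = [
    ("model-a", 7_000_000_000, 61.0),
    ("model-b", 3_000_000_000, 58.5),
    ("model-c", 8_000_000_000, 64.2),
    ("model-d", 1_500_000_000, 49.9),
    ("model-e", 500_000_000, 40.1),
]
leaderboard = rank_by_bucket(entries)
```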

    Original source
  • May 13, 2026
    • Date parsed from source:
      May 13, 2026
    • First seen by Releasebot:
      May 13, 2026

    transformers by Hugging Face

    Patch release v5.8.1

    transformers ships a patch release fixing Deepseek V4 integration and related serving and WeightConverter issues.

    Patch release v5.8.1

    This release is mainly to fix the Deepseek V4 integration!!!

    • [fix] Add fatal_error to ContinuousBatchingManager so the serving... by @qgallouedec, @remi-or
    • Fix WeightConverter regex incorrectly matching shared_experts as experts by @silencelamb, @claude
    • Fix deepseek v4 by @ArthurZucker (#45892)
    • Deepseek v4 csa mask collapse by @ArthurZucker, @Sawyer117 (#45928)
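    The WeightConverter item above describes a common regex pitfall: an unanchored pattern for "experts" also matches inside "shared_experts". The sketch below reproduces that class of bug with made-up parameter names and shows one possible way to avoid it with a negative lookbehind. It is not the actual patch, and the patterns and key names are assumptions.

```python
import re

# Hypothetical parameter names, for illustration only.
KEYS = [
    "model.layers.0.mlp.experts.0.gate_proj.weight",
    "model.layers.0.mlp.shared_experts.gate_proj.weight",
]

# Unanchored: also matches the tail of "shared_experts."
LOOSE = re.compile(r"experts\.")

# Anchored: "experts." must not be preceded by a letter, digit, or underscore.
STRICT = re.compile(r"(?<![A-Za-z0-9_])experts\.")


def matching(pattern, names):
    """Return the names in which the pattern is found anywhere."""
    return [name for name in names if pattern.search(name)]


loose_hits = matching(LOOSE, KEYS)    # both keys
strict_hits = matching(STRICT, KEYS)  # only the routed-experts key
```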
    Original source

Similar to Hugging Face with recent updates: