AI/ML Infrastructure Release Notes

Release notes for AI compute platforms, inference clouds, and ML tooling


Latest AI/ML Infrastructure Updates

  • Mar 10, 2026
    • Date parsed from source:
      Mar 10, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Together AI

    Mar 10

    Together AI adds cached input token pricing for MiniMaxAI/MiniMax-M2.5 at $0.06 per 1M tokens.

    Cached Input Token Pricing

    Cached input token pricing is now available:

    • MiniMaxAI/MiniMax-M2.5: $0.06 per 1M cached input tokens (80% off standard input price)
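As a quick sanity check on the savings, here is a minimal sketch. The $0.30-per-1M standard input price below is back-computed from the stated 80% discount, not quoted from Together AI's pricing page, so treat it as an estimate:

```python
# Estimated input cost with prompt caching for MiniMaxAI/MiniMax-M2.5.
# Cached rate is $0.06 per 1M tokens; the standard rate is inferred
# from the stated 80% discount.
CACHED_PER_M = 0.06
STANDARD_PER_M = CACHED_PER_M / (1 - 0.80)  # -> 0.30

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of an input where `cached_tokens` are served from cache."""
    fresh = total_tokens - cached_tokens
    return (fresh * STANDARD_PER_M + cached_tokens * CACHED_PER_M) / 1_000_000

# 100k-token prompt, 80k of it cached:
cached = input_cost(100_000, 80_000)
uncached = input_cost(100_000, 0)
print(f"${cached:.4f} with caching vs ${uncached:.4f} without")
```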
  • Mar 7, 2026
    • Date parsed from source:
      Mar 7, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Together AI

    Mar 7

    Together AI adds serverless model bring-ups with Qwen/Qwen3.5-9B available.

Serverless Model Bring-Ups

    The following models have been added:

    Qwen/Qwen3.5-9B
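A minimal sketch of invoking the newly added model through Together's OpenAI-compatible chat completions convention. The endpoint URL, header names, and payload shape here are assumptions based on that convention rather than anything stated in this changelog; check the Together AI API reference before relying on them. Only the request is constructed below:

```python
import json

# Hypothetical request against an OpenAI-compatible chat completions
# endpoint (URL and payload shape assumed, not taken from this changelog).
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "Qwen/Qwen3.5-9B",  # the newly added serverless model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return headers, payload

headers, payload = build_chat_request("Say hello.", api_key="YOUR_API_KEY")
print(json.dumps(payload, indent=2))
```

Sending the payload (e.g. with `requests.post(API_URL, headers=headers, json=payload)`) requires a valid API key.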


  • Mar 6, 2026
    • Date parsed from source:
      Mar 6, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    BentoML

    v1.4.36

    BentoML releases v1.4.36 with key fixes for concurrency, readiness checks, tarfile safety, and API behavior, plus a reintroduced image API build_include feature and a typo correction. It also updates AWS BYOC docs and dependency tooling.

    What's Changed

    • fix: correct typo 'seperators' to 'separators' by @thecaptain789 in #5546
    • Reapply "feat: new image API: build_include" (#5531) by @frostming in #5539
    • fix: resolve AnyIO NoEventLoopError when calling sync API from async API by @paipeline in #5550
    • fix: Set SQLite busy_timeout and WAL mode to prevent 'database is locked' under concurrency by @VedantMadane in #5551
    • Fix memory leak in readiness checks with remote dependencies by @paipeline in #5553
    • fix: validate symlink targets in safe_extract_tarfile by @q1uf3ng in #5548
    • Revert "fix: resolve AnyIO NoEventLoopError when calling sync API from async API" by @frostming in #5554
    • chore: update AWS BYOC doc to v10 by @sauyon in #5559
    • chore(deps): bump actions/download-artifact from 7 to 8 by @dependabot[bot] in #5561
    • chore(deps): bump actions/upload-artifact from 6 to 7 by @dependabot[bot] in #5560
    • ci: pre-commit autoupdate [skip ci] by @pre-commit-ci[bot] in #5562
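The SQLite concurrency fix in #5551 boils down to two pragmas. A standalone sketch of the idea (pragma values are illustrative, not BentoML's actual settings): WAL mode lets readers proceed while a writer holds the log, and a busy timeout makes contending connections wait instead of failing immediately with "database is locked".

```python
import os
import sqlite3
import tempfile

def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite connection hardened for concurrent access."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL;")   # write-ahead logging
    conn.execute("PRAGMA busy_timeout=5000;")  # wait up to 5s on a lock
    return conn

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = open_db(path)
mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
conn.close()
print(mode)  # wal
```

Note that WAL mode is a property of the database file, so it persists across connections, while `busy_timeout` must be set per connection.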

    New Contributors

    • @thecaptain789 made their first contribution in #5546
    • @paipeline made their first contribution in #5550
    • @VedantMadane made their first contribution in #5551
    • @q1uf3ng made their first contribution in #5548

    Full Changelog: v1.4.35...v1.4.36

  • Mar 6, 2026
    • Date parsed from source:
      Mar 6, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Together AI

    Mar 6

    Together AI deprecates several models and removes them from availability.

    Model Deprecations

    The following models have been deprecated and are no longer available:

    • mixedbread-ai/Mxbai-Rerank-Large-V2
    • moonshotai/Kimi-K2-Thinking
    • meta-llama/Llama-3.2-3B-Instruct-Turbo
    • moonshotai/Kimi-K2-Instruct-0905
  • Mar 4, 2026
    • Date parsed from source:
      Mar 4, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    transformers by Hugging Face

    v5.3.0: EuroBERT, VibeVoice ASR, TimesFM2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2

    transformers adds new multilingual, audio, time-series, and document models, including EuroBERT, VibeVoice ASR, TimesFM 2.5, PP-DocLayoutV2, OLMo Hybrid, ModernVBERT, and Higgs Audio V2, alongside breaking changes, quantization updates, and broad fixes.

    New Model additions

    EuroBERT

    EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.

    Links: Documentation | Paper | Blog Post

    Add eurobert (#39455) by @ArthurZucker in #39455

    VibeVoice ASR

    VibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.

    Links: Documentation | Paper

    Add VibeVoice ASR (#43625) by @ebezzam in #43625

    TimesFM2.5

    TimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities without requiring dataset-specific training. It builds on the original TimesFM architecture with enhancements including rotary attention, QK normalization, per-dimension attention scaling, and continuous quantile prediction.

    Links: Documentation | Paper

    Timesfm 2.5 (#41763) by @kashif in #41763

    PP-DocLayoutV2

    PP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. The model is composed of two sequentially connected networks: an RT-DETR-based detection model that performs layout element detection and classification, followed by a pointer network that orders these layout elements. It is designed to analyze document layouts by identifying and organizing various layout components in their proper reading sequence.

    Links: Documentation

    [Model] Add PP-DocLayoutV2 Model Support (#43018) by @zhang-prog in #43018

    OlmoHybrid

    OLMo Hybrid is a hybrid architecture model from Ai2 that combines standard transformer attention layers with linear attention layers using the Gated Deltanet. This hybrid approach aims to improve efficiency while maintaining model quality by interleaving full attention layers with linear attention layers. The model uses a custom cache system that handles both KV cache for attention layers and recurrent state for linear attention layers.

    Links: Documentation

    Add OLMo Hybrid model (#43358) by @yanhong-lbh in #43358

    ModernVBert

    ModernVBert is a Vision-Language encoder that combines ModernBert with a SigLIP vision encoder. It is optimized for visual document understanding and retrieval tasks, making it suitable for processing documents that contain both text and visual elements.

    Links: Documentation | Paper

    Add ModernVBERT models (#42504) by @paultltc in #42504

    ColModernVBert

    ColModernVBert is a model for efficient visual document retrieval that leverages ModernVBert to construct multi-vector embeddings directly from document images, following the ColPali approach. The model enables retrieval and scoring of visual documents by processing both text queries and document images to generate embeddings that can be compared for relevance scoring.

    Links: Documentation | Paper

    Add ModernVBERT models (#42504) by @paultltc in #42504

    Higgs Audio V2

    Higgs Audio V2 is a powerful audio foundation model developed by Boson AI that was pretrained on over 10 million hours of audio data and diverse text data. Despite having no post-training or fine-tuning, the model excels in expressive audio generation thanks to its deep language and acoustic understanding. The model supports various audio generation tasks including single-speaker and multi-speaker smart voice, zero-shot voice cloning, and multi-speaker voice cloning.

    Links: Documentation

    Add Higgs Audio V2 Model (#40294) by @szhengac in #40294

    Higgs Audio V2 Tokenizer

    The Higgs Audio V2 Tokenizer is an audio tokenization model that operates at a low frame rate of 25 fps while maintaining high audio quality, effectively halving the frame rate of many baseline models. It uses unified 24 kHz training that mixes speech, music, and sound-event clips in one model to capture both semantic and acoustic details, facilitating the training of audio language models. The model enables fast inference by avoiding diffusion steps, with an encoder/decoder architecture that processes batches quickly for real-time or large-scale tasks.

    Links: Documentation

    Add Higgs Audio V2 Model (#40294) by @szhengac in #40294

    Breaking changes

    Tensor parallelism (TP) support for dense and MoE decoder-only models has been fixed and stabilized, requiring users to update their TP configurations and conversion mappings accordingly.

    🚨 fix + tests dense & MoE TP all reduce (decoder only) (#43722) by @3outeille

    The Ernie4.5 VL MoE model class and configuration names have been renamed to align with vLLM/SGLang conventions, requiring users to update any references to the old model names in their code.

    🚨 [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299) by @vasqu

    Several pipeline tasks have been removed or updated in the V5 cleanup (including question-answering, visual-question-answering, and image-to-image), requiring users to migrate to the replacement pipelines or updated task names.

    🚨 More V5 pipeline cleanup (#43325) by @Rocketknight1

    3D position IDs for vision-language models have been unified under a common interface (sourced from qwen2-vl), requiring users of affected VLMs (e.g., Ernie, GLM4V) to update their processors and any code that manually constructs position IDs.

    🚨 Unify 3D position ids (#43972) by @zucchini-nlp

🚨 Tokenizer x vLLM fixes 🚨:

    Unigram tokenizers were missing the spm precompiled charsmap support. We ran an overall v4 vs v5 regression test and fixed what we had missed.

    This was done in:

    [vllm + v5 fix] handle TokenizersBackend fallback properly for v5 (#44255) by @itazap

    Generation

    Generation input preparation was significantly refactored to stop relying on cache_position and instead pass pre-sliced input_ids/inputs_embeds directly to prepare_inputs_for_generation, simplifying the generation loop and laying groundwork for broader cache_position removal. Several bug fixes were also applied, including correct sampling for HiggsAudioV2, flaky cache-equality test stabilization for Idefics, and restored generation integration tests.

    [higgs-audio-v2] fix sampling (#44386) by @eustlb in [#44386]

    fix(flaky): idefics generate cache flake (#44180) by @tarekziade in [#44180]

    Fix generation integration tests (#44225) by @zucchini-nlp in [#44225]

    [generate] Always pass full input_ids in prepare_inputs_for_generation (#44226) by @Cyrilvallez in [#44226]

    fix: HiggsAudioV2 cached decode inputs in compiled generation (#44201) by @tarekziade in [#44201]

    [generate] Completely stop relying on cache_position to prepare inputs (#44130) by @Cyrilvallez in [#44130]

    Simplify input preparation in generate (#44126) by @Cyrilvallez in [#44126]

    Tokenization

    Several tokenization bugs were fixed in this release, including resolving an AttributeError in MLukeTokenizer caused by the v5 rename of additional_special_tokens, correcting the Fuyu tokenizer class mapping, fixing LayoutXLM tokenization test failures from the slow tokenizer removal refactor, and adding olmo_hybrid to the auto-tokenizer mapping. The tokenizer documentation was also updated to reflect the new unified v5 backend architecture and reorganized for clarity.

    [tiny] Add olmo_hybrid to tokenizer auto-mapping (#44416) by @tyler-romero in [#44416]

    fix(tokenizer): Fix MLukeTokenizer AttributeError post-v5 refactor (#44362) by @harshaljanjani in [#44362]

    update fuyu tokenizer class (#44235) by @itazap in [#44235]

    fix(testing): Fix LayoutXLM tokenization test and LightOnOCR SDPA flash test failures on main CI (#43988) by @harshaljanjani in [#43988]

    [docs] tokenizer summary (#43965) by @stevhliu in [#43965]

    [docs] refactor tokenizer docs (#43900) by @stevhliu in [#43900]

    Kernels

    Fixed several kernel-related issues including a security vulnerability, corrected Mamba kernel loading to handle incompatible import structures, ensured Liger Kernel is properly enabled during hyperparameter search, and expanded Flash Attention to support multiple compatible implementations.

    Fix kernels security issue (#44395) by @Cyrilvallez in [#44395]

    Enable Liger Kernel when doing hyperparameter search. (#44329) by @linfeng-du in [#44329]

    [Mamba] Fix kernel loading (#44176) by @vasqu in [#44176]

    [Flash Attn] Enable compatible implementations (#44177) by @vasqu in [#44177]

    Fix percentage formatting in help messages for gradient checkpointing, Liger Kernel, and empty cache steps (#44100) by @qgallouedec in [#44100]

    Quantization

    This release adds several new quantization backends and fixes, including MLX quantization support for MPS devices, Four Over Six (4/6) NVFP4 quantization integration for NVIDIA Blackwell GPUs, and CPU support for MXFP4 models, alongside a bug fix for MXFP4 model saving using reverse_op.

    [Quantization] Fixing mxfp4 saving using reverse_op (#43148) by @MekkCyber in [#43148]

    [Quantization] Add metal quantization for MPS devices! (#43934) by @MekkCyber in [#43934]

    Enable mxfp4 model on CPU (#43512) by @jiqing-feng in [#43512]

    Add Four Over Six quantization integration (#43970) by @jackcook in [#43970]

    Vision

    Fixed backward compatibility for image processors loaded from older remote code that lack valid_kwargs definitions, and resolved test failures in AMD ROCm CI by adding the missing timm dependency to the Docker image.

    [AMD CI] Add missing timm dependency to ROCm Docker image (#44389) by @Abdennacer-Badaoui in [#44389]

    update glm image model expected out for tests (#43907) by @kaixuanliu in [#43907]

    Fix image processors from_dict backward compatibility with old remote code (#44245) by @yonigozlan in [#44245]

    Bugfixes and improvements

    Update PR template (#44415) by @SunMarc in [#44415]

    Add Qwen3.5 support for sequence classification (#44406) by @medhakimbedhief in [#44406]

    update the expected output for qwen2_5_vl w/ pytorch 2.10 XPU (#44426) by @kaixuanliu in [#44426]

    add support for nemotron_3 (#44390) by @liding-nv in [#44390]

    [ Dynamic weight loader] fix remote code when format matches (#44396) by @ArthurZucker in [#44396]

    [timesfm2_5] fix timesfm2.5 loss (#44331) by @kashif in [#44331]

    Fix peft conversion mappings (#44413) by @Cyrilvallez in [#44413]

    Reduce tqdm verbosity during model loading (#44414) by @Cyrilvallez in [#44414]

    docs: Add NeMo Automodel community integration docs (#44304) by @adil-a in [#44304]

    [CB] Small fixes (#44227) by @remi-or in [#44227]

    Support non-gated experts (#44319) by @IlyasMoutawwakil in [#44319]

    [Bugfix] fix qwen3.5 no split module (#44382) by @JJJYmmm in [#44382]

    Fix mutable default arguments and resource leaks (#44287) by @jashshah999 in [#44287]

    skip 2 invalid test cases for voxtral_realtime model (#44321) by @kaixuanliu in [#44321]

    Mamba-1/-2 init weights in mixer class (#43778) by @kevinli573 in [#43778]

    add expectations for xpu for olmo_hybrid model (#44353) by @kaixuanliu in [#44353]

    [VITS] Add speaking_rate as an optionl forward argument (#43283) by @gau-nernst in [#43283]

    Strict export cleanup (#44293) by @IlyasMoutawwakil in [#44293]

    [docs] kernelconfig fix (#44337) by @stevhliu in [#44337]

    Add ProcessingKwargs ImagesKwargs etc. to docs (#44269) by @yonigozlan in [#44269]

    Fix typos in comments and docstrings (#44332) by @tysoncung in [#44332]

    Add testing guide for agents for trainer tests (#44328) by @SunMarc in [#44328]

    Update common tests Trainer (#44260) by @SunMarc in [#44260]

    [timesfm2_5] fix timesfm mlp bias (#44325) by @kashif in [#44325]

    fix zero3 init config (#44236) by @SunMarc in [#44236]

    Update expected output for Jais2 model tests (#43910) by @kaixuanliu in [#43910]

    Improve has_similar_generate_outputs assertions (#44166) by @tarekziade in [#44166]

    Fix failed test case for exaone_moe model (#43938) by @kaixuanliu in [#43938]

    fix(modeling_attn_mask_utils): remove FutureWarning from logger.warning_once() (#44307) by @imstevenpmwork in [#44307]

    Remove remaining vestiges of the TranslationPipeline (#43869) by @Rocketknight1 in [#43869]

    XPU now supports backward for the FA2 fixed path (#43905) by @YangKai0616 in [#43905]

    Fix: use TokenizersBackend for Olmo3 to preserve custom pre_tokenizer (#44294) by @mario-sanz in [#44294]

    Fix special token maps BC (#44281) by @ArthurZucker in [#44281]

    [Modular] Fix file type regression (#44283) by @vasqu in [#44283]

    [auto_docstring] Improve typing parsing and add tests (#43748) by @yonigozlan in [#43748]

    Restore response_schema saving-loading (#44282) by @Rocketknight1 in [#44282]

    Use associative scan HOP mamba recurrentgemma (#43737) by @riccardofelluga in [#43737]

    chore: fixes in Trainer class docs (compute_loss & hyperparameter_search) (#44268) by @ethanknights in [#44268]

    fix(trainer): pass optim_args to SGD, Adagrad, and RMSprop optimizers (#44203) by @nightcityblade in [#44203]

    fix(utils): Make torch_compilable_check compatible with torch.export strict mode (#44266) by @harshaljanjani in [#44266]

    Fix TypeError in convert_rope_params_to_dict when ignore_keys is a list (#44272) by @hangjun-ezra in [#44272]

    [docs] callbacks and collators (#44239) by @stevhliu in [#44239]

    [docs] trainer part 1 (#44185) by @stevhliu in [#44185]

    Remove refs to grouped_entities (#44182) by @Rocketknight1 in [#44182]

    [mimi] nit (#44237) by @eustlb in [#44237]

    Fix local dataset loading priority in run_image_classification_no_tra… (#44199) by @gowthamr-tech in [#44199]

    chore: added CLAUDE.md alias (#44232) by @tarekziade in [#44232]

    fix: add missing return type annotations to type-checking utilities in generic.py (#44241) by @yushiran in [#44241]

    Fix return value - fixes #44238 (#44240) by @tarekziade in [#44240]

    fix regression report_to "all" (#44250) by @SunMarc in [#44250]

    [fix] Set input_modalities on various architectures that aren't just text (#44078) by @tomaarsen in [#44078]

    Add processing tests for phi4 multimodal (#44234) by @yonigozlan in [#44234]

    fix: VersionComparison.from_string return type mismatch (#43709) by @tarekziade in [#43709]

    refactor _inner_training_loop to smaller methods (#44041) by @winglian in [#44041]

    [docs] fix broken chat_templating links in tasks docs (#44115) by @Deep-unlearning in [#44115]

    Add missing backtick in AnyToAnyPipeline.call docstring (#44229) by @alvarobartt in [#44229]

    Docs(it): fix typo in sentencepiece install command (#44218) by @matisgagneux21 in [#44218]

    Docs(it): fix typo in docstring wording (#44219) by @matisgagneux21 in [#44219]

    fix bug with position_ids on qwen3-vl models, such that position_ids include text position (#44158) by @leopold-tzafon in [#44158]

    Update 404ing BillSum dataset URL on Summarization Task guide (#44212) by @alexandercarruthers in [#44212]

    fix(models): Fix LayoutLMv2 NER crash and broken batched truncation/padding (#44187) by @harshaljanjani in [#44187]

    [CB] [Major] Asynchronous batching (#43960) by @remi-or in [#43960]

    Fix LASR feature extractor regression from invalid center argument (#44207) by @ainergiz in [#44207]

    Models with incorrect tokenizer_class in tokenization_config.json tha… (#44179) by @itazap in [#44179]

    chore(typing): initial ty integration (#44167) by @tarekziade in [#44167]

    fix(flaky): test_generate_with_and_without_position_ids in GLM ORC (#44173) by @tarekziade in [#44173]

    [docs] Add Chinese translations for common NLP task tutorials (#44144) by @TinderZ in [#44144]

    [Mimi] Calibrate to ensure encoder streaming performs correctly (#43971) by @caffeinism in [#43971]

    ESM2 attention_mask and token_dropout fix (#44163) by @lhallee in [#44163]

    bring back our demons: clean_up_tokenization_spaces (#44035) by @ArthurZucker in [#44035]

    Fix Seq2SeqTrainingArguments documentation (#35258) by @qgallouedec in [#35258]

    AutoGrad support for grouped_mm fallback (#44152) by @IlyasMoutawwakil in [#44152]

    Patch setitem on ModelOutput even if the parameter was previously None (#44080) by @tomaarsen in [#44080]

    [simple] Fix up repr whitespace/brackets (#44048) by @tomaarsen in [#44048]

    [chore] Fix incorrect forward type hint for Gemma3n (#44051) by @tomaarsen in [#44051]

    Raise informative error when loading video processors (#44125) by @zucchini-nlp in [#44125]

    fix(flaky): Different approach to make sure loss exists (#43804) by @tarekziade in [#43804]

    [voxtral] fix voxtral proc (#44132) by @eustlb in [#44132]

    [docs] Fix typos in GenerationConfig docstring (#44143) by @nightcityblade in [#44143]

    Fix gemma3n get_audio_features (#44040) by @zucchini-nlp in [#44040]

    Fix UMT5EncoderModel embedding weights not being tied after loading (#43880) by @jiqing-feng in [#43880]

    fix(testing): Update stale device override test in GraniteSpeech (#44113) by @harshaljanjani in [#44113]

    [Misc][vlms] Use text_config when initializing the fine-grained FP8Expert (#44032) by @JJJYmmm in [#44032]

    docs: fix typo 'AuoQuant' → 'AutoQuant' and clarify FINEGRAINED_FP8 library column (#44131) by @cluster2600 in [#44131]

    Update post proc (#44090) by @itazap in [#44090]

    Fix: flaky Kosmos2ModelTest test (#44061) by @tarekziade in [#44061]

    AutoTokenizer ignores config when model_type is None (#44127) by @itazap in [#44127]

    Migrate GPT2 to standardized output capture decorators (#43983) by @Aki-07 in [#43983]

    grouped_mm fallback (#44043) by @IlyasMoutawwakil in [#44043]

    Bump dev version (#44099) by @qgallouedec in [#44099]

    Fix loading logic issue (#44095) by @Cyrilvallez in [#44095]

    [docs] customizing tokenizers (#43929) by @stevhliu in [#43929]

    Merge test_keep_in_fp32_modules and test_keep_in_fp32_modules_strict (#44097) by @Rocketknight1 in [#44097]

    [voxtral-realtime] update runner expected values (#44096) by @eustlb in [#44096]

    Use torch.isfinite (#44069) by @cyyever in [#44069]

    add default flash impl (#44081) by @ArthurZucker in [#44081]

    Remove unused dependencies (#43904) by @cyyever in [#43904]

    Fix patchtsmixer call to post_init (#44082) by @Cyrilvallez in [#44082]

Fix false positive right-padding warning for decoder-only models in pipeline (#44021) in [#44021]

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @ArthurZucker

    Add eurobert (#39455)

    [ Dynamic weight loader] fix remote code when format matches (#44396)

    Fix special token maps BC (#44281)

    bring back our demons: clean_up_tokenization_spaces (#44035)

    add default flash impl (#44081)

    @liding-nv

    add support for nemotron_3 (#44390)

    @kashif

    [timesfm2_5] fix timesfm2.5 loss (#44331)

    [timesfm2_5] fix timesfm mlp bias (#44325)

    Timesfm 2.5 (#41763)

    @remi-or

    [CB] Small fixes (#44227)

    [CB] [Major] Asynchronous batching (#43960)

    @ebezzam

    [VibeVoice ASR] Use updated padding cache for ASR model. (#44392)

    Add VibeVoice ASR (#43625)

    @MekkCyber

    [Quantization] Fixing mxfp4 saving using reverse_op (#43148)

    [Quantization] Add metal quantization for MPS devices! (#43934)

    @tarekziade

    perf: Optimize SynthID logits processor batch index construction (#44172)

    Improve has_similar_generate_outputs assertions (#44166)

    fix(flaky): idefics generate cache flake (#44180)

    chore: added CLAUDE.md alias (#44232)

    Fix return value - fixes #44238 (#44240)

    fix: VersionComparison.from_string return type mismatch (#43709)

    fix: HiggsAudioV2 cached decode inputs in compiled generation (#44201)

    chore(typing): initial ty integration (#44167)

    fix(flaky): test_generate_with_and_without_position_ids in GLM ORC (#44173)

    fix(flaky): Different approach to make sure loss exists (#43804)

    Fix: flaky Kosmos2ModelTest test (#44061)

    @zhang-prog

    [Model] Add PP-DocLayoutV2 Model Support (#43018)

    @yanhong-lbh

    Add OLMo Hybrid model (#43358)

    @vasqu

    🚨 [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299)

    [Modular] Fix file type regression (#44283)

    [Mamba] Fix kernel loading (#44176)

    [Flash Attn] Enable compatible implementations (#44177)

    @jackcook

    Add Four Over Six quantization integration (#43970)

    @winglian

    refactor _inner_training_loop to smaller methods (#44041)

    @paultltc

    Add ModernVBERT models (#42504)

    @TinderZ

    [docs] Add Chinese translations for common NLP task tutorials (#44144)

    @szhengac

    Add Higgs Audio V2 Model (#40294)

  • Mar 3, 2026
    • Date parsed from source:
      Mar 3, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Modal

    1.3.5 (2026-03-03)

    Modal adds a changelog CLI, programmatic Secret updates, and richer function stats for tracking running inputs, making it easier to surface release information and manage live workloads.

    • We’ve added a modal changelog CLI for retrieving changelog entries with a flexible query interface (e.g. modal changelog --since=1.2, modal changelog --since=2025-12-01, modal changelog --newer). We expect this will be a useful way to surface information about new features to coding agents.

    • We’ve added a new modal.Secret.update method, which allows you to programmatically modify the environment variables within a Secret. This method has the semantics of Python’s dict.update: Secret contents can be overwritten or extended. Note that Secret updates take effect only for containers that start after the modification.

    • The dataclass returned by modal.Function.get_current_stats() now includes a num_running_inputs field that reports the number of inputs the Function is currently handling.
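The dict.update semantics described for Secret.update can be shown on a plain dict; the modal client isn't needed to see the overwrite-or-extend behavior, and the variable names below are made up:

```python
# Plain-dict illustration of the overwrite-or-extend semantics the
# release notes describe for modal.Secret.update (dict.update behavior).
secret_env = {"API_KEY": "old-key", "REGION": "us-east-1"}

# An update both overwrites an existing variable and adds a new one:
secret_env.update({"API_KEY": "new-key", "TIMEOUT": "30"})

print(secret_env)
# {'API_KEY': 'new-key', 'REGION': 'us-east-1', 'TIMEOUT': '30'}
```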

  • Feb 25, 2026
    • Date parsed from source:
      Feb 25, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Together AI

    Feb 25

    Together AI deprecates multiple models, including FLUX, Qwen, Llama, and Nemotron variants.

    Model Deprecations

    The following models have been deprecated and are no longer available:

    • black-forest-labs/FLUX.1-dev
    • black-forest-labs/FLUX.1-dev-lora
    • black-forest-labs/FLUX.1-kontext-dev
    • Qwen/Qwen3-VL-32B-Instruct
    • mistralai/Ministral-3-14B-Instruct-2512
    • Qwen/Qwen3-Next-80B-A3B-Thinking
    • Alibaba-NLP/gte-modernbert-base
    • BAAI/bge-base-en-v1.5
    • meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
    • meta-llama/Llama-Guard-3-11B-Vision-Turbo
    • meta-llama/LlamaGuard-2-8b
    • marin-community/marin-8b-instruct
    • nvidia/NVIDIA-Nemotron-Nano-9B-v2
  • Feb 23, 2026
    • Date parsed from source:
      Feb 23, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    Modal

    1.3.4 (2026-02-23)

    Modal adds Directory Snapshots beta for persisting sandbox directories across sessions, plus Sandbox.detach(), a wait option for terminate(), 8x faster stdin writes for Sandbox exec, and Volume.from_id() for referencing volumes by object id.

    • We’re introducing “Directory Snapshots”: a new beta feature for persisting specific directories past the lifetime of an individual Sandbox. Using the new methods modal.Sandbox.snapshot_directory() and modal.Sandbox.mount_image(), you can capture the state of a directory and then later include it in a different Sandbox:
    sb = modal.Sandbox.create(app=app)
    snapshot = sb.snapshot_directory("/project")
    sb2 = modal.Sandbox.create(app=app)
    sb2.mount_image("/project", snapshot)
    

    This feature can be useful for separating the lifecycle of application code in the Sandbox’s main Image from project code that changes in each Sandbox session. Files in the mounted snapshot also benefit from several optimizations that allow them to be read faster. See the Sandbox Snapshot guide for more information.

    • We’ve added a new modal.Sandbox.detach() method that we recommend calling after you are done interacting with a Sandbox. This method disconnects your local client from the Sandbox and cleans up resources associated with the connection. After calling detach, operations on the Sandbox object may raise and are otherwise not guaranteed to work.

    • The modal.Sandbox.terminate() method now accepts a wait parameter. With wait=True, terminate will block until the Sandbox is finished and return the exit code. The default wait=False maintains the previous behavior.

    • Throughput for writing to the stdin of a modal.Sandbox.exec process has been increased by 8x.

    • We’ve added a new modal.Volume.from_id() method for referencing a Volume by its object id.

  • Feb 17, 2026
    • Date parsed from source:
      Feb 17, 2026
    • First seen by Releasebot:
      Mar 20, 2026

    transformers by Hugging Face

    v5.2.0: GLM-5, Qwen3.5, Voxtral Realtime, VibeVoice Acoustic Tokenizer

    transformers releases VoxtralRealtime, GLM-5, Qwen3.5 and VibeVoice support, bringing new streaming speech, multimodal and large-scale model additions plus a breaking new attention mask interface and broad bug fixes.

    New Model additions

    VoxtralRealtime

    VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by processing audio in chunks as they arrive.

    The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.

    Add Voxtral Realtime (#43769) by @eustlb

    GLM-5 - GlmMoeDsa

    The zAI team launches GLM-5, and introduces it as such:

    GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

    Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.

    Add GlmMoeDsa (#43858) by @Cyrilvallez

    Qwen3.5, Qwen3.5 Moe

    The Qwen team has launched Qwen3.5, introducing it as follows:

    We are delighted to announce the official release of Qwen3.5, introducing the open-weight of the first model in the Qwen3.5 series, namely Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B demonstrates outstanding results across a full range of benchmark evaluations, including reasoning, coding, agent capabilities, and multimodal understanding, empowering developers and enterprises to achieve significantly greater productivity. Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability. We have also expanded our language and dialect support from 119 to 201, providing broader accessibility and enhanced support to users around the world.

    Adding Support for Qwen3.5 (#43830) by @bozheng-hit
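    The "397B total, 17B active" figure comes from sparse mixture-of-experts routing: each token is sent to only the top-k experts by gate score, so most expert parameters are untouched on any given forward pass. The sketch below is a hedged, plain-Python illustration of that routing step (not Qwen3.5's actual implementation; the expert count and scores are made up):

    ```python
    # Hedged sketch of sparse MoE top-k routing (illustrative, not Qwen3.5's
    # code): only the k highest-scoring experts process a given token, which
    # is how total parameter count can far exceed active parameter count.

    import heapq

    def route_top_k(gate_scores, k):
        """Return the indices of the k highest-scoring experts for one token."""
        return sorted(heapq.nlargest(k, range(len(gate_scores)),
                                     key=gate_scores.__getitem__))

    num_experts, k = 8, 2
    gate_scores = [0.1, 0.05, 0.3, 0.02, 0.25, 0.08, 0.15, 0.05]

    active = route_top_k(gate_scores, k)
    print(active)                       # experts 2 and 4 carry this token
    print(f"{k}/{num_experts} experts active")
    ```

    Scaled up, the same idea means the per-token compute cost tracks the active parameters (17B here) rather than the total (397B), which is the speed/cost optimization the announcement describes.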

    VibeVoice Acoustic Tokenizer

    VibeVoice is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational "vibe" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.

    One key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.

    Add VibeVoice Acoustic Tokenizer (#43400) by @ebezzam

    Breaking changes

    🚨 [Attn] New attn mask interface everywhere (#42848)

    🚨 Modify ModernBERT's default attention implementation to stop using FA (#43764)

    🚨 This one is breaking for very old models: 🚨

    fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791)

    If the config does not have a `model_type` field, we no longer infer the model type from the folder name, as was previously done for e.g. https://huggingface.co/prajjwal1/bert-tiny/blob/main/config.json
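    Substring matching on directory names is inherently fragile, which motivates this change. The sketch below is a hypothetical helper (illustrative only, not transformers' actual code) showing the failure mode: one known model-type name can be a substring of another, or of an unrelated user-chosen folder name.

    ```python
    # Hypothetical sketch of folder-name model-type inference (not transformers'
    # real code), showing why substring matching can return the wrong type.

    KNOWN_TYPES = ["bert", "roberta", "gpt2", "t5"]

    def guess_type_from_dirname(dirname):
        """Illustrative only: substring-match known model types in a folder name."""
        dirname = dirname.lower()
        for model_type in KNOWN_TYPES:
            if model_type in dirname:
                return model_type
        return None

    print(guess_type_from_dirname("bert-tiny"))             # bert -- intended
    print(guess_type_from_dirname("my-roberta-checkpoint")) # bert -- wrong:
                                                            # "roberta" contains "bert"
    ```

    After this release, a config without `model_type` simply fails type resolution instead of risking a silent mismatch like the one above; affected checkpoints should add an explicit `model_type` to their config.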

    Bugfixes and improvements

    • [docs] deploying (#43241) by @stevhliu
    • [Trainer] Move NEFTune impl to standalone functions (#43714) by @SunMarc
    • Fix convert_rope_params_to_dict so it uses rope_theta from the config (#43766) by @hmellor
    • Bump dev version (#43777) by @qgallouedec
    • Improved AGENTS.md (#43763) by @tarekziade
    • Fix-release-ubild (#43773) by @ArthurZucker
    • unpin torch for CircleCI (#43790) by @ydshieh
    • [Modular Dependencies] Fixup qwen rms norms (#43772) by @vasqu
    • fix(testing): Fix BLOOM tokenizer, CLAP audio features, and CLVP text tester usage in tests (#43798) by @harshaljanjani
    • Remove unconditional train_batch_size assignment (#43770) by @lordaarush
    • [Repo Consistency] Fix rms norm (#43803) by @vasqu
    • fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791) by @tarekziade
    • Refactor trainer data_collator and callbacks tests (#43776) by @SunMarc
    • [core] Faster and thread-safe check_model_inputs implementation (#43765) by @Cyrilvallez
    • [Trainer] use deepspeed SP process group when Accelerate doesn’t build a mesh (#43799) by @kashif
    • fix(flaky): enforce manual seed to reduce flakiness (#43794) by @tarekziade
    • Add TRL CI bot workflow to trigger tests on PR comments (#43809) by @qgallouedec
    • Fix DeepSpeed model preparation logic in Trainer class (#43780) by @qgallouedec
    • [docs] reveal more in toctree (#43808) by @stevhliu
    • Fix markdown documentation (#43076) by @cyyever
    • Fix slack-report workflow file (#43851) by @ydshieh
    • add do_sample=False to qwen2_5_vl model tests to stablize the output (#43728) by @kaixuanliu
    • Fix incorrect timestamp calculation in Qwen3VL Processor (#43659) by @jonathan-fulton
    • Remove GPU tracking from TrackioCallback and remove env var support (#43371) by @qgallouedec
    • Add id and resume support to SwanLab integration (#43719) by @i-pj
    • fix gptoss crash in tp (#43853) by @sywangyi
    • Delete batch_split from EncoderDecoderCache (#43814) by @cyyever
    • delete unnecessary code to make moe compatible to full graph compile (#43855) by @kaixuanliu
    • Update ModelType for Unigram tokenizer (#43860) by @pavel-esir
    • [docs] Remove pipeline() examples from summarization/translation tasks (#43831) by @Mr-Neutr0n
    • Fix video interpolation in pe_audio_video (#43811) by @Rocketknight1
    • Look for the pad_token_id in the right place for Llama4 (#43539) by @Rocketknight1
    • Fix cardinality error for DETR models without explicit background class (#43513) by @heathdutton
    • docs: Add Switch Transformers docstring notes and update spectrogram comment (#43336) by @harshaljanjani
    • [xLSTM] Fix bugs preventing small model training (#43209) by @Anri-Lombard
    • docs: correct typo 'neccessary' to 'necessary' (#43868) by @thecaptain789
    • Improve PR comment CI feedback (#43852) by @ydshieh
    • Fix init weights in remote code (#43768) by @zucchini-nlp
    • Fix GlmMoeDsaConfig default mlp_layer_types in modular conversion (#43876) by @OiPunk
    • [MistralCommonBackend] fix loading proc (#43887) by @eustlb
    • [Jamba] Fallback to slow path and warn instead of error out (#43889) by @vasqu
    • Fix SwanLab callback to forward resume init args (#43848) by @OiPunk
    • Fix old tech stack in doc (#43879) by @cyyever
    • Update TrainingArguments (#43806) by @SunMarc
    • Remove unnecessary code or checks for PT 2.4+ (#43787) by @cyyever
    • Make it possible to evaluate when using sequence parallel in HF Trainer (#43517) by @jp1924
    • [Trainer] Move optimizer cls init to trainer_optimizer.py (#43738) by @SunMarc
    • fix the error of tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py::Fb… (#43547) by @sywangyi
    • fix fbgemm fp8 multi-device load failure. (#43581) by @sywangyi
    • Refactor trainer init (#43807) by @SunMarc
    • [fix] Use last_hidden_state key from get_image_features for llama4 (#43882) by @tomaarsen
    • [Docs] Add docs for GLM-OCR and fix EomT-DINOv3 (#43710) by @NielsRogge
    • Update hub metadata (#43892) by @zucchini-nlp
    • [fix] DAC model: Apply STE in Dac.from_latents to match the forward pass (#43820) by @harshaljanjani
    • Separate check_model_inputs into capture_outputs and merge_with_config_defaults + ensure correctness (#43862) by @Cyrilvallez
    • Remove mask slicing in all eager attentions (#42186) by @Cyrilvallez
    • Fix expected DAC outputs due to (old) change in CI settings. (#43896) by @ebezzam
    • Minor changes trainer (#43744) by @SunMarc
    • adding BC for custom toks accessing slow tok attrs deprecated in v5 (#43898) by @itazap
    • Fix typo in quantization_operations in PEFT integrations (#43821) by @redpanda1995
    • Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753) by @cyyever
    • Decorate cache updates with no_grad, just in case (#43897) by @Rocketknight1
    • revert place_model_on_device to property (#43895) by @SunMarc
    • Train sampler unification (#43138) by @jiosephlee
    • fix(moe): Handle dtype mismatch in torch._grouped_mm with autocast (#43839) by @Mr-Neutr0n
    • Fix missing fast image patch counter in Glm46V (#43877) by @OiPunk
    • Fix old tech stack in doc (#43902) by @cyyever
    • Move _keys_to_ignore_on_load_missing for now (#43893) by @ArthurZucker
    • Changes to cache_utils should trigger all tests all the time (#43920) by @Cyrilvallez
    • Ernie4 5 vl moe (#43755) by @kaixuanliu
    • Harmonize input_embeds to inputs_embeds everywhere (#43916) by @Cyrilvallez
    • fix: TextClassificationPipeline docs mentioning deprecated return_all_scores (#43903) by @math-hiyoko
    • Revert #43897 (#43923) by @Rocketknight1
    • Fix AttributeError in OwlViT conversion script for Python 3.10+ (#43922) by @DimiChatzipavlis
    • add openAI style image_url content support in apply_chat_template (#43786) by @kaixuanliu
    • Prepare and keep track of position ids in generate (#43734) by @zucchini-nlp
    • Fix lifted_tensor in Gemma3n export which dynamo can't reason about (#43801) by @robell
    • Fix bark test (#43942) by @Cyrilvallez
    • Fix docker files (#43946) by @ydshieh
    • Fix flaky test for multimodal LLMs (#43944) by @Rocketknight1
    • Add explicit utf-8 encoding to CircleCI scripts for Windows compatibility (#43925) by @
    • Modernize string formatting (f-strings) in conversion scripts (#43943) by @
    • Fix weight decay exclusions in run_*_no-trainer.py examples (#42769) by @casinca
    • fix: Better weight decay exclusion in run_*_no-trainer.py examples (#43947) by @casinca
    • Timm backbone saves and loads out_features (#43886) by @zucchini-nlp
    • Fix qwen-vl position ids when generating several times (#43952) by @zucchini-nlp
    • Fix get_number_of_image_tokens (#43948) by @zucchini-nlp
    • Fix typos in docstrings, comments, and error messages (#43949) by @
    • Fix LASR test layerdrop issue (#43954) by @Rocketknight1
    • [kernels] fix kernel versions (#43955) by @MekkCyber
    • [Doc tests] Fix bug (#43729) by @NielsRogge
    • fix(models): Preserve custom token IDs through DiaConfig save and load (#43928) by @harshaljanjani
    • update somes audio models (#43865) by @Deep-unlearning
    • Improve memory allocator during loading (#43945) by @Cyrilvallez
    • Inclusion of process_group in the gather_full_tensor function in tensor_parallel.py (#43932) by @quic-meetkuma
    • Fix sync gradient (#43919) by @SunMarc
    • Reorder Trainer methods (#43914) by @SunMarc
    • Fix TypeError in dot_natural_key when state_dict keys have mixed types at same position (#43966) by @shtse8
    • Enhance JSON schema generation to support instance, static, and class methods (#43968) by @qgallouedec
    • Remove unused squeeze from VJEPA2 embeddings rotation (#43984) by @materight
    • Improve new failing test analysis for PR comment CI (#44033) by @ydshieh
    • Remove other_workflow_run_ids for issue_comment in utils/notification_service.py (#44036) by @ydshieh
    • stable grouped_mm API (#43977) by @IlyasMoutawwakil
    • create .git-blame-ignore-revs file (#43982) by @SunMarc
    • docs: fix typos across documentation files (#43993) by @saurav0369
    • update python requirement to 3.10+ to match codebase (#44009) by @mariam851
    • Improve use of torch.is_autocast_enabled (#43930) by @cyyever
    • Use torch.xlogy (#44006) by @cyyever
    • [Deespeed] fix WeightConverter.convert() use (#43926) by @kashif
    • Reduce reduce CUDA sync (#44005) by @cyyever
    • split out accelerator args builder method (#43987) by @winglian
    • SINQ quantization strategy integration (adapted for Transformers V5) (#43112) by @ChiaraBoretti
    • fix(models): Unpack BitNet packed weights to fix CI failure (#43721) by @harshaljanjani

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @ChiaraBoretti
      • SINQ quantization strategy integration (adapted for Transformers V5) (#43112)
    • @cyyever
      • Reduce reduce CUDA sync (#44005)
      • Use torch.xlogy (#44006)
      • Improve use of torch.is_autocast_enabled (#43930)
      • Fix old tech stack in doc (#43902)
      • Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753)
      • Remove unnecessary code or checks for PT 2.4+ (#43787)
      • Fix old tech stack in doc (#43879)
      • Delete batch_split from EncoderDecoderCache (#43814)
      • Fix markdown documentation (#43076)
    • @eustlb
      • Add Voxtral Realtime (#43769)
      • [MistralCommonBackend] fix loading proc (#43887)
    • @ebezzam
      • Fix expected DAC outputs due to (old) change in CI settings. (#43896)
      • Add VibeVoice Acoustic Tokenizer (#43400)
    • @vasqu
      • [Jamba] Fallback to slow path and warn instead of error out (#43889)
      • 🚨 [Attn] New attn mask interface everywhere (#42848)
      • [Repo Consistency] Fix rms norm (#43803)
      • [Modular Dependencies] Fixup qwen rms norms (#43772)
    • @bozheng-hit
      • Adding Support for Qwen3.5 (#43830)
    Original source Report a problem
  • Feb 16, 2026
    • Date parsed from source:
      Feb 16, 2026
    • First seen by Releasebot:
      Mar 20, 2026
    Together AI logo

    Together AI

    Feb 16

    Together AI adds serverless model bring ups for Qwen/Qwen3.5-397B-A17B.

    Serverless Model Bring Ups

    The following models have been added:

    • Qwen/Qwen3.5-397B-A17B
    Original source Report a problem