AI/ML Infrastructure Release Notes

Release notes for AI compute platforms, inference clouds, and ML tooling


Latest AI/ML Infrastructure Updates

  • May 1, 2026
    • Date parsed from source:
      May 1, 2026
    • First seen by Releasebot:
      May 1, 2026
    Hugging Face logo

    diffusers by Hugging Face

    Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more

    diffusers releases a major expansion with new pipelines for LLaDA2, NucleusMoE, ERNIE-Image, LongCat-AudioDiT, and ACE-Step, plus FLUX.2 decoder and inpaint support, modular pipeline updates, and core performance and attention backend improvements.

    New Pipelines

    LLaDA2

    LLaDA2 is a family of discrete diffusion language models that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.

    PR: #13226

    Docs: https://huggingface.co/docs/diffusers/main/api/pipelines/llada2
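The confidence-based unmasking loop can be sketched in a few lines. This is a toy illustration only: `MASK`, `toy_block_unmask`, and the linear unmasking schedule are placeholders for exposition, not the actual diffusers API.

```python
import numpy as np

MASK = -1  # toy mask token id

def toy_block_unmask(logits_fn, length, steps):
    """Illustrative confidence-based unmasking (not the real LLaDA2 code).
    Start fully masked; each step commits the tokens the model is most
    confident about."""
    seq = np.full(length, MASK)
    per_step = int(np.ceil(length / steps))
    for _ in range(steps):
        masked = np.where(seq == MASK)[0]
        if masked.size == 0:
            break
        probs = logits_fn(seq)        # (length, vocab) pseudo-probabilities
        conf = probs.max(axis=1)      # per-position confidence
        picks = probs.argmax(axis=1)  # per-position best token
        # unmask the highest-confidence masked positions first
        order = masked[np.argsort(-conf[masked])][:per_step]
        seq[order] = picks[order]
    return seq

rng = np.random.default_rng(0)
fake_model = lambda seq: rng.random((seq.shape[0], 5))  # stand-in denoiser
out = toy_block_unmask(fake_model, length=8, steps=4)
```

After `steps` refinement passes every position has been committed, which is the key structural difference from autoregressive left-to-right decoding.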

    Nucleus-MoE

NucleusMoE-Image is a 17B-parameter model with 2B active parameters, trained with efficiency at its core. The architecture highlights the scalability of a sparse MoE design for image generation.

    PR: #13317

    Docs: https://huggingface.co/docs/diffusers/main/api/pipelines/nucleusmoe_image

    Thanks to @sippycoder for the contribution.

    Ernie-Image

    ERNIE-Image is a powerful and highly efficient image generation model with 8B parameters.

    PR: #13432

    Docs: https://huggingface.co/docs/diffusers/main/api/pipelines/ernie_image

    Thanks to @HsiaWinter for the contribution.

    LongCat-AudioDiT

    LongCat-AudioDiT is a text-to-audio diffusion model from Meituan LongCat.

    PR: #13483

    Docs: https://huggingface.co/docs/diffusers/main/api/pipelines/longcat_audio_dit

    Thanks to @RuixiangMa for the contribution.

    Ace-Step 1.5

    ACE-Step 1.5 generates variable-length stereo audio at 48 kHz (10 seconds to 10 minutes) from text prompts and optional lyrics. The full system pairs a Language Model planner with a Diffusion Transformer (DiT) synthesizer; this pipeline wraps the DiT half of that stack, and consists of three components: an AutoencoderOobleck VAE that compresses waveforms into 25 Hz stereo latents, a Qwen3-based text encoder for prompt and lyric conditioning, and an AceStepTransformer1DModel DiT that operates in the VAE latent space using flow matching.

    PR: #13095

    Docs: https://huggingface.co/docs/diffusers/main/api/pipelines/ace_step

    Thanks to @ChuxiJ for the contribution.
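The 48 kHz waveform and 25 Hz latent rates quoted above imply a fixed compression factor, which makes DiT sequence lengths easy to estimate. Back-of-envelope arithmetic only, not diffusers code:

```python
SAMPLE_RATE = 48_000  # Hz, output audio
LATENT_RATE = 25      # Hz, VAE latent frames

def latent_frames(seconds: float) -> int:
    """Number of 25 Hz latent frames the DiT denoises for a clip
    of the given duration (rough estimate)."""
    return round(seconds * LATENT_RATE)

# Each latent frame covers 48000 / 25 = 1920 audio samples
samples_per_frame = SAMPLE_RATE // LATENT_RATE

short_clip = latent_frames(10)        # 10-second minimum
long_clip = latent_frames(10 * 60)    # 10-minute maximum
```

So the supported 10-second-to-10-minute range spans roughly 250 to 15,000 latent frames per channel.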

    Flux.2 Small Decoder

Make your Flux.2 decoding faster with this new small decoder model from Black Forest Labs, contributed by @huemin-art in #13428.

    Modular Pipeline Support

    We added modular support for LTX-2 and Hunyuan 1.5.

    Core Library

    Flash Attention 4 backend

    FlashPack loading

    Group offloading + TorchAO

    ring_anything as a new CP backend

    Profiling pipelines in Diffusers

    All commits

    [Discrete Diffusion] Add LLaDA2 pipeline by @kashif in #13226

    [LLADA2] documentation fixes by @kashif in #13333

    [ci] claude in ci. by @sayakpaul in #13297

    [docs] kernels by @stevhliu in #13139

    [tests] Tests for conditional pipeline blocks by @sayakpaul in #13247

    avoid hardcode device in flux-control example by @kaixuanliu in #13336

    fix claude workflow to include id-token with write. by @sayakpaul in #13338

    Update LTX-2 Docs to Cover LTX-2.3 Models by @dg845 in #13337

    remove str option for quantization config in torchao by @howardzhang-cv in #13291

    [ci] include checkout step in claude review workflow by @sayakpaul in #13352

    change minimum version guard for torchao to 0.15.0 by @howardzhang-cv in #13355

    [ci] move to assert instead of self.Assert* by @sayakpaul in #13366

    [docs] refactor model skill by @stevhliu in #13334

    Fix Ulysses SP backward with SDPA by @zhtmike in #13328

    Add train flux2 series lora config by @tcaimm in #13011

    [docs] Add NeMo Automodel training guide by @pthombre in #13306

    Fix: ensure consistent dtype and eval mode in pipeline save/load tests by @YangKai0616 in #13339

    [ci] support claude reviewing on forks. by @sayakpaul in #13365

    Fix MotionConv2d to cast blur_kernel to input dtype instead of reverse by @YangKai0616 in #13364

    chore: update claude_review.yml by @hf-security-analysis[bot] in #13374

    corrects single file path validation logic by @andrew-w-ross in #13363

    [docs] deprecate pipelines by @stevhliu in #13157

    🔒 Pin GitHub Actions to commit SHAs by @paulinebm in #13385

    [docs] add auto docstring and parameter templates documentation for m… by @yiyixuxu in #13382

    Fix typos and grammar errors in documentation by @GalacticAvenger in #13391

    fix(ddim): validate eta is in [0, 1] in DDIMPipeline by @NIK-TIGER-BILL in #13367

    Fix Dynamo lru_cache warnings during torch.compile by @jiqing-feng in #13384

    [tests] refactor wan autoencoder tests by @sayakpaul in #13371

    NucleusMoE-Image by @sippycoder in #13317

    Add examples on how to profile a pipeline by @sayakpaul in #13356

    Update README.md of the profiling guide by @sayakpaul in #13400

    [CI] Refactor Cosmos Transformer Tests by @DN6 in #13335

    [tests] refactor autoencoderdc tests by @sayakpaul in #13369

    [CI] Hunyuan Transformer Tests Refactor by @DN6 in #13342

    Fix VAE offload encode device mismatch in DreamBooth scripts by @azolotenkov in #13417

    Remove references to torchao's AffineQuantizedTensor by @andrewor14 in #13405

    [tests] fix autoencoderdc tests by @sayakpaul in #13424

    [core] fix group offloading when using torchao by @sayakpaul in #13276

    Fix IndexError in HunyuanVideo I2V pipeline by @kaixuanliu in #13244

    improve Claude CI by @yiyixuxu in #13397

    FLUX.2 small decoder by @huemin-art in #13428

    [CI] Add PR/Issue Auto Labeler by @DN6 in #13380

    [CI] Add GLM Image Transformer Model Tests by @DN6 in #13344

    [CI] Use finegrained token for Issue Labeler by @DN6 in #13433

    Handle prompt embedding concat in Qwen dreambooth example by @chenyangzhu1 in #13387

    fix(qwen-image dreambooth): correct prompt embed repeats when using --with_prior_preservation by @chenyangzhu1 in #13396

    Cache RoPE freqs on device to avoid repeated CPU-GPU copy in QwenImage by @akshan-main in #13406

    [tests] tighten dependency testing. by @sayakpaul in #13332

    Fix grammar in LoRA documentation by @Xyc2016 in #13423

    Fix HunyuanVideo 1.5 I2V by preprocessing image at pixel resolution i… by @akshan-main in #13440

    [modular] Add LTX Video modular pipeline by @akshan-main in #13378

    Add ernie image by @HsiaWinter in #13432

    [core] fix fa4 integration by @sayakpaul in #13443

    FlashPack by @hlky in #12700

    [ptxla] fix pytorch xla inference on TPUs. by @entrpn in #13463

    fix some dtype issue for gguf / some gpu backends by @HsiaWinter in #13464

    Fix Qwen Image DreamBooth prior preservation batch ordering by @azolotenkov in #13441

    [tests] fix deprecated attention processor testing. by @sayakpaul in #13469

    [tests] xfail clip related issues. by @sayakpaul in #13454

    [agent] add modular doc by @yiyixuxu in #13410

    [tests] fix training tests by @sayakpaul in #13442

    fix(profiling): preserve instance isolation when decorating methods by @Akash504-ai in #13471

    [Feat] Adds LongCat-AudioDiT pipeline by @RuixiangMa in #13390

    Fix Flux2 DreamBooth prior preservation prompt repeats by @azolotenkov in #13415

    chore: bump doc-builder SHA for PR upload workflow by @rtrompier in #13476

    Remove compile bottlenecks from ZImage pipeline by @hitchhiker3010 in #13461

    [chore] Add diffusers-format example to LongCatAudioDiTPipeline by @RuixiangMa in #13483

    [core] fix autoencoderkl qwenimage for xla by @sayakpaul in #13480

    add PR fork workable by @paulinebm in #13438

    Add modular pipeline for HunyuanVideo 1.5 by @akshan-main in #13389

    [agents docs] add float64 gotcha by @yiyixuxu in #13472

    fix(ernie-image): avoid locals() comprehension scope issue in callback kwargs by @songh11 in #13478

    [Bugfix] Fix shape mismatch in LongCatAudioDiTTransformer conversion by @RuixiangMa in #13494

    feat: bump safetensors to 0.8.0-rc.0 by @McPatate in #13470

    fix(qwen): fix CFG failing when passing neg prompt embeds with none mask by @Sunhill666 in #13379

    add an example of spmd for flux on v5e-8 by @sayakpaul in #13474

    Add FLUX.2 Klein Inpaint Pipeline by @adi776borate in #13050

    [docs] add a mention of torchao and other backends in speed memory docs. by @sayakpaul in #13499

    Fix Flux2 non-diffusers guidance LoRA conversion by @yadferhad in #13486

    add _native_npu_attention support mask shape like [B,1,1,S] by @chang-zhijie in #13490

    fix(freeu): run FFT in float32 for float16 inputs to avoid ComplexHalf by @Ricardo-M-L in #13503

    Fix non-deterministic T5 outputs in HiDream pipeline tests by @kaixuanliu in #13534

    Fix AuraFlow attn processors applying norm_added_q to key projection by @Ricardo-M-L in #13533

    add _repeated_blocks for ErnieImageTransformer2DModel by @kaixuanliu in #13496

    [CI] Fix BnB tests by @DN6 in #13481

    [tests] fix group offloading with disk tests by @sayakpaul in #13491

    [ci] feat: have pr labeler label for closing issues. by @sayakpaul in #13548

    Improve trust_remote_code by @hlky in #13448

    chore: bump doc-builder SHA for main doc build workflow by @rtrompier in #13555

    [ci] simplify release workflow. by @sayakpaul in #13329

    [attention backends] fix ring CP for flash and flash 3 by @sayakpaul in #13182

    [agents docs] add pipelines.md etc by @yiyixuxu in #13567

    Add Ernie-Image modular pipeline by @akshan-main in #13498

    [agents docs] update modular.md by @yiyixuxu in #13568

[docs] fix typo in AutoencoderOobleck docs by @ivnvalex in #13642

    Fix ErnieImagePipeline pre-computed prompt_embeds + num_images_per_prompt shape mismatch by @Ricardo-M-L in #13532

    feat: support ring attention with arbitrary KV sequence lengths by @songh11 in #13545

    [ci] use tokenizers stable installtion in CI. by @sayakpaul in #13562

    NucleusMoE docs by @sayakpaul in #13661

    Fix UniPC scheduler device mismatch when using offloading by @ParamChordiya in #13489

    [Ernie-Image] Add lora support by @asomoza in #13575

    Add ACE-Step pipeline for text-to-music generation by @ChuxiJ in #13095

    Fix missing latents_bn_std dtype cast in VAE normalization by @adi776borate in #13299

    Release: v0.38.0-release by @sayakpaul (direct commit on v0.38.0-release)

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @kashif

    [Discrete Diffusion] Add LLaDA2 pipeline (#13226)

    [LLADA2] documentation fixes (#13333)

    @howardzhang-cv

    remove str option for quantization config in torchao (#13291)

    change minimum version guard for torchao to 0.15.0 (#13355)

    @sippycoder

    NucleusMoE-Image (#13317)

    @DN6

    [CI] Refactor Cosmos Transformer Tests (#13335)

    [CI] Hunyuan Transformer Tests Refactor (#13342)

    [CI] Add PR/Issue Auto Labeler (#13380)

    [CI] Add GLM Image Transformer Model Tests (#13344)

    [CI] Use finegrained token for Issue Labeler (#13433)

    [CI] Fix BnB tests (#13481)

    @akshan-main

    Cache RoPE freqs on device to avoid repeated CPU-GPU copy in QwenImage (#13406)

    Fix HunyuanVideo 1.5 I2V by preprocessing image at pixel resolution i… (#13440)

    [modular] Add LTX Video modular pipeline (#13378)

    Add modular pipeline for HunyuanVideo 1.5 (#13389)

    Add Ernie-Image modular pipeline (#13498)

    @HsiaWinter

    Add ernie image (#13432)

    fix some dtype issue for gguf / some gpu backends (#13464)

    @hlky

    FlashPack (#12700)

    Improve trust_remote_code (#13448)

    @RuixiangMa

    [Feat] Adds LongCat-AudioDiT pipeline (#13390)

    [chore] Add diffusers-format example to LongCatAudioDiTPipeline (#13483)

    [Bugfix] Fix shape mismatch in LongCatAudioDiTTransformer conversion (#13494)

    @adi776borate

    Add FLUX.2 Klein Inpaint Pipeline (#13050)

    Fix missing latents_bn_std dtype cast in VAE normalization (#13299)

    @ChuxiJ

    Add ACE-Step pipeline for text-to-music generation (#13095)

    Original source
  • Apr 30, 2026
    • Date parsed from source:
      Apr 30, 2026
    • First seen by Releasebot:
      May 1, 2026
    CoreWeave logo

    CoreWeave

    April 30, 2026

CoreWeave updates its supported GPU drivers, adding driver 595 and making it the new default across multiple instance types.

    The supported GPU drivers have changed. See Supported driver versions for the full compatibility table.

    B200 (InfiniBand)

    New default: 580 → 595

    B300 (InfiniBand)

    Drivers added: 595

    GB200 NVL72-powered instances

    New default: 580 → 595

    GH200

    New default: 580 → 595

    H100 (InfiniBand)

    New default: 580 → 595

    H200 (InfiniBand)

    New default: 580 → 595

    L40

    New default: 580 → 595

    L40S

    New default: 580 → 595

    Drivers added: 535, 570, 595

    Original source
  • Apr 28, 2026
    • Date parsed from source:
      Apr 28, 2026
    • First seen by Releasebot:
      Apr 29, 2026
    Hugging Face logo

    transformers by Hugging Face

    Release v5.7.0

    transformers releases v5.7.0 with new Laguna and DEIMv2 model support, plus broad attention, tokenization, generation, and kernel fixes. The update also improves continuous batching, long-sequence memory handling, and kernel loading for a smoother, more reliable experience.

    Release v5.7.0

    New Model additions

    Laguna

Laguna is Poolside's mixture-of-experts language model family, extending standard SwiGLU MoE transformers with two key innovations. It features per-layer head counts, allowing different decoder layers to have different query-head counts while sharing the same KV-cache shape, and a sigmoid MoE router with auxiliary-loss-free load balancing that scores experts via an element-wise sigmoid of the gate logits plus a learned per-expert bias.

    Links: Documentation

    Laguna XS.2 implementation (#45673) by @joerowell in #45673
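The router scoring described above can be sketched with NumPy. This is an illustrative sketch under assumed shapes and names, not Laguna's actual implementation; the per-expert bias affects only which experts are selected, which is what makes the load balancing auxiliary-loss-free.

```python
import numpy as np

def sigmoid_router(gate_logits, expert_bias, top_k=2):
    """Sketch of sigmoid-MoE routing (names/shapes are assumptions).
    Scores: element-wise sigmoid of gate logits. The learned
    per-expert bias is added only for top-k selection, steering
    load balance without an auxiliary loss."""
    scores = 1.0 / (1.0 + np.exp(-gate_logits))      # (tokens, experts)
    biased = scores + expert_bias                    # selection only
    topk = np.argsort(-biased, axis=-1)[:, :top_k]   # chosen experts
    return scores, topk

logits = np.array([[2.0, -1.0, 0.5]])   # one token, three experts
bias = np.array([0.0, 1.5, 0.0])        # under-used expert gets a boost
scores, topk = sigmoid_router(logits, bias)
```

Here the bias pulls expert 1 into the top-2 despite its low raw score, while the unbiased `scores` would still be used to weight expert outputs.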

    DEIMv2

    DEIMv2 (DETR with Improved Matching v2) is a real-time object detection model that extends DEIM with DINOv3 features and spans eight model sizes from X to Atto for diverse deployment scenarios. It uses a Spatial Tuning Adapter (STA) for larger variants to convert DINOv3's single-scale output into multi-scale features, while ultra-lightweight models employ pruned HGNetv2 backbones. The unified design achieves superior performance-cost trade-offs, with DEIMv2-X reaching 57.8 AP with only 50.3M parameters and DEIMv2-S being the first sub-10M model to exceed 50 AP on COCO.

    Links: Documentation | Paper

    model: Add DEIMv2 to Transformers (#44339) by @harshaljanjani in #44339

    Attention

    Several attention-related bugs were fixed across multiple models, including a cross-attention cache type error in T5Gemma2 for long inputs, incorrect cached forward behavior in Qwen3.5's gated-delta-net linear attention, and a crash in GraniteMoeHybrid when no Mamba layers are present. Attention function dispatch was also updated to align with the latest model implementations.

    Fix cross-attention cache layer type for T5Gemma2 long inputs (#45540) by @Beichen-Ma in [#45540]

    [Qwen3.5] Fix GDN linear attention multi-token cached forward (#45513) by @kashif in [#45513]

    Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models (#45514) by @tianhaocui in [#45514]

    Align latest model attention function dispatch (#45598) by @Cyrilvallez in [#45598]

    Tokenizers

    There was a bug in AutoTokenizer that caused the wrong tokenizer class to be initialized. This caused regressions in models like DeepSeek R1.

    change got reverted (#45680) by @itazap in [#45680]

    Generation

Continuous batching generation received several fixes and improvements, including corrected KV deduplication and memory estimation for long sequences (16K+), and the removal of misleading warnings about num_return_sequences and other features that fired even when the functionality worked correctly. Documentation for per-request sampling parameters was also added.

    generate: drop stale num_return_sequences warning on continuous batching path (#45582) by @joaquinhuigomez in [#45582]

    Remove unnecessary generate warnings (#45619) by @Cyrilvallez in [#45619]

    [CB] Changes for long generation (#45530) by @remi-or in [#45530]

    [docs] per-request sampling params (#45553) by @stevhliu in [#45553]

    Kernels

    Improved kernel support by fixing configuration reading and error handling for FP8 checkpoints (e.g., Qwen3.5-35B-A3B-FP8), enabling custom expert kernels registered from the HF Hub to be properly loaded, and resolving an incompatibility that prevented Gemma3n and Gemma4 from using the rotary kernel.

    Fix configuration reading and error handling for kernels (#45610) by @hmellor in [#45610]

    Allow for registered experts from kernels hub (#45577) by @winglian in [#45577]

    Gemma3n and Gemma4 cannot use rotary kernel (#45564) by @Cyrilvallez in [#45564]

    Bugfixes and improvements

    fixing more typos (#45689) by @vasqu in [#45689]

    [docs] cb memory management (#45587) by @stevhliu in [#45587]

    [docs] cpu offloading (#45660) by @stevhliu in [#45660]

    docs(README_zh-hans): clarify conditions for not using Transformers (#45688) by @GuaiZai233 in [#45688]

    fix padding side issue for fast_vlm tests (#45592) by @kaixuanliu in [#45592]

    Fix x_clip: 8 failed test cases (#45394) by @kaixuanliu in [#45394]

    zero_shot_object_detection ValueError fix for python 3.13 (#45669) by @AnkitAhlawat7742 in [#45669]

    Fix pageable H2D copies in Gated DeltaNet PyTorch fallback (#45665) by @ruixiang63 in [#45665]

    Fix UnboundLocalError in shard_and_distribute_module for replicated parameters (#45675) by @Abdennacer-Badaoui in [#45675]

    [MistralCommonBackend] Soften validation mode and apply_chat_template arguments check (#45628) by @juliendenize in [#45628]

    Fix NameError: PeftConfigLike triggered by PreTrainedModel.init_subclass (#45658) by @qgallouedec in [#45658]

    chore(typing): added modeling_utils to ty (#45425) by @tarekziade in [#45425]

    [gemma4] infer from config instead of hardcoding (#45606) by @eustlb in [#45606]

    Update quants tests (#45480) by @SunMarc in [#45480]

    🔴🔴🔴 fix: skip clean_up_tokenization for BPE tokenizers in PreTrainedTokenizerFast (#44915) by @maxsloef-goodfire in [#44915]

    Fix colmodernvbert tests (#45652) by @Cyrilvallez in [#45652]

    [CB] [Major] Add CPU request offloading (#45184) by @remi-or in [#45184]

    Fix peft constructors (#45622) by @Cyrilvallez in [#45622]

    chore: speedup modular converter (~30%) (#45046) by @tarekziade in [#45046]

    Fix whisper return language (#42227) by @FredHaa in [#42227]

    Add supports_gradient_checkpointing to NemotronHPreTrainedModel (#45625) by @sergiopaniego in [#45625]

    Raise clear error for problem_type="single_label_classification" with num_labels=1 (#45611) by @gaurav0107 in [#45611]

    CircleCI with torch 2.11 (#45633) by @ydshieh in [#45633]

    chore: bump doc-builder SHA for main doc build workflow (#45631) by @rtrompier in [#45631]

    Allow more artifacts to be download in CI (#45629) by @ydshieh in [#45629]

    chore(qa): split pipeline and add type checking (#45432) by @tarekziade in [#45432]

    Skip failing offloading tests (#45624) by @Cyrilvallez in [#45624]

    fix: compute auxiliary losses when denoising is disabled in D-FINE (#45601) by @Abineshabee in [#45601]

    qa: bumped mlinter and allow local override (#45585) by @tarekziade in [#45585]

    Processing Utils: continue when content is a string (#45605) by @RyanMullins in [#45605]

    SonicMoe (#45433) by @IlyasMoutawwakil in [#45433]

    fix transformers + torchao nvfp4 serialization (#45573) by @vkuzo in [#45573]

    [AMD CI] Fix expectations for Gemma3n (#45602) by @Abdennacer-Badaoui in [#45602]

    [docs] multi-turn tool calling (#45554) by @stevhliu in [#45554]

    Fix AttributeError on s_aux=None in flash_attention_forward (#45589) by @jamesbraza in [#45589]

    do not index past decoded chars with special tokens (#45435) by @itazap in [#45435]

    Update dev version (#45583) by @vasqu in [#45583]

    Update torchao usage for XPU and CPU (#45560) by @jiqing-feng in [#45560]

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    @vasqu

    fixing more typos (#45689)

    Update dev version (#45583)

    @joerowell

    Laguna XS.2 implementation (#45673)

    @tarekziade

    chore(typing): added modeling_utils to ty (#45425)

    chore: speedup modular converter (~30%) (#45046)

    chore(qa): split pipeline and add type checking (#45432)

    qa: bumped mlinter and allow local override (#45585)

    @harshaljanjani

    model: Add DEIMv2 to Transformers (#44339)

    @remi-or

    [CB] [Major] Add CPU request offloading (#45184)

    [CB] Changes for long generation (#45530)

    Original source
  • Apr 28, 2026
    • Date parsed from source:
      Apr 28, 2026
    • First seen by Releasebot:
      Apr 29, 2026
    Together AI logo

    Together AI

    Model Deprecation: MiniMax M2.5

    Together AI deprecates MiniMaxAI/MiniMax-M2.5 on serverless and recommends migrating to MiniMaxAI/MiniMax-M2.7.

    MiniMaxAI/MiniMax-M2.5 has been deprecated and is no longer available on serverless. We recommend migrating to MiniMaxAI/MiniMax-M2.7.

    Original source
  • Apr 27, 2026
    • Date parsed from source:
      Apr 27, 2026
    • First seen by Releasebot:
      Apr 28, 2026
    CoreWeave logo

    CoreWeave

    April 27, 2026

    CoreWeave updates Cabinet Wrangler and Cabinet Visualizer dashboards to use rack name as the primary label for filtering and identification.

    The Cabinet Wrangler and Cabinet Visualizer dashboards now display rack name as the primary label for filtering and identification, replacing NVLink domain. Both metrics remain available in the dashboards. See the Cabinet Wrangler release note for more information.

    Original source
  • Apr 24, 2026
    • Date parsed from source:
      Apr 24, 2026
    • First seen by Releasebot:
      Apr 25, 2026
    Together AI logo

    Together AI

    Serverless Model Bring Up: DeepSeek-V4-Pro

    Together AI adds DeepSeek-V4-Pro to serverless with 512K context, function calling, structured outputs, and FP4 pricing.

    deepseek-ai/DeepSeek-V4-Pro has been added to serverless.

    • Context length: 512,000
    • Pricing: $2.10 input / $4.40 output / $0.20 cached input (per 1M tokens)
    • Quantization: FP4
    • Function calling and structured outputs supported
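At the listed rates, per-request cost is simple arithmetic. A back-of-envelope sketch; the token counts below are hypothetical:

```python
# Listed serverless rates, USD per 1M tokens
PRICE_IN, PRICE_OUT, PRICE_CACHED = 2.10, 4.40, 0.20

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimated USD cost of one request: cached input tokens are
    billed at the cached rate, the rest at the input rate."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_IN
            + cached_tokens * PRICE_CACHED
            + output_tokens * PRICE_OUT) / 1_000_000

# e.g. a 100K-token prompt with 80K tokens cached and a 2K-token answer
cost = request_cost(100_000, 2_000, cached_tokens=80_000)
```

With most of the long prompt cached, the example request costs about 6.7 cents rather than the roughly 21.9 cents it would cost uncached.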
    Original source
  • Apr 24, 2026
    • Date parsed from source:
      Apr 24, 2026
    • First seen by Releasebot:
      Apr 25, 2026
    Together AI logo

    Together AI

    Pricing Update: No-Packing Fine-Tuning Jobs

    Together AI updates no-packing fine-tuning pricing and lets users control costs with configurable max sequence length.

    We rolled out a pricing update for no-packing fine-tuning jobs. When the no-packing option is chosen, the number of training dataset tokens is now calculated as len(dataset) * max_seq_length to account for the compute used by packing-free jobs.

    • max_seq_length is configurable in both the SDK and UI.
    • Price prediction reflects these changes, so if no-packing is chosen you can control the cost of the job by adjusting the sequence length.
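The billing formula above is easy to sanity-check. A sketch only; the dataset size below is hypothetical:

```python
def billed_tokens(num_examples: int, max_seq_length: int) -> int:
    """Billable token count for a no-packing fine-tuning job,
    per the formula above: len(dataset) * max_seq_length."""
    return num_examples * max_seq_length

# Halving max_seq_length halves the billed tokens for the same dataset
full = billed_tokens(10_000, 4096)
half = billed_tokens(10_000, 2048)
```

This is why a tighter max_seq_length directly controls the cost of a no-packing job: padding is billed, so the cap should match your longest useful example.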
    Original source
  • Apr 24, 2026
    • Date parsed from source:
      Apr 24, 2026
    • First seen by Releasebot:
      Apr 16, 2026
    • Modified by Releasebot:
      Apr 25, 2026
    Together AI logo

    Together AI

    Serverless Model Bring Ups

    Together AI adds new models including Cogito, Veo, Vidu, and Wan to expand its AI lineup.

    The following models have been added:

    • deepcogito/cogito-v2-1-671b
    • google/veo-3.1-test-debug
    • vidu/vidu-q3
    • vidu/vidu-q3-turbo
    • Wan-AI/wan2.7-i2v
    • Wan-AI/wan2.7-r2v
    Original source
  • Apr 23, 2026
    • Date parsed from source:
      Apr 23, 2026
    • First seen by Releasebot:
      Apr 24, 2026
    Hugging Face logo

    transformers by Hugging Face

    Patch release v5.6.2

    transformers fixes FP8 support for Qwen 3.5 and 3.6 MoE text models and improves kernel config reading and error handling.

    Patch release v5.6.2

Qwen 3.5 and 3.6 MoE (text-only) were broken when used with FP8. They should now work again with this fix 🫡

    Fix configuration reading and error handling for kernels (#45610) by @hmellor

    Full Changelog: v5.6.1...v5.6.2

    Original source
  • Apr 23, 2026
    • Date parsed from source:
      Apr 23, 2026
    • First seen by Releasebot:
      Apr 23, 2026
    Hugging Face logo

    transformers by Hugging Face

    Patch release v5.6.1

    transformers fixes broken flash attention path and s_aux=None AttributeError in patch release v5.6.1.

    Patch release v5.6.1

    Flash attention path was broken! Sorry everyone for this one 🤗

    Fix AttributeError on s_aux=None in flash_attention_forward (#45589) by @jamesbraza

    Original source