MiniMax Release Notes

Last updated: Dec 23, 2025

  • Dec 23, 2025
    • Parsed from source:
      Dec 23, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks

    MiniMax M2.1 unleashes AI-native development with stronger multi-language coding, improved office task automation, and enhanced mobile Web/App capabilities. It promises faster, cheaper, more capable AI workflows and opens the model to open-source deployment and public tools.

    MiniMax M2.1 Release

    MiniMax has been continuously transforming itself in a more AI-native way. The core driving forces of this process are models, Agent scaffolding, and organization. Throughout the exploration process, we have gained increasingly deeper understanding of these three aspects. Today we are releasing updates to the model component, namely MiniMax M2.1, hoping to help more enterprises and individuals find more AI-native ways of working (and living) sooner.

    In M2, we primarily addressed issues of model cost and model accessibility. In M2.1, we are committed to improving performance in real-world complex tasks: focusing particularly on usability across more programming languages and office scenarios, and achieving the best level in this domain.

    Key Highlights of MiniMax M2.1:

    • Exceptional Multi-Programming Language Capabilities
      Many models in the past primarily focused on Python optimization, but real-world systems are often the result of multi-language collaboration.
      In M2.1, we have systematically enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The overall performance on multi-language tasks has reached industry-leading levels, covering the complete chain from low-level system development to application layer development.
    • WebDev and AppDev: A Comprehensive Leap in Capability and Aesthetics
      Addressing the widely recognized weakness in mobile development across the industry, M2.1 significantly strengthens native Android and iOS development capabilities.
      Meanwhile, we have systematically enhanced the model's design comprehension and aesthetic expression in Web and App scenarios, enabling excellent construction of complex interactions, 3D scientific scene simulations, and high-quality visualization, making vibe coding a sustainable and deliverable production practice.
    • Enhanced Composite Instruction Constraints, Enabling Office Scenarios
      As one of the first open-source model series to systematically introduce Interleaved Thinking, M2.1's systematic problem-solving capabilities have been further upgraded. The model not only focuses on code execution correctness but also emphasizes integrated execution of "composite instruction constraints," providing higher usability in real office scenarios.
    • More Concise and Efficient Responses
      Compared to M2, MiniMax-M2.1 delivers more concise model responses and thought chains. In practical programming and interaction experiences, response speed has significantly improved and token consumption has notably decreased, resulting in smoother and more efficient performance in AI Coding and Agent-driven continuous workflows.
    • Outstanding Agent/Tool Scaffolding Generalization Capabilities
      M2.1 demonstrates excellent performance across various programming tools and Agent frameworks. It exhibits consistent and stable results in tools such as Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox, while providing reliable support for Context Management mechanisms including Skill.md, Claude.md/agent.md/cursorrule, and Slash Commands.
    • High-Quality Dialogue and Writing
      M2.1 is no longer just "stronger in coding capabilities." In everyday conversation, technical documentation, and writing scenarios, it also provides more detailed and structured responses.

    First Impressions

    "We're excited for powerful open-source models like M2.1 that bring frontier performance (and in some cases exceed the frontier) for a wide variety of software development tasks. Developers deserve choice, and M2.1 provides that much needed choice!"

    • Eno Reyes, Co-Founder, CTO of Factory AI

    “MiniMax M2.1 performed exceptionally well across our internal benchmarks, showing strong results in complex instruction following, reranking, and classification, especially within e-commerce tasks. Beyond its general versatility, it has proven to be an excellent model for coding. We are impressed by these results and look forward to a close collaboration with the MiniMax team as we continue to support their latest innovations on the Fireworks platform.”

    • Benny Chen, Co-founder of Fireworks

    “The MiniMax M2 series has demonstrated powerful code generation capability and has quickly become one of the most popular models on the Cline platform over the past few months. We already see another huge advancement in capability with M2.1 and are very excited to continue partnering with the MiniMax team to advance AI in coding.”

    • Saoud Rizwan, Founder, CEO of Cline

    “We could not be more excited about M2.1! Our users have come to rely on MiniMax for frontier-grade coding assistance at a fraction of the cost, and early testing shows M2.1 excelling at everything from architecture and orchestration to code reviews and deployment. The speed and efficiency are off the charts!”

    • Scott Breitenother, Co-Founder, CEO of Kilo

    "Our users love MiniMax M2 for its strong coding ability and efficiency. The latest M2.1 release builds on that foundation with meaningful improvements in speed and reliability, performing well across a wider range of languages and frameworks. It's a great choice for high-throughput, agentic coding workflows where speed and affordability matter."

    • Matt Rubens, Co-Founder, CEO of RooCode

    “Integrating the MiniMax M2 series into our platform has been a significant win for our users, and M2.1 represents a clear step forward in what a coding-specific model can achieve. We’ve found that M2.1 handles the nuances of complex, multi-step programming tasks with a level of consistency that is rare in this space. By providing high-quality reasoning and context awareness at scale, MiniMax has become a core component of how we help developers solve challenging problems faster. We look forward to seeing how our community continues to leverage these updated capabilities.”

    • Robert Rizk, Co-Founder, CEO of BlackBox

    Benchmarks

    MiniMax-M2.1 delivers a significant leap over M2 on core software engineering leaderboards. It shines particularly bright in multilingual scenarios, where it outperforms Claude Sonnet 4.5 and closely approaches Claude Opus 4.5.

    We also evaluated MiniMax-M2.1 on SWE-bench Verified across a variety of coding agent frameworks. The results highlight the model's exceptional framework generalization and robust stability.
    Furthermore, across specific benchmarks—including test case generation, code performance optimization, code review, and instruction following—MiniMax-M2.1 demonstrates comprehensive improvements over M2. In these specialized domains, it consistently matches or exceeds the performance of Claude Sonnet 4.5.

    To evaluate the model's full-stack capability to architect complete, functional applications "from zero to one," we established a novel benchmark: VIBE (Visual & Interactive Benchmark for Execution). This suite encompasses five core subsets: Web, Simulation, Android, iOS, and Backend. Distinguishing itself from traditional benchmarks, VIBE leverages an innovative Agent-as-a-Verifier (AaaV) paradigm to automatically assess the interactive logic and visual aesthetics of generated applications within a real runtime environment.
    MiniMax-M2.1 delivers outstanding performance on the VIBE aggregate benchmark, achieving an average score of 88.6—demonstrating robust full-stack development capabilities. It excels particularly in the VIBE-Web (91.5) and VIBE-Android (89.7) subsets.
    MiniMax-M2.1 also demonstrates steady improvements over M2 in both long-horizon tool use and comprehensive intelligence metrics.

    Showcases

    • Multilingual Coding

    • 3D Interactive Animation
      MiniMax M2.1 built a "3D Dreamy Christmas Tree" based on React Three Fiber and InstancedMesh, successfully rendering over 7,000 instances. It supports gesture interaction and complex particle animation, demonstrating advanced 3D rendering capabilities.
      Try it out: https://yuyl27wq92.space.minimax.io/

    • Avant-Garde Web UI Design
      M2.1 generated a minimalist photographer's personal homepage using an asymmetrical layout and a black-white-red contrasting color scheme. By combining immersive imagery with brutalist typography, it achieved a high-impact visual effect.
      Try it out: https://m6xkaf07udss.space.minimax.io/

    • Website - Skincare Brand
      M2.1 designed a landing page for a high-end organic skincare brand. Adopting a "Clean & Minimalist" style, it accurately presented the brand's premium identity and international visual appeal.
      Try it out: https://2drpfocv00n9.space.minimax.io/

    • Web 3D Lego Sandbox
      M2.1 developed a high-freedom 3D brick building application based on Three.js, implementing precise grid snapping algorithms and collision detection mechanisms. The project perfectly replicates the glossy texture of plastic bricks, supporting multi-angle rotation, drag-and-drop assembly, and instant color switching, providing users with an immersive 3D creative building experience.
      Try it out: https://8e6nunemyuzh.space.minimax.io/

    • Native App Development - Android
      M2.1 used Kotlin to develop a native Android gravity sensor simulator. Utilizing the gyroscope for a silky-smooth control experience, it features clever visual easter eggs that elegantly present the "MERRY XMAS MiniMax M2.1" message through natural UI transitions and collision effects.

    • Native App Development - iOS
      M2.1 wrote an interactive iOS Home Screen widget, designing a "Sleeping Santa" click-to-wake mechanism. The logic is complete with native-level animation effects—Santa lives in your widget; tap him ten times to wake him up for a surprise! 🎅🎁

    • Web Audio Simulation Development
      M2.1 developed a 16-step drum machine simulator based on the Web Audio API. It integrates synthesized drum sounds, non-linear rhythm algorithms, and real-time glitch sound effects, providing an avant-garde electronic music experience! (Turn on the sound in the video below to listen!)
      Try it out: https://21okxwno2u.space.minimax.io

    • Rust TUI
      M2.1 built a powerful Linux security audit tool with dual CLI + TUI modes using Rust, supporting one-click low-level scanning and intelligent risk rating for critical items such as processes, networks, and SSH.

    • Python Data Dashboard
      M2.1 created a Web3 cryptocurrency price dashboard in the style of The Matrix. It uses Python for the backend (real-time price API fetching), HTML for the structure, and CSS for the Matrix aesthetic: green digital rain on a black background, a monospaced font, glowing neon green text, and a terminal-like UI.

    • C++ Image Rendering
      M2.1 utilized C++ and GLSL to implement complex light transport algorithms, accurately rendering the physical refraction of a crystal ball, detailed SDF modeling of a snowman, and shimmering snow effects in a real-time environment.

    • Java Real-time Danmaku
      M2.1 implemented a high-performance real-time Danmaku (bullet chat) system in Java, featuring a clean and intuitive user interface and millisecond-level response times.

    • SVG Generation
      M2.1 generated an interactive isometric SVG island map, constructing a detailed miniature world that supports one-click zooming to freely explore four major themed areas.
      Try it out: https://08tmc3aada59.space.minimax.io/

    • Agentic Tool Use
      Tool Use Capability: Excel Market Research
      M2.1 demonstrated its tool-use capabilities by autonomously invoking Excel and Yahoo Finance to complete an end-to-end task, ranging from market research data cleaning and analysis to chart generation.

    • Digital Employee
      The "Digital Employee" is a key feature of the MiniMax M2.1 model. M2.1 accepts web content presented in text form and controls mouse clicks and keyboard inputs via text-based commands. It can complete end-to-end tasks in daily office scenarios across administration, data science, finance, human resources, and software development. The following demo video is a screen recording of M2.1's behavioral trajectory in the Agent Company Benchmark.

    • End-to-End Office Automation
      Demo 1: Administrative tasks
      Task Requirements: Proactively collect employees' equipment requests on communication software, then search for relevant documents on the enterprise's internal server to obtain equipment prices, calculate the total cost and determine whether the department budget is sufficient, and then record equipment changes.

      Demo 2: Project management tasks
      Task Requirements: Search for blocked or backlogged issues on the project management software, then find relevant employees on the communication software and consult them for solutions, and update the status of the issues based on the employees' feedback.

      Demo 3: Software development tasks
      Task Requirements: A colleague wants to know which is the most recent Merge Request that modified a certain file. Search for the relevant Merge Request, find its number, and inform the colleague.

    How to Use

    Local Deployment Guide

    Download the model from the Hugging Face repository.
    We recommend using the following inference frameworks (listed alphabetically) to serve the model:

    • SGLang
      We recommend using SGLang to serve MiniMax-M2.1. Please refer to our SGLang Deployment Guide.

    • vLLM
      We recommend using vLLM to serve MiniMax-M2.1. Please refer to our vLLM Deployment Guide.

    • Other Inference Engines

      • MLX
      • KTransformers

    Inference Parameters

    We recommend using the following parameters for best performance:
    temperature=1.0, top_p=0.95, top_k=40
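
    A minimal sketch of passing these parameters to a locally served model, assuming an OpenAI-compatible endpoint (which both SGLang and vLLM expose); the host, port, and served model name are placeholders for your deployment:

    from openai import OpenAI

    # Placeholder endpoint for a local SGLang/vLLM server; adjust to your deployment.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2.1",  # the model name registered with your server
        messages=[{"role": "user", "content": "Implement binary search in Rust."}],
        temperature=1.0,
        top_p=0.95,
        extra_body={"top_k": 40},  # top_k is not a standard OpenAI field; passed through
    )
    print(response.choices[0].message.content)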

    Tool Calling Guide

    Please refer to our Tool Calling Guide.
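
    For orientation, below is a minimal OpenAI-style function-calling sketch against the same local endpoint; the get_weather tool is hypothetical, and the guide above remains authoritative for the recommended format:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # A hypothetical tool, declared in the standard OpenAI tool schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2.1",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    # The model returns structured tool_calls when it decides to invoke the tool.
    print(response.choices[0].message.tool_calls)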

    Contact Us

  • December 2025
    • No date parsed from source.
    • Detected by Releasebot:
      Dec 23, 2025

    Video Generation API

    This update introduces video generation from text or images, with new Hailuo models boosting realism and speed. The release outlines an asynchronous API flow to create, track, and download videos via task and file IDs.


    This API supports generating videos from user-provided text or images (including first-frame, last-frame, or reference images).

    Supported Models

    • MiniMax-Hailuo-2.3: New video generation model, breakthroughs in body movement, facial expressions, physical realism, and prompt adherence.
    • MiniMax-Hailuo-2.3-Fast: New image-to-video model, optimized for value and efficiency.
    • MiniMax-Hailuo-02: Video generation model supporting higher resolution (1080P), longer duration (10s), and stronger adherence to prompts.

    API Usage Guide

    Video generation is asynchronous and consists of three APIs: Create Video Generation Task, Query Video Generation Task Status, and File Management. The steps are as follows (a Python sketch follows the list):

    • Use the Create Video Generation Task API (Text to Video, Image to Video, Start / End to Video, Subject Reference to Video) to start a task. On success, it returns a task_id.
    • Use the Query Video Generation Task Status API with the task_id to check progress. When the status is success, a file ID (file_id) will be returned.
    • Use the Download the Video File API with the file_id from step 2 to view and download the generated video.
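
    The sketch below walks through the three steps; the endpoint paths, response fields, and status values used here (video_generation, query/video_generation, files/retrieve, task_id, file_id, "Success") are assumptions for illustration, so confirm the exact names in the API reference:

    import os
    import time
    import requests

    BASE = "https://api.minimax.io/v1"
    headers = {"Authorization": f"Bearer {os.environ['MINIMAX_API_KEY']}"}

    # Step 1: create the generation task (text-to-video shown here).
    task = requests.post(f"{BASE}/video_generation", headers=headers, json={
        "model": "MiniMax-Hailuo-2.3",
        "prompt": "A corgi surfing a small wave at sunset, cinematic lighting",
    }).json()
    task_id = task["task_id"]

    # Step 2: poll the task status until it finishes.
    while True:
        status = requests.get(f"{BASE}/query/video_generation",
                              headers=headers, params={"task_id": task_id}).json()
        if status.get("status") == "Success":
            file_id = status["file_id"]
            break
        if status.get("status") == "Fail":
            raise RuntimeError("Video generation failed")
        time.sleep(10)

    # Step 3: retrieve the file to obtain a download URL for the video.
    info = requests.get(f"{BASE}/files/retrieve", headers=headers,
                        params={"file_id": file_id}).json()
    print(info["file"]["download_url"])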

    Official MCP

    Visit the official MCP for more capabilities: https://github.com/MiniMax-AI/MiniMax-MCP

  • December 2025
    • No date parsed from source.
    • Detected by Releasebot:
      Dec 23, 2025

    Image Generation API

    New Image Generation service introduces Text-to-Image and Image-to-Image capabilities. Generate images from detailed prompts or from reference images to preserve subject characteristics and maintain visual identity across contexts.

    The Image Generation service provides two core capabilities: Text-to-Image and Image-to-Image.

    Generate Images from Text

    Create images directly from detailed text descriptions (prompts) that specify the desired content.

    Generate Images with Reference Images

    This feature allows you to supply one or more reference images (including online image URLs) that contain a clear subject. Combined with a text prompt, the service generates a new image that preserves the subject’s key characteristics.
    This is particularly useful for scenarios that require consistent visual identity, such as generating images of the same virtual character in different contexts.
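
    A minimal sketch of an Image-to-Image call with a reference image; the endpoint path, model identifier, and subject_reference field here are assumptions for illustration, so check the Image Generation API reference for exact names:

    import os
    import requests

    resp = requests.post(
        "https://api.minimax.io/v1/image_generation",
        headers={"Authorization": f"Bearer {os.environ['MINIMAX_API_KEY']}"},
        json={
            "model": "image-01",  # placeholder model identifier
            "prompt": "The same character hiking in the Alps at golden hour",
            "subject_reference": [
                # An online URL containing a clear subject to preserve.
                {"type": "character", "image_file": "https://example.com/character.png"}
            ],
            "response_format": "url",
        },
    )
    resp.raise_for_status()
    print(resp.json()["data"]["image_urls"])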

  • December 2025
    • No date parsed from source.
    • Detected by Releasebot:
      Dec 23, 2025

    Music Generation API

    Music Generation API now lets you generate full songs with vocals from text prompts and lyrics. Define style, mood, tempo, and vocal traits to craft ready-to-use tracks for videos, games, or apps. Aimed at quick, theme-driven music creation.

    The Music Generation API

    The Music Generation API can create a complete song with vocals based on a text description and lyrics.

    Use the prompt parameter to define the music’s style, mood, and scenario, and the lyrics parameter to provide the vocal content.
    This feature is ideal for quickly generating unique theme songs for videos, games, or applications.

    Example: Text-to-Music Creation

    import requests
    import os
    
    url = "https://api.minimax.io/v1/music_generation"
    api_key = os.environ["MINIMAX_API_KEY"]
    headers = {
      "Authorization": f"Bearer {api_key}"
    }
    
    payload = {
      "model": "music-2.0",
      "prompt": "This is a contemporary R&B/Pop track with distinct Trap influences, radiating a confident, assertive, and empowered energy. It features a bright, clear, and agile female vocal with a polished and heavily processed modern sound. The singer's rhythmic and confident delivery is defined by the heavy and stylistic use of Auto-Tune, creating its signature character. Extensive backing vocals, including layered harmonies and ad-libs built upon stacked unison vocals, produce a rich and full texture, enhanced by moderate reverb for a spacious feel. Set at a tempo of 80 BPM, the arrangement is driven by a dominant 808 bassline and electronic drums with intricate hi-hat patterns and sharp claps, while atmospheric synth pads and subtle sound effects craft a dynamic backdrop. This track is perfect for clubbing, parties, driving with the windows down, or a workout session, making it an essential addition to any confidence-boosting playlist.",
      "lyrics": "[chorus]\nSummit, i reached the summit\nI'm the peak with the fire, they all want from it\nSpill a bit of my glow, like a comet\nI ain't worried 'bout hills, you just plummet\nSummit, i reached the summit\nObsidian shards 'round my throat, now they run from it\nAin't no wonder why the valleys all run from it\nI'm awake, from the summit\n[verse]\nI know what i hold\nAnd i'm about to erupt, yeah\nA story untold, yeah\nI know you won't interrupt it\nKeep your eyes on the rise, no surprise that i'm bright\nGot one stream for the sea, other stream for the night\nI be flowin', you're erodin'\nSwear you're slowin', i'm explodin'\nPressure's growin', growin', growin'\n[interlude]\nSummit, i reached the summit\nI'm the peak with the fire, they all want from it\nSpill a bit of my glow, like a comet\nI ain't worried 'bout stone\n[verse]\nI ain't worried 'bout nada\nUnless it's new earth, unless it's magma\nUnless it's deep core, a new nirvana\nUnless it's shaping a new savanna\nI wanna feel like i'm mother gaia\nI wanna feel like i'm way up\nRumbling, grumbling 'til the world pay up\nMade another island, no layups\nStay hot every single day i wake up\n[chorus]\nSummit, i reached the summit\nI'm the peak with the fire, they all want from it\nSpill a bit of my glow, like a comet\nI ain't worried 'bout hills, you just plummet\nSummit, i reached the summit\nObsidian shards 'round my throat, now they run from it\nAin't no wonder why the valleys all run from it\nI'm awake, from the summit\n[outro]\nSummit\nRooo-ar",
      "audio_setting": {
        "sample_rate": 44100,
        "bitrate": 256000,
        "format": "mp3"
      }
    }
    
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    audio_hex = response.json()["data"]["audio"]
    
    with open("output.mp3", "wb") as f:
        f.write(bytes.fromhex(audio_hex))
    
  • December 2025
    • No date parsed from source.
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax-M2.1

    MiniMax-M2.1 launches a text generation API with tool calling, accessible via HTTP or SDKs. It supports ultra-large context windows of up to 204,800 tokens and emphasizes code understanding and interleaved tool use.


    The text generation API uses MiniMax M2.1 to generate conversational content and trigger tool calls based on the provided context.

    It can be accessed via HTTP requests, the Anthropic SDK (Recommended), or the OpenAI SDK.
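
    For example, a minimal call through the Anthropic SDK might look like the following; the base_url value is an assumption for MiniMax's Anthropic-compatible endpoint, so take the exact value from the Compatible Anthropic API guide:

    import os
    from anthropic import Anthropic

    client = Anthropic(
        base_url="https://api.minimax.io/anthropic",  # assumed compatibility endpoint
        api_key=os.environ["MINIMAX_API_KEY"],
    )

    message = client.messages.create(
        model="MiniMax-M2.1",
        max_tokens=4096,
        messages=[{"role": "user", "content": "Refactor this Go function for clarity: ..."}],
    )
    print(message.content[0].text)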

    Supported Models

    Model Name                Context Window (total input + output per request)
    MiniMax-M2.1              204,800
    MiniMax-M2.1-lightning    204,800
    MiniMax-M2                204,800

    Please note: The maximum token count refers to the total number of input and output tokens.

    Recommended Reading

    • Compatible Anthropic API (Recommended): Use Anthropic SDK with MiniMax models
    • Compatible OpenAI API: Use OpenAI SDK with MiniMax models
    • M2.1 for AI Coding Tools: MiniMax-M2.1 excels at code understanding, dialogue, and reasoning.
    • M2.1 Tool Use & Interleaved Thinking: AI models can call external functions to extend their capabilities.
  • Oct 31, 2025
    • Parsed from source:
      Oct 31, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax Music 2.0

    MiniMax Music 2.0 launches with dynamic vocals, precise instrument control, and professional-grade audio. Create complete songs up to five minutes, perform versatile singing styles and duets, and turn prompts into polished, film-grade soundscapes.

    Today, we are officially launching our latest-generation music model—MiniMax Music 2.0. This version represents a true leap forward in the model's understanding and expression of music. It can accurately capture and reproduce everything from the subtle emotions of the human voice to the dynamic tension of musical instruments.

    It understands rhythm and emotion, weaving together vocals and instruments to become the ultimate "singing producer."

    From now on, expressing yourself through music is no longer a privilege for the few, but a joy accessible to everyone.

    Turn your inspiration into flowing melodies. Feel the rhythm, let the music belong to you.

    1. Dynamic Vocals with Mastery Over Diverse Singing Styles

    You don't need professional vocal training to sing the melody in your heart with the voice, technique, and style you desire.

    In terms of vocal texture, Music 2.0 produces a timbre that is incredibly close to the real human voice. The model performs like a seasoned "vocal powerhouse," capable of mastering a wide range of singing techniques and emotional styles. Its nuanced handling of phrasing, rhythm, and breath demonstrates a "musical intuition" comparable to a professional singer.

    The model supports precise control over vocal timbre. Using prompts, you can maintain a core vocal identity while switching between different singing styles, allowing one voice to have a thousand variations. The AI can transform into a "versatile vocal artist."

    The same female voice can switch effortlessly between Jump Blues, Rock, and Electronic styles.

    Beyond popular genres like Pop, Jazz, Blues, Rock, and Folk, the model also supports male-female duets, a cappella, and more.

    Achieve a dynamic duet with a conversational feel and varied intensity through seamless transitions between male and female lead vocals.

    Create rich melodies even without instrumental accompaniment.

    2. Catchy Melodies and Precise Instrument Control

    You don't have to be a music arranger to compose your own complete musical piece.

    Building on the strengths of its predecessor, Music 2.0 generates structurally complete songs with clear logic, including verses, choruses, and bridges, with a potential length of up to five minutes. Furthermore, the new model creates melodies that are more memorable and instantly captivating.

    The hook's melody is easy to remember, mirroring the melodic habits of human composers.

    The model can follow specific instructions to independently control and adjust various instruments in the accompaniment, creating layered, rich arrangements with a natural groove across different styles.

    Experience a live masterclass in jazz as the saxophone, trombone, trumpet, jazz drums, and piano enter in perfect sequence.

    3. Professional-Grade Audio Experience

    The new model delivers a comprehensive upgrade in audio quality. Both the texture of the vocal tracks and the spatial presence of the instruments are enhanced, providing you with an immersive listening experience.

    Step into a retro disco with vibrant vocal performances and classic 80s instrumentation that will transport you back to the golden age of dance.

    One More Thing

    While testing Music 2.0, we made a surprising discovery: you can use prompts to describe vocal emotions and soundscapes with precision to generate film-grade monologue soundtracks. The layered emotional progression and musical development create a vivid picture you can "hear."

    This exciting capability stems from the model's accurate semantic understanding combined with its precise control over vocal expression—a perfect fusion that gives sound a versatile emotional contour.

    Music 2.0 is now live.
    Start creating and discover your own sound:
    https://www.minimax.io/audio/music

    Intelligence with Everyone.

  • Oct 30, 2025
    • Parsed from source:
      Oct 30, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax Speech 2.6: The Ultimate Voice Agent Has Arrived

    MiniMax Speech 2.6 launches with under 250 ms end-to-end latency, smarter handling of non-standard text formats, and Fluent LoRA for natural, multilingual voices. It enables faster, more fluid voice interactions across real-world platforms and devices.

    MiniMax Speech 2.6 Release Notes

    Today, we’re thrilled to introduce MiniMax Speech 2.6 — our latest speech model, bringing comprehensive upgrades with ultra-low latency, enhanced format handling, and a more natural, human-like voice for Voice Agent scenarios.

    Since its launch, MiniMax Speech has become a core piece of infrastructure in the global voice intelligence landscape, known for its outstanding speech technology and exceptional cost-effectiveness.

    From LiveKit, which powers ChatGPT's advanced voice mode, and the popular open-source framework Pipecat on GitHub, to the YC-incubated voice platform Vapi, all have chosen MiniMax Speech as their underlying technology engine. In the smart hardware sector, innovative products like Haivivi Bubble Pal, Fuzozo, and Rokid Glasses are also powered by MiniMax Speech to deliver their natural voice interaction experiences.

    MiniMax continues to drive new forms of productivity through technological innovation, breaking down the barriers of language and culture to deliver natural, fluent interactions that connect every voice around the world.

    Ultra-Low Latency, More Responsive: For Smoother Overall Interaction

    We have completely optimized the audio generation pipeline, achieving an end-to-end latency of under 250 milliseconds—a top-tier industry standard. In scenarios with strict response time requirements, such as real-time conversations, audio generation is no longer the bottleneck, ensuring a smoother overall interaction.

    Listen to Speech 2.6 acting as an AI customer service agent:

    Seamless Handling of Specialized Formats, Smarter: For More Fluid Information Delivery

    Speech 2.6 now directly converts non-standard text formats in multiple languages, including URLs, email addresses, phone numbers, dates, and monetary amounts. Whether you are using it with a large language model or need to process dynamically changing entity information in your business, you no longer need to perform tedious text pre-processing. The input is read correctly from the start, enabling more fluid information delivery.

    For example, to correctly read the following passage, traditional TTS would require a series of conversions:

    • +1 415 415 9921 → “plus one, four one five, four one five, nine nine two one”
    • $1,234.56 → “one thousand two hundred thirty-four dollars and fifty-six cents”
    • 192.168.1.1 → “one nine two dot one six eight dot one dot one”
    • 2032-5-6 → “May sixth, twenty thirty-two”
    • support-vip@technet.com → “support dash vip at technet dot com”

    Original Text:
    "Hello Oliver Smith, I'm your intelligent virtual assistant Max! Thank you for your call. I've found your file. The outstanding balance for the phone number +1 415 415 9921 is $1,234.56. The associated IP address is 192.168.1.1. Your next payment is due on 2032-5-6. If you have any questions, please contact [email protected]."

    Greater Naturalness and Fluent LoRA: For More Fluent Vocal Expression

    In addition to further enhancing prosodic naturalness, Speech 2.6 also introduces Fluent LoRA.

    Speech 2.5 already offered a convenient, high-fidelity voice cloning feature that allowed users to preserve the unique characteristics of the original voice, such as accents and speech habits. This capability met the diverse voice needs of real-world application scenarios.

    Now, you no longer have to worry about imperfect source material when cloning a voice. Even with non-native recordings that may have an accent or be disfluent, Fluent LoRA can perfectly replicate the voice's timbre while generating fluent, natural speech that matches the target text, making your vocal expression more articulate.

    Besides the English example shown in the video, this feature enables one-click fluency for voice cloning across the 40+ languages the model supports. Here is an example in a Japanese scenario:

    Speech 2.6 is now fully live. Welcome to try it out:

    MiniMax Open Platform:
    https://www.minimax.io/platform_overview

    MiniMax Audio:
    https://www.minimax.io/audio

    Intelligence with Everyone.

  • Oct 28, 2025
    • Parsed from source:
      Oct 28, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax Hailuo 2.3: A New Level of Complex Video Performance & Media Agent

    Hailuo 2.3 launches the MiniMax video model with improved physics, lifelike expressions, and richer stylization, plus a faster, cheaper Fast variant. The Media Agent now enables one-click multi-modal video creation with optional step-by-step editing and global rollout.

    Hailuo 2.3 MiniMax Video Model Release

    Today, we are excited to introduce the MiniMax video model, Hailuo 2.3. Building upon the Hailuo 02 model, it further enhances dynamic expression, resulting in more realistic and stable visuals. The Hailuo 2.3 model achieves significant improvements in the portrayal of physical actions, stylization, and character micro-expressions, while further optimizing its response to motion commands.

    First, thanks to the model's enhanced understanding of physics and command following, Hailuo 2.3 can render more complex character body movements with greater fluidity, naturalness, precision, and control. Even with dynamic camera movements, it achieves near-photorealistic visual effects in lighting direction, shadow transitions, and color tones.

    In terms of stylization, Hailuo 2.3 offers better support for anime, illustration, as well as special art styles like ink wash painting and game CG. Users who love anime creation adored the "Live" model in Hailuo 01, and now Hailuo 2.3 unlocks an even wider range of art styles, delivering more stable and vivid outputs from the general model.

    In Hailuo 2.3, live-action facial performances and micro-expression changes are also more natural. We use subtle expression changes to craft the most captivating character performances.

    In addition to improvements in human expressions and actions, Hailuo 2.3 also shows an enhanced response to motion commands for objects. With the "Double 11" shopping festival underway, some creators in our beta test produced e-commerce ads and saw a significant increase in their success rate for generating high-quality content.

    Hailuo 2.3 once again sets a new global record for video model cost-efficiency. It boosts performance while maintaining the same pricing as Hailuo 02, offering "more for the same price" to both business and consumer users and providing the best value in the industry for creators worldwide. Furthermore, we are offering the Hailuo 2.3 Fast model, which generates videos faster at a lower price, reducing costs for batch creation by up to 50%.

    We have fully rolled out these model updates across the Hailuo AI website, mobile app, and Open Platform API. We are also offering daily free trial credits during the launch period for you to experience. As we continue to iterate on the model's overall capabilities, we will also focus on deep optimization for different AI video application scenarios to solve the real-world problems our users face.

    Media Agent Evolution

    This summer, we released the Hailuo Video Agent to a positive reception. Through the usage and feedback from Hailuo creators, we've realized that multi-modal fusion creation is undoubtedly the future. Today, the Hailuo Video Agent officially evolves into the Media Agent, supporting comprehensive multi-modal creation, and it has been launched simultaneously worldwide.

    Simply input the content you want, and the Media Agent automatically matches the right multi-modal models. With no manual editing required, the "one-click video generation" feature handles everything for you. Professional creators can also use the Media Agent for step-by-step creation, freely uploading images, videos, or audio to customize the final product according to their needs.

    For example, we tried designing a 30-second ad for the "Casa Nacho" brand of tortilla chips. We simply input the desired scene, color tone, camera style, and music, and here's the result from the one-click generation feature.

    In future updates to the Media Agent, we will be able to adjust the details of any part of the creation pipeline with the Agent on a canvas, truly achieving "creation through conversation" while preserving every idea. We believe that interacting and co-creating with AI using natural language is what the next-generation creative platform should be.

    We are entering an era of rapid change, one where AI video is transforming how many people work and create. We hope that Hailuo can be an all-powerful creative assistant and a pioneer of innovation and change, allowing inspiration to take shape—and then transcend all forms.

    Experience Hailuo AI at:
    https://hailuoai.video/
    Experience Media Agent at:
    https://hailuoai.video/agent

  • Aug 7, 2025
    • Parsed from source:
      Aug 7, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax Speech 2.5 Launches: Enhanced Multilingual Expressiveness, Exceptional Voice Cloning Fidelity

    MiniMax launches Speech 2.5, a leap in multilingual TTS with 40+ languages, ultra-realistic voice cloning, and cross-lingual consistency. Businesses and creators get global, cost-efficient voiceovers, with platform access and worldwide rollout.

    Speech 2.5 Launch

    We are launching Speech 2.5 today, once again redefining the limits of state-of-the-art voice generation.

    Building on the success of Speech 02, which we released in May, Speech 2.5 delivers three major breakthroughs:

    • Significantly enhanced multilingual performance
    • More realistic and accurate voice cloning
    • Expanded language support to over 40 languages

    A Leap in Multilingual Expressiveness: World-Class Chinese, and Major Upgrades for English and More

    Speech 2.5 achieves a significant leap in multilingual capabilities. Its performance in Chinese now sets a new global standard in terms of low error rate, voice similarity, and natural rhythm.

    At the same time, performance in English and other languages has been comprehensively upgraded, effectively eliminating the "robotic" feel common in other text-to-speech systems. Whether it’s for daily conversation or professional broadcasting, the output is smooth and natural.

    For example, listen to the solemn vow of Hamlet or the passionate commentary of a Spanish sports announcer:

    More Lifelike Voice Cloning: Replicating Accent, Style, and Emotion with Incredible Detail

    Achieving state-of-the-art precision, Speech 2.5 brings voice cloning to a new level of realism. It can flawlessly replicate a person's unique accent, speaking style, and emotional tone.

    This capability extends across languages, preserves regional accents within the same language, and even captures the subtle vocal characteristics of different age groups, ensuring the output sounds truly authentic.

    What would it sound like if the Queen of England were to introduce the new Speech 2.5? From its pauses and rhythm to its distinct pronunciation, the model perfectly preserves the pure "Queen's English" accent.

    Cross-lingual cloning is no longer a challenge. Even when switching between languages like Italian and English, the model maintains the original speaker's unique vocal characteristics and accent.

    Expanded to 40+ Languages: A Diverse, High-Quality Voice Library for Global Communication

    Speech 2.5 now supports over 40 languages, featuring a diverse, high-quality voice library to help you reach a global audience.

    We've added support for languages like Bulgarian, Danish, Greek, Swedish, Filipino, Hungarian, Spanish, Finnish, Norwegian, Slovak, Swahili, Catalan, Lithuanian, and Afrikaans. This makes Speech 2.5 a powerful tool for global applications, including cross-border e-commerce, international customer service, and localized marketing, making global content creation easier than ever.

    Who is Speech 2.5 For? Unlocking a World of Applications

    • For Businesses:
      Dramatically cut costs for multilingual customer service and international ad campaigns. Generate high-quality voiceovers for product promotions in over 40 languages in just 10 minutes, saving potentially millions on professional dubbing fees.

    • For Creators:
      Clone your own voice with stunning realism and speak fluently in over 40 languages. Effortlessly create viral short-form videos for a global audience and express yourself without borders.

    • For Educators:
      Slash course material creation time for niche languages from weeks to mere minutes. Create custom teaching materials with authentic regional accents, making global knowledge more accessible and relatable for students everywhere.

    Speech 2.5 builds upon the #1-ranked performance of our previous model, Speech 02, pushing the boundaries of quality even further while maintaining its position as the most cost-effective solution on the market.

    Today, MiniMax Speech is already trusted by leading companies worldwide. Globally, it powers services from Agent platforms like Vapi and Pipecat, and is integrated into top AI applications such as Hedra, Icon, and Syllaby. In China, industry leaders including Gaotu Education, Ximalaya, NetEase, and Rokid Glasses all rely on MiniMax Speech.

    Speech 2.5 is now live worldwide. Experience it for yourself on the MiniMax Open Platform or the official MiniMax Audio website.

    MiniMax Open Platform

    minimax.io/platform_overview

    MiniMax Audio

    minimax.io/audio

    Create your own personalized voice and unlock the limitless possibilities of audio production!

    Intelligence with Everyone.

  • Jun 18, 2025
    • Parsed from source:
      Jun 18, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    MiniMax Hailuo 02, World-Class Quality, Record-Breaking Cost Efficiency

    Introducing MiniMax Hailuo 02, a new AI video generation model delivering native 1080p, boosted efficiency (2.5x), larger parameter count, and expanded training data for sharper instructions and extreme physics. It powers affordable, high-fidelity video creation across platforms.

    Introduction

    Today, we are thrilled to introduce MiniMax Hailuo 02, our highly anticipated new video generation model.

    The video showcased above was a collaborative effort by three artists over the course of 1.5 days. They utilized MiniMax Hailuo 02 to generate multiple 6-10 second video clips, which were then skillfully edited into a final video.

    Key Highlights of Hailuo 02

    • Native 1080p
    • SOTA Instruction Following
    • Extreme Physics Mastery

    Indeed, artists have discovered that for highly intricate scenarios, such as gymnastics, MiniMax Hailuo 02 is currently the only model globally capable of delivering such performance. We eagerly invite the community to explore and unlock even more creative possibilities.

    Our Journey and Vision

    Our journey began late last August when we informally launched a demo webpage showcasing an early version of our video generation model. To our surprise, it attracted significant attention and acclaim from talented creators worldwide. This pivotal moment led to the development of Hailuo Video 01, our AI native video generation product, which has since empowered creators to generate over 370 million videos globally.

    Returning to our foundational principle of "Intelligence with Everyone," our ambition is to equip global creators to fully unleash their imagination, elevate the quality of their video content, and lower the barriers to video creation. Crucially, we strive to achieve this without imposing prohibitive costs that would limit the widespread accessibility of this technology.

    Architecture and NCR

    To this end, our team embarked on a quest to develop a more efficient video generation model architecture. This pursuit culminated in the core framework of MiniMax Hailuo 02, which we've named Noise-aware Compute Redistribution (NCR). In essence, the new architecture's central idea is as follows:

    At a comparable parameter scale, the new architecture boosts our training and inference efficiency by 2.5 times. This significant gain enables us to implement a much larger parameter model—thereby enhancing its expressive capabilities—without increasing costs for creators. This approach also leaves ample room for inference optimization. We ultimately expanded the model's total parameter count to 3 times that of its predecessor.

    A larger parameter count and heightened training efficiency mean our model can learn from a more extensive dataset. The wealth of feedback from Hailuo 01 provided invaluable guidance for our model training strategy. As a result, we expanded our training data volume by 4 times, achieving significant improvements in data quality and diversity.

    With this architectural innovation, combined with a threefold increase in parameters and four times the training data, our model has taken a significant leap forward, particularly in its adherence to complex instructions and its rendering of extreme physics. The new model accurately interprets and executes highly detailed prompts, delivering more precise outputs. Furthermore, the efficiency gains from the new architecture also mean we can offer native 1080p video generation at a very affordable price point.

    An early iteration of this model was tested by users on the Artificial Analysis Video Arena, where it secured the second position globally. Stay tuned for an upcoming new version!

    Platform Integration and Versions

    These model enhancements are now fully integrated into the Hailuo Video web platform, mobile application, and our API platform. We currently offer three distinct versions: 768p-6s, 768p-10s, and 1080p-6s. True to our commitment, and thanks to the aforementioned architectural innovation, we continue to offer creators and developers the most open access and affordable pricing in the industry. A comparison of official pricing for different models is detailed below:

    Through sustained technological research and development, coupled with deep collaborations with creators, developers, and artists, our mission and strategic direction have become ever clearer.

    Roadmap and Next Steps

    MiniMax Hailuo 02 represents a new milestone, and we are poised for rapid advancements in the following areas:

    • Enhancing generation speed
    • Improving alignment, leading to higher generation success rates and improved stability
    • Advancing model features beyond Text-to-Video (T2V) and Image-to-Video (I2V)

    And, as always, we remain steadfast in our commitment to relentlessly exploring the upper limits of what technology and art can achieve together.

    Intelligence with Everyone.

