MiniMax Release Notes

Last updated: Mar 5, 2026

  • Mar 4, 2026
    • Date parsed from source:
      Mar 4, 2026
    • First seen by Releasebot:
      Mar 5, 2026
    MiniMax logo

    MiniMax

    Music 2.5+: Unlock instrumental music, break through style boundaries

    MiniMax Music 2.5+ launches instrumental music creation, expanding from song generation to full instrumental scores across classical, electronic, ambient, and ethnic timbres. It enables film and TV scoring, ads, and game soundtracks with cross-style fusion and studio-quality production.

    MiniMax Music 2.5 Launch

    Today, we are pleased to announce that MiniMax Music 2.5 has officially launched its instrumental music creation capability. MiniMax Music has always centered on song generation. Today, we are extending our capabilities to a more essential form of music: instrumental music. No vocals needed; the music itself becomes the expression.

    Unlock All Styles

    MiniMax Music supports diverse generation styles including classical orchestration, minimalism, modern electronic, ambient sounds, and natural soundscapes. It covers the full spectrum from quiet atmospheres to powerful, high-energy tracks, adapting to meditation, sleep aids, advertising, game scoring, and other scenarios. The MiniMax Music model can handle the complete range from "pure natural sound without instruments" to "multi-track instrumental arrangements," with style switching requiring no additional tuning: generate and use immediately.

    Sleep Aid Music
    Prompt: A lullaby featuring music box as the primary timbre, with an extremely slow tempo, gentle melody, suitable for falling asleep late at night.

    Meditation
    Prompt: Create extremely serene, extremely slow-paced meditation music. The background features soft, water-like flowing synth ambient pads, decorated with crisp Tibetan singing bowls and minimalist xylophone taps. The overall atmosphere is ethereal and deep, as if standing on a temple above the clouds, with no heavy percussion, designed to help listeners achieve deep inner peace.

    Natural Soundscape
    Prompt: A healing late-night rainstorm, the crisp sound of raindrops hitting the roof and leaves, distant low and gentle thunder, minimalist, white noise.

    Advertising / Brand Video Intro
    Prompt: A minimalist, tech-inspired brand intro track centered on pulsing synthesizers, precise and restrained in tone.

    Game Music
    Prompt: Electric guitar-driven uplifting melody, adding passion to adventure and combat.

    The instrumental music capability also enables MiniMax Music to serve film and TV scoring directly. Films, short dramas, documentaries, and TV series each have different scoring requirements. The model generates complete soundtracks matching narrative rhythm based on scene descriptions, covering various emotional types and atmospheric needs.

    Film scoring
    Prompt: A minimalist cinematic score driven by pulsing synthesizers, with tight and precise rhythms.

    Cross-Genre Fusion, Unleash Imagination

    Beyond existing styles, MiniMax Music has strong style generalization capabilities, supporting cross-style tag combinations for generation.

    Whether traditional instruments with modern electronic, or Eastern timbres with Western structures, the model can understand the tension between different styles and transform them into coherent musical language, rather than simple element collage.

    This fusion is built on solid musicality: rich harmonic layers; complete melodic progression with proper beginning, development, transition, and conclusion; and natural transitions from motif development to climax release. The more cross-style the work, the more it demonstrates the model's deep understanding of musical structure. In terms of audio quality, the sound field shows distinct separation across low, mid, and high frequencies, with clear instrument separation, dynamic balance, and independent spatial positioning for each track, ensuring professional production standards across different styles.

    It is worth mentioning that MiniMax Music's understanding and reproduction of traditional Chinese musical instruments is at an industry-leading level. MiniMax Music can accurately present the tonal expressiveness and performance details of ethnic instruments such as flute, pipa, and guzheng, naturally integrating them into orchestral arrangements and modern production contexts.

    Epic orchestral music
    Prompt: Epic cinematic East Asian fusion, 136 BPM, virtuosic Chinese bamboo flute (Dizi) leading a powerful orchestra. Intense Taiko drum beats, martial arts atmosphere, heroic and urgent. Dramatic shifts between fierce action and lyrical reflection. High energy, triumphant climax.

    Baroque Metal: Baroque × Hardcore Heavy Metal
    Prompt: A gorgeous auditory metamorphosis. Crisp, rigorous Baroque harpsichord polyphonic melody, suddenly invaded by violent blast beats and heavily distorted heavy metal guitars. Complex classical harmonics perfectly fused with modern metal's aggressiveness, creating a grand and chaotic opera-style metal listening experience.

    Chinese Style × Fantasy Epic
    Prompt: A Chinese-style pure music depicting an adventure in a vast fantasy world. The music atmosphere is hopeful, led by a retro cello solo melody, accompanied by rhythmic percussion. Overall dynamic range is wide, creating a rich sense of layering.

    Welcome to MiniMax Music 2.5+, unlock your musical creativity!

    Product Experience:
    https://www.minimax.io/audio/music

    API Interface:
    https://platform.minimax.io/docs/api-reference/music-generation

    Original source Report a problem
  • Feb 12, 2026
    • Date parsed from source:
      Feb 12, 2026
    • First seen by Releasebot:
      Feb 18, 2026

    MiniMax

    MiniMax M2.5: Built for Real-World Productivity.

    MiniMax unveils M2.5, a fast, low-cost frontier model with strong coding and agentic abilities, boosted by RL scaling and the Forge framework. It ships in two versions for office, finance, and software work, promising efficient tool use and industry-leading speed and cost performance.

    MiniMax-M2.5 Overview

    Today we're introducing our latest model, MiniMax-M2.5.

    Extensively trained with reinforcement learning in hundreds of thousands of complex real-world environments, M2.5 is SOTA in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, boasting scores of 80.2% in SWE-Bench Verified, 51.3% in Multi-SWE-Bench, and 76.3% in BrowseComp (with context management).

    Trained to reason efficiently and decompose tasks optimally, M2.5 exhibits tremendous speed in performing complicated agentic tasks, completing the SWE-Bench Verified evaluation 37% faster than M2.1, matching the speed of Claude Opus 4.6.

    M2.5 is the first frontier model where users do not need to worry about cost, delivering on the promise of intelligence too cheap to meter. It costs just $1 to run the model continuously for an hour at a rate of 100 tokens per second. At 50 tokens per second, the cost drops to $0.30. We hope that the speed and cost effectiveness of M2.5 enable innovative new agentic applications.

    Coding

    In programming evaluations, MiniMax-M2.5 saw substantial improvements compared to previous generations, reaching SOTA levels. The performance of M2.5 in multilingual coding tasks is especially pronounced.

    A significant improvement from previous generations is M2.5's ability to think and plan like an architect. The Spec-writing tendency of the model emerged during training: before writing any code, M2.5 actively decomposes and plans the features, structure, and UI design of the project from the perspective of an experienced software architect.

    M2.5 was trained on over 10 languages (including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby) across more than 200,000 real-world environments. Going far beyond bug-fixing, M2.5 delivers reliable performance across the entire development lifecycle of complex systems: from 0-to-1 system design and environment setup, to 1-to-10 system development, to 10-to-90 feature iteration, and finally 90-to-100 comprehensive code review and system testing. It covers full-stack projects spanning multiple platforms including Web, Android, iOS, and Windows, encompassing server-side APIs, business logic, databases, and more, not just frontend webpage demos.

    To evaluate these capabilities, we also upgraded the VIBE benchmark to a more complex and challenging Pro version, significantly increasing task complexity, domain coverage, and evaluation accuracy. Overall, M2.5 performs on par with Opus 4.5.

    We focused on the model's ability to generalize across out-of-distribution harnesses. We tested performance on the SWE-Bench Verified evaluation set using different coding agent harnesses.

    • On Droid: 79.7 (M2.5) > 78.9 (Opus 4.6)
    • On OpenCode: 76.1 (M2.5) > 75.9 (Opus 4.6)

    Search and Tool calling

    Effective tool calling and search are prerequisites for a model's ability to autonomously handle more complex tasks. In evaluations on benchmarks such as BrowseComp and Wide Search, M2.5 achieved industry-leading performance. At the same time, the model's generalization has also improved — M2.5 demonstrates more stable performance when facing unfamiliar scaffolding environments.

    In research tasks performed by professional human experts, using a search engine is only a small part of the process; most of the work involves deep exploration across information-dense webpages. To address this, we built RISE (Realistic Interactive Search Evaluation) to measure a model's search capabilities on real-world professional tasks. The results show that M2.5 excels at expert-level search tasks in real-world settings.

    Compared to its predecessors, M2.5 also demonstrates much better decision-making when handling agentic tasks: it has learned to solve problems with more precise search rounds and better token efficiency. For example, across multiple agentic tasks including BrowseComp, Wide Search, and RISE, M2.5 achieved better results with fewer rounds, using approximately 20% fewer rounds compared to M2.1. This indicates that the model is no longer just getting the answer right, but is also reasoning towards results in more efficient paths.

    Office work

    M2.5 was trained to produce truly deliverable outputs in office scenarios. To this end, we engaged in thorough collaboration with senior professionals in fields such as finance, law, and social sciences. They designed requirements, provided feedback, participated in defining standards, and directly contributed to data construction, bringing the tacit knowledge of their industries into the model's training pipeline.

    Based on this foundation, M2.5 has achieved significant capability improvements in high-value workspace scenarios such as Word, PowerPoint, and Excel financial modeling.

    On the evaluation side, we built an internal Cowork Agent evaluation framework (GDPval-MM) that assesses both the quality of the deliverable and the professionalism of the agent's trajectory through pairwise comparisons, while also monitoring token costs across the entire workflow to estimate the model's real-world productivity gains. In comparisons against other mainstream models, M2.5 achieved an average win rate of 59.0%.

    Efficiency

    Because the real world is full of deadlines and time constraints, task completion speed is a practical necessity. The time it takes a model to complete a task depends on its task decomposition effectiveness, token efficiency, and inference speed. M2.5 is served natively at a rate of 100 tokens per second, which is nearly twice that of other frontier models. Further, our reinforcement learning setup incentivizes the model to reason efficiently and break down tasks optimally. Due to these three factors, M2.5 delivers a significant time savings in complex task completion.

    For example, when running SWE-Bench Verified, M2.5 consumed an average of 3.52 million tokens per task. In comparison, M2.1 consumed 3.72M tokens. Meanwhile, thanks to improvements in capabilities such as parallel tool calling, the end-to-end runtime decreased from an average of 31.3 minutes to 22.8 minutes, representing a 37% speed improvement. This runtime is on par with Claude Opus 4.6's 22.9 minutes, while the total cost per task is only 10% that of Claude Opus 4.6.

    Cost

    Our goal in designing the M2-series of foundation models is to power complex agents without having to worry about cost. We believe that M2.5 is close to realizing this goal. We’re releasing two versions of the model, M2.5 and M2.5-Lightning, that are identical in capability but differ in speed. M2.5-Lightning has a steady throughput of 100 tokens per second, which is two times faster than other frontier models, and costs $0.3 per million input tokens and $2.4 per million output tokens. M2.5, which has a throughput of 50 tokens per second, costs half that. Both model versions support caching. Based on output price, the cost of M2.5 is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5.

    At a rate of 100 output tokens per second, running M2.5 continuously for an hour costs $1. At a rate of 50 TPS, the price drops to $0.3. To put that into perspective, you can have four M2.5 instances running continuously for an entire year for $10,000. We believe that M2.5 provides virtually limitless possibilities for the development and operation of agents in the economy. For the M2-series, the only problem that remains is how to continually push the frontier of model capability.
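    The headline hourly figures can be sanity-checked from the per-token prices above. A minimal sketch, counting output tokens only and assuming M2.5's output price is half of Lightning's $2.4 per million; the published $1 and $0.3 figures presumably also fold in input-token costs:

```python
def hourly_output_cost(tokens_per_second: float, price_per_million: float) -> float:
    """Hourly cost from output tokens alone (input and cache costs excluded)."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_million

# M2.5-Lightning: $2.4 per million output tokens at 100 TPS
print(round(hourly_output_cost(100, 2.4), 3))  # 0.864, consistent with "about $1/hour"

# M2.5: assumed half price ($1.2 per million output tokens) at 50 TPS
print(round(hourly_output_cost(50, 1.2), 3))   # 0.216, consistent with "about $0.3/hour"
```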

    Improvement Rate

    Over the three and a half months from late October to now, we have successively released M2, M2.1, and M2.5, with the pace of model improvement exceeding our original expectations. For instance, in the highly-regarded SWE-Bench Verified benchmark, the rate of progress of the M2-series has been significantly faster than that of peers such as the Claude, GPT, and Gemini model families.

    RL Scaling

    One of the key drivers of the aforementioned developments is the scaling of reinforcement learning. As we train our models, we also benefit from their abilities: most of the tasks and workflows performed within our company have been turned into RL training environments. To date, there are already hundreds of thousands of such environments. At the same time, we did extensive work on our agentic RL framework, algorithms, reward signals, and infrastructure engineering to support the continued scaling of our RL training.

    Forge: Agent-Native RL Framework

    We designed an agent-native RL framework in-house, called Forge, which introduces an intermediary layer that fully decouples the underlying training-inference engine from the agent, supporting the integration of arbitrary agents and enabling us to optimize the model's generalization across agent scaffolds and tools. We also optimized asynchronous scheduling strategies to balance system throughput against sample off-policyness, and designed a tree-structured merging strategy for training samples, achieving approximately 40x training speedup.

    Agentic RL Algorithm and Reward Design

    On the algorithm side, we continued using the CISPO algorithm we proposed at the beginning of last year to ensure the stability of MoE models during large-scale training. To address the credit assignment challenge posed by long contexts in agent rollouts, we introduced a process reward mechanism for end-to-end monitoring of generation quality. Furthermore, to deeply align with user experience, we evaluated task completion time through agent trajectories, achieving an optimal trade-off between model intelligence and response speed.

    We will release a more comprehensive introduction to RL scaling soon in a separate technical blogpost.

    MiniMax Agent: M2.5 as a Professional Employee

    M2.5 has been fully deployed in MiniMax Agent, delivering the best agentic experience.

    We have distilled core information-processing capabilities into standardized Office Skills deeply integrated within MiniMax Agent. In MAX mode, when handling tasks such as Word formatting, PowerPoint editing, and Excel calculations, MiniMax Agent automatically loads the corresponding Office Skills based on file type, improving the quality of task outputs.

    Furthermore, users can combine Office Skills with domain-specific industry expertise to create reusable Experts tailored to specific task scenarios.

    Take industry research as an example: by merging a mature research framework SOP (standard operating procedure) with Word Skills, the Agent can strictly follow the established framework to automatically fetch data, organize analytical logic, and output properly formatted research reports — rather than merely generating a raw block of text. In financial modeling scenarios, by combining an organization's proprietary modeling standards with Excel Skills, the Agent can follow specific risk control logic and calculation standards to automatically generate and validate complex financial models, rather than simply outputting a basic spreadsheet.

    To date, users have built over 10,000 Experts on MiniMax Agent, and this number is still growing rapidly. MiniMax has also built multiple sets of deeply optimized, ready-to-use Expert suites on MiniMax Agent for high-frequency scenarios such as office work, finance, and programming.

    MiniMax itself has been among the first to benefit from M2.5's capabilities. Throughout the company's daily operations, 30% of overall tasks are autonomously completed by M2.5, spanning functions including R&D, product, sales, HR, and finance — and the penetration rate continues to rise. Performance in coding scenarios has been particularly notable, with M2.5-generated code accounting for 80% of newly committed code.

    Appendix

    Further benchmark results of M2.5:

    Evaluation methods:

    • SWE benchmark: SWE-bench Verified, SWE-bench Multilingual, SWE-bench-pro, and Multi-SWE-bench were tested on internal infrastructure using Claude Code as the scaffolding, with the default system prompt overridden, and results averaged over 4 runs. Additionally, SWE-bench Verified was also evaluated on the Droid and Opencode scaffoldings using the default prompt.
    • Terminal Bench 2: We tested Terminal Bench 2 using Claude Code 2.0.64 as the evaluation scaffolding. We modified the Dockerfiles of some problems to ensure the correctness of the problems themselves, uniformly expanded sandbox specifications to 8-core CPU and 16 GB memory, set the timeout uniformly to 7,200 seconds, and equipped each problem with a basic toolset (ps, curl, git, etc.). While not retrying on timeouts, we added a detection mechanism for empty scaffolding responses, retrying tasks whose final response was empty to handle various abnormal interruption scenarios. Final results are averaged over 4 runs.
    • VIBE-Pro: Internal benchmark. Uses Claude Code as the scaffolding to automatically verify the interaction logic and visual effects of programs. All scores are computed through a unified pipeline that includes a requirements set, containerized deployment, and a dynamic interaction environment. Final results are averaged over 3 runs.
    • BrowseComp: Uses the same agent framework as WebExplorer (Liu et al., 2025). When token usage exceeds 30% of the maximum context, all history is discarded.
    • Wide Search: Uses the same agent framework as WebExplorer (Liu et al., 2025).
    • RISE: Internal benchmark. Contains real questions from human experts, evaluating the model's multi-step information retrieval and reasoning capabilities when combined with complex web interactions. A Playwright-based browser tool suite is added on top of the WebExplorer (Liu et al., 2025) agent framework.
    • GDPval-MM: Internal benchmark. Based on the open-source GDPval test set, using a custom agentic evaluation framework where an LLM-as-a-judge performs pairwise win/tie/loss judgments on complete trajectories. Average token cost per task is calculated based on each vendor's official API pricing (without caching).
    • MEWC: Internal benchmark. Built on MEWC (Microsoft Excel World Championship), comprising 179 problems from the main and other regional divisions of Excel esports competitions from 2021–2026. It evaluates the model's ability to understand competition Excel spreadsheets and use Excel tools to complete problems. Scores are calculated by comparing output and answer cell values one by one.
    • Finance Modeling: Internal benchmark. Primarily contains financial modeling problems constructed by industry experts, involving end-to-end research and analysis tasks performed via Excel tools. Each problem is scored using expert-designed rubrics. Final results are averaged over 3 runs.
    • AIME25 ~ AA-LCR: Obtained through internal testing based on the public evaluation sets and evaluation methods covered by the Artificial Analysis Intelligence Index leaderboard.
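    The BrowseComp context-management rule described above ("when token usage exceeds 30% of the maximum context, all history is discarded") amounts to a simple guard. This sketch is illustrative only, not the evaluation harness's actual code:

```python
def maybe_reset_history(history: list, used_tokens: int, max_context: int,
                        threshold: float = 0.30) -> list:
    """Return an empty history once token usage crosses the threshold,
    otherwise return the history unchanged."""
    if used_tokens > threshold * max_context:
        return []  # discard all accumulated history
    return history

# Example: a 204,800-token window with 70,000 tokens already used (> 30%)
print(len(maybe_reset_history(["msg1", "msg2"], 70_000, 204_800)))  # 0
```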

  • January 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Jan 16, 2026

    MiniMax

    MiniMax-M2.1: Polyglot programming mastery, precision code refactoring

    MiniMax now integrates with the Anthropic API ecosystem, letting developers plug in with an easy SDK setup and shared prompts. With supported models, streaming options, and clear config steps, you can deploy cross‑ecosystem AI quickly.

    • Install Anthropic SDK
    pip install anthropic
    
    • Configure Environment Variables

    For international users, use https://api.minimax.io/anthropic; for users in China, use https://api.minimaxi.com/anthropic

    export ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
    export ANTHROPIC_API_KEY=${YOUR_API_KEY}
    
    • Call API

    Python example:

    import anthropic
    
    client = anthropic.Anthropic()
    
    message = client.messages.create(
        model = "MiniMax-M2.1",
        max_tokens = 1000,
        system = "You are a helpful assistant.",
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Hi, how are you?"
                    }
                ]
            }
        ]
    )
    
    for block in message.content:
        if block.type == "thinking":
            print(f"Thinking:\n{block.thinking}\n")
        elif block.type == "text":
            print(f"Text:\n{block.text}\n")
    
    • Important Note

    In multi-turn function call conversations, the complete model response (i.e., the assistant message) must be appended to the conversation history to maintain the continuity of the reasoning chain.

    • Append the full response.content list to the message history (includes all content blocks: thinking/text/tool_use)
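    A minimal sketch of this rule, using simplified dict-shaped blocks (the real SDK returns content-block objects rather than plain dicts; the tool name and values here are illustrative):

```python
def append_assistant_turn(history: list, content_blocks: list) -> list:
    """Append the assistant's complete content list (thinking/text/tool_use) to the history."""
    history.append({"role": "assistant", "content": content_blocks})
    return history

history = [{"role": "user", "content": [{"type": "text", "text": "Weather in Paris?"}]}]

# Suppose the model replied with a thinking block plus a tool call:
blocks = [
    {"type": "thinking", "thinking": "I need the weather tool."},
    {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"city": "Paris"}},
]
append_assistant_turn(history, blocks)  # keep ALL blocks, not just the text

# Next turn: append the tool result, then call client.messages.create(messages=history, ...)
history.append({"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "call_1", "content": "18°C, clear"},
]})
print(len(history))  # 3
```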

    • Supported Models

    When using the Anthropic SDK, the MiniMax-M2.1, MiniMax-M2.1-lightning, and MiniMax-M2 models are supported:

    • MiniMax-M2.1: Powerful multi-language programming capabilities with a comprehensively enhanced programming experience (output speed approximately 60 TPS)
    • MiniMax-M2.1-lightning: Faster and more agile (output speed approximately 100 TPS)
    • MiniMax-M2: Agentic capabilities, advanced reasoning

    Note: The Anthropic API compatibility interface currently supports only the MiniMax-M2.1, MiniMax-M2.1-lightning, and MiniMax-M2 models. For other models, please use the standard MiniMax API interface.

    • Compatibility

    • Supported Parameters

    When using the Anthropic SDK, we support the following input parameters:

    • model: Fully supported. MiniMax-M2.1, MiniMax-M2.1-lightning, and MiniMax-M2.
    • messages: Partially supported. Text and tool calls; no image or document input.
    • max_tokens: Fully supported. Maximum number of tokens to generate.
    • stream: Fully supported. Streaming response.
    • system: Fully supported. System prompt.
    • temperature: Fully supported. Range (0.0, 1.0]; controls output randomness; recommended value: 1.
    • tool_choice: Fully supported. Tool selection strategy.
    • tools: Fully supported. Tool definitions.
    • top_p: Fully supported. Nucleus sampling parameter.
    • metadata: Fully supported. Metadata.
    • thinking: Fully supported. Reasoning content.
    • top_k: Ignored.
    • stop_sequences: Ignored.
    • service_tier: Ignored.
    • mcp_servers: Ignored.
    • context_management: Ignored.
    • container: Ignored.
    • Messages Field Support
      • type="text": Fully supported. Text messages.
      • type="tool_use": Fully supported. Tool calls.
      • type="tool_result": Fully supported. Tool call results.
      • type="thinking": Fully supported. Reasoning content.
      • type="image": Not supported. Image input is not supported yet.
      • type="document": Not supported. Document input is not supported yet.
    • Examples

    • Streaming Response

    Python example:

    import anthropic
    
    client = anthropic.Anthropic()
    
    print("Starting stream response...\n")
    print("="*60)
    print("Thinking Process:")
    print("="*60)
    
    stream = client.messages.create(
        model = "MiniMax-M2.1",
        max_tokens = 1000,
        system = "You are a helpful assistant.",
        messages = [
            {
                "role": "user",
                "content": [{"type": "text", "text": "Hi, how are you?"}]
            }
        ],
        stream = True,
    )
    
    reasoning_buffer = ""
    text_buffer = ""
    
    for chunk in stream:
        if chunk.type == "content_block_start":
            if hasattr(chunk, "content_block") and chunk.content_block:
                if chunk.content_block.type == "text":
                    print("\n" + "="*60)
                    print("Response Content:")
                    print("="*60)
        elif chunk.type == "content_block_delta":
            if hasattr(chunk, "delta") and chunk.delta:
                if chunk.delta.type == "thinking_delta":
                    # Stream the thinking process as it arrives
                    new_thinking = chunk.delta.thinking
                    if new_thinking:
                        print(new_thinking, end = "", flush = True)
                        reasoning_buffer += new_thinking
                elif chunk.delta.type == "text_delta":
                    # Stream the text content as it arrives
                    new_text = chunk.delta.text
                    if new_text:
                        print(new_text, end = "", flush = True)
                        text_buffer += new_text
    print("\n")
    
    • Important Notes
    1. The Anthropic API compatibility interface currently supports only the MiniMax-M2.1 and MiniMax-M2 models
    2. The temperature parameter range is (0.0, 1.0]; values outside this range will return an error
    3. Some Anthropic parameters (such as top_k, stop_sequences, service_tier, mcp_servers, context_management, and container) will be ignored
    4. Image and document type inputs are not currently supported
  • December 2025
    • No date parsed from source.
    • First seen by Releasebot:
      Dec 23, 2025

    MiniMax

    MiniMax-M2.1

    MiniMax-M2.1 launches a polyglot text generation API with tool calling, accessible via HTTP or SDKs. It supports context windows of up to 204,800 tokens and emphasizes code understanding and interleaved tool use.

    🎉 MiniMax-M2.1: Polyglot programming mastery, precision code refactoring

    The text generation API uses MiniMax M2.1 to generate conversational content and trigger tool calls based on the provided context.

    It can be accessed via HTTP requests, the Anthropic SDK (Recommended), or the OpenAI SDK.

    Supported Models

    Model Name              Context Window (total input + output per request)
    MiniMax-M2.1            204,800
    MiniMax-M2.1-lightning  204,800
    MiniMax-M2              204,800

    Please note: The maximum token count refers to the total number of input and output tokens.
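    Since the window is shared between input and output, the available output budget shrinks as the prompt grows. A trivial helper makes the bookkeeping explicit (illustrative only; actual token counts depend on the tokenizer):

```python
CONTEXT_WINDOW = 204_800  # shared budget for input + output tokens

def max_output_budget(input_tokens: int) -> int:
    """Largest max_tokens value that still fits within the shared window."""
    return max(CONTEXT_WINDOW - input_tokens, 0)

print(max_output_budget(180_000))  # 24800
```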

    Recommended Reading

    • Compatible Anthropic API (Recommended): Use Anthropic SDK with MiniMax models
    • Compatible OpenAI API: Use OpenAI SDK with MiniMax models
    • M2.1 for AI Coding Tools: MiniMax-M2.1 excels at code understanding, dialogue, and reasoning.
    • M2.1 Tool Use & Interleaved Thinking: AI models can call external functions to extend their capabilities.
  • December 2025
    • No date parsed from source.
    • First seen by Releasebot:
      Dec 23, 2025

    MiniMax

    MiniMax-M2.1: Polyglot programming mastery, precision code refactoring

    MiniMax launches video generation from text or images, with new models boosting realism and speed. The release outlines an asynchronous API flow to create, track, and download videos via task and file IDs.

    🎉 MiniMax-M2.1: Polyglot programming mastery, precision code refactoring

    This API supports generating videos based on user-provided text or images (including first frame, last frame, or reference images).

    Supported Models

    • MiniMax-Hailuo-2.3: New video generation model with breakthroughs in body movement, facial expressions, physical realism, and prompt adherence.
    • MiniMax-Hailuo-2.3-Fast: New image-to-video model, optimized for value and efficiency.
    • MiniMax-Hailuo-02: Video generation model supporting higher resolution (1080P), longer duration (10s), and stronger adherence to prompts.

    API Usage Guide

    Video generation is asynchronous and consists of three APIs: Create Video Generation Task, Query Video Generation Task Status, and File Management. Steps are as follows:

    • Use the Create Video Generation Task API (Text to Video, Image to Video, Start / End to Video, Subject Reference to Video) to start a task. On success, it will return a task_id.
    • Use the Query Video Generation Task Status API with the task_id to check progress. When the status is success, a file ID (file_id) will be returned.
    • Use the Download the Video File API with the file_id from step 2 to view and download the generated video.
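    The three steps above can be sketched as a small polling loop. The status values and field names mirror the steps described here, but the exact response schema should be taken from the API reference; the stub at the bottom stands in for a real call to the Query Video Generation Task Status API:

```python
import time

def wait_for_video(get_status, task_id: str, interval_s: float = 10.0,
                   timeout_s: float = 600.0) -> str:
    """Poll `get_status(task_id)` until the task succeeds; return its file_id.

    `get_status` should wrap the Query Video Generation Task Status API and
    return a dict like {"status": "...", "file_id": "..."}.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = get_status(task_id)
        status = result.get("status")
        if status == "success":
            return result["file_id"]   # step 3: pass this to the file-download API
        if status == "fail":
            raise RuntimeError(f"video task {task_id} failed")
        time.sleep(interval_s)         # still queued/processing: wait and retry
    raise TimeoutError(f"video task {task_id} did not finish within {timeout_s}s")

# Demonstration with a stubbed status fetcher (two "processing" polls, then success):
responses = iter([{"status": "processing"}, {"status": "processing"},
                  {"status": "success", "file_id": "file_123"}])
print(wait_for_video(lambda tid: next(responses), "task_1", interval_s=0.0))  # file_123
```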

    Official MCP

    Visit the official MCP for more capabilities: https://github.com/MiniMax-AI/MiniMax-MCP

  • December 2025
    • No date parsed from source.
    • First seen by Releasebot:
      Dec 23, 2025

    MiniMax

    MiniMax-M2.1: Polyglot programming mastery, precision code refactoring

    New Image Generation service introduces Text-to-Image and Image-to-Image capabilities. Generate images from detailed prompts or from reference images to preserve subject characteristics and maintain visual identity across contexts.

    The Image Generation service provides two core capabilities: Text-to-Image and Image-to-Image.

    Generate Images from Text

    Create images directly from detailed text descriptions (prompts) that specify the desired content.

    Generate Images with Reference Images

    This feature allows you to supply one or more reference images (including online image URLs) that contain a clear subject. Combined with a text prompt, the service generates a new image that preserves the subject’s key characteristics.
    This is particularly useful for scenarios that require consistent visual identity, such as generating images of the same virtual character in different contexts.
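    A sketch of how an Image-to-Image request body might be assembled. The model id ("image-01") and field names ("subject_reference", "image_file") are assumptions for illustration; consult the official API reference for the exact schema:

```python
def build_image_payload(prompt: str, reference_urls=None) -> dict:
    """Assemble a request body for the Image Generation service.

    Field names and the model id are assumptions for illustration;
    check the official API reference for the exact schema.
    """
    payload = {"model": "image-01", "prompt": prompt}
    if reference_urls:
        # Image-to-Image: attach reference images so the subject's identity is preserved
        payload["subject_reference"] = [
            {"type": "character", "image_file": url} for url in reference_urls
        ]
    return payload

body = build_image_payload(
    "The same character as the reference, now reading in a sunlit library",
    ["https://example.com/character.png"],
)
# The body would then be POSTed to the image-generation endpoint with the usual
# "Authorization: Bearer <MINIMAX_API_KEY>" header, as in the music example below.
print(sorted(body))  # ['model', 'prompt', 'subject_reference']
```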

  • December 2025
    • No date parsed from source.
    • First seen by Releasebot:
      Dec 23, 2025
    MiniMax logo

    MiniMax

    MiniMax Music Generation API: Full Songs with Vocals from Text and Lyrics

    Music Generation API now lets you generate full songs with vocals from text prompts and lyrics. Define style, mood, tempo, and vocal traits to craft ready-to-use tracks for videos, games, or apps. Aimed at quick, theme-driven music creation.

    The Music Generation API

    The Music Generation API can create a complete song with vocals based on a text description and lyrics.

    Use the prompt parameter to define the music’s style, mood, and scenario, and the lyrics parameter to provide the vocal content.
    This feature is ideal for quickly generating unique theme songs for videos, games, or applications.

    Example: Text-to-Music Creation

    import requests
    import os
    
    url = "https://api.minimax.io/v1/music_generation"
    api_key = os.environ["MINIMAX_API_KEY"]
    headers = {
      "Authorization": f"Bearer {api_key}"
    }
    
    payload = {
      "model": "music-2.0",
      "prompt": "This is a contemporary R&B/Pop track with distinct Trap influences, radiating a confident, assertive, and empowered energy. It features a bright, clear, and agile female vocal with a polished and heavily processed modern sound. The singer's rhythmic and confident delivery is defined by the heavy and stylistic use of Auto-Tune, creating its signature character. Extensive backing vocals, including layered harmonies and ad-libs built upon stacked unison vocals, produce a rich and full texture, enhanced by moderate reverb for a spacious feel. Set at a tempo of 80 BPM, the arrangement is driven by a dominant 808 bassline and electronic drums with intricate hi-hat patterns and sharp claps, while atmospheric synth pads and subtle sound effects craft a dynamic backdrop. This track is perfect for clubbing, parties, driving with the windows down, or a workout session, making it an essential addition to any confidence-boosting playlist.",
      "lyrics": "[chorus]\nSummit, i reached the summit\nI'm the peak with the fire, they all want from it\nSpill a bit of my glow, like a comet\nI ain't worried 'bout hills, you just plummet\nSummit, i reached the summit\nObsidian shards 'round my throat, now they run from it\nAin't no wonder why the valleys all run from it\nI'm awake, from the summit\n[verse]\nI know what i hold\nAnd i'm about to erupt, yeah\nA story untold, yeah\nI know you won't interrupt it\nKeep your eyes on the rise, no surprise that i'm bright\nGot one stream for the sea, other stream for the night\nI be flowin', you're erodin'\nSwear you're slowin', i'm explodin'\nPressure's growin', growin', growin'\n[interlude]\nSummit, i reached the summit\nI'm the peak with the fire, they all want from it\nSpill a bit of my glow, like a comet\nI ain't worried 'bout stone\n[verse]\nI ain't worried 'bout nada\nUnless it's new earth, unless it's magma\nUnless it's deep core, a new nirvana\nUnless it's shaping a new savanna\nI wanna feel like i'm mother gaia\nI wanna feel like i'm way up\nRumbling, grumbling 'til the world pay up\nMade another island, no layups\nStay hot every single day i wake up\n[chorus]\nSummit, i reached the summit\nI'm the peak with the fire, they all want from it\nSpill a bit of my glow, like a comet\nI ain't worried 'bout hills, you just plummet\nSummit, i reached the summit\nObsidian shards 'round my throat, now they run from it\nAin't no wonder why the valleys all run from it\nI'm awake, from the summit\n[outro]\nSummit\nRooo-ar",
      "audio_setting": {
        "sample_rate": 44100,
        "bitrate": 256000,
        "format": "mp3"
      }
    }
    
    # Submit the request; the generated audio is returned hex-encoded in the JSON body
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    audio_hex = response.json()["data"]["audio"]
    
    # Decode the hex string into raw MP3 bytes and write it to disk
    with open("output.mp3", "wb") as f:
        f.write(bytes.fromhex(audio_hex))
    
    
    Original source Report a problem
  • Dec 23, 2025
    • Date parsed from source:
      Dec 23, 2025
    • First seen by Releasebot:
      Dec 23, 2025
    MiniMax logo

    MiniMax

    MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks

    MiniMax M2.1 unleashes AI-native development with stronger multi-language coding, improved office task automation, and enhanced mobile Web/App capabilities. It promises faster, cheaper, more capable AI workflows and opens the model to open-source deployment and public tools.

    MiniMax M2.1 Release

    MiniMax has been continuously transforming itself in a more AI-native way. The core driving forces of this process are models, Agent scaffolding, and organization. Throughout the exploration process, we have gained increasingly deeper understanding of these three aspects. Today we are releasing updates to the model component, namely MiniMax M2.1, hoping to help more enterprises and individuals find more AI-native ways of working (and living) sooner.

    In M2, we primarily addressed issues of model cost and model accessibility. In M2.1, we are committed to improving performance in real-world complex tasks: focusing particularly on usability across more programming languages and office scenarios, and achieving the best level in this domain.

    Key Highlights of MiniMax M2.1:

    • Exceptional Multi-Programming Language Capabilities
      Many models in the past primarily focused on Python optimization, but real-world systems are often the result of multi-language collaboration.
      In M2.1, we have systematically enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The overall performance on multi-language tasks has reached industry-leading levels, covering the complete chain from low-level system development to application layer development.
    • WebDev and AppDev: A Comprehensive Leap in Capability and Aesthetics
      Addressing the widely recognized weakness in mobile development across the industry, M2.1 significantly strengthens native Android and iOS development capabilities.
      Meanwhile, we have systematically enhanced the model's design comprehension and aesthetic expression in Web and App scenarios, enabling excellent construction of complex interactions, 3D scientific scene simulations, and high-quality visualization, making vibe coding a sustainable and deliverable production practice.
    • Enhanced Composite Instruction Constraints, Enabling Office Scenarios
      As one of the first open-source model series to systematically introduce Interleaved Thinking, M2.1's systematic problem-solving capabilities have been further upgraded. The model not only focuses on code execution correctness but also emphasizes integrated execution of "composite instruction constraints," providing higher usability in real office scenarios.
    • More Concise and Efficient Responses
      Compared to M2, MiniMax-M2.1 delivers more concise model responses and thought chains. In practical programming and interaction experiences, response speed has significantly improved and token consumption has notably decreased, resulting in smoother and more efficient performance in AI Coding and Agent-driven continuous workflows.
    • Outstanding Agent/Tool Scaffolding Generalization Capabilities
      M2.1 demonstrates excellent performance across various programming tools and Agent frameworks. It exhibits consistent and stable results in tools such as Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox, while providing reliable support for Context Management mechanisms including Skill.md, Claude.md/agent.md/cursorrule, and Slash Commands.
    • High-Quality Dialogue and Writing
      M2.1 is no longer just "stronger in coding capabilities." In everyday conversation, technical documentation, and writing scenarios, it also provides more detailed and structured responses.

    First Impressions

    "We're excited for powerful open-source models like M2.1 that bring frontier performance (and in some cases exceed the frontier) for a wide variety of software development tasks. Developers deserve choice, and M2.1 provides that much needed choice!"

    • Eno Reyes, Co-Founder, CTO of Factory AI

    “MiniMax M2.1 performed exceptionally well across our internal benchmarks, showing strong results in complex instruction following, reranking, and classification, especially within e-commerce tasks. Beyond its general versatility, it has proven to be an excellent model for coding. We are impressed by these results and look forward to a close collaboration with the MiniMax team as we continue to support their latest innovations on the Fireworks platform.”

    • Benny Chen, Co-founder of Fireworks

    “The MiniMax M2 series has demonstrated powerful code generation capability and has quickly become one of the most popular models on the Cline platform over the past few months. We already see another huge advancement in capability with M2.1 and are very excited to continue partnering with the MiniMax team to advance AI in coding.”

    • Saoud Rizwan, Founder, CEO of Cline

    “We could not be more excited about M2.1! Our users have come to rely on MiniMax for frontier-grade coding assistance at a fraction of the cost, and early testing shows M2.1 excelling at everything from architecture and orchestration to code reviews and deployment. The speed and efficiency are off the charts!”

    • Scott Breitenother, Co-Founder, CEO of Kilo

    "Our users love MiniMax M2 for its strong coding ability and efficiency. The latest M2.1 release builds on that foundation with meaningful improvements in speed and reliability, performing well across a wider range of languages and frameworks. It's a great choice for high-throughput, agentic coding workflows where speed and affordability matter."

    • Matt Rubens, Co-Founder, CEO of RooCode

    “Integrating the MiniMax M2 series into our platform has been a significant win for our users, and M2.1 represents a clear step forward in what a coding-specific model can achieve. We’ve found that M2.1 handles the nuances of complex, multi-step programming tasks with a level of consistency that is rare in this space. By providing high-quality reasoning and context awareness at scale, MiniMax has become a core component of how we help developers solve challenging problems faster. We look forward to seeing how our community continues to leverage these updated capabilities.”

    • Robert Rizk, Co-Founder, CEO of BlackBox

    Benchmarks

    MiniMax-M2.1 delivers a significant leap over M2 on core software engineering leaderboards. It shines particularly bright in multilingual scenarios, where it outperforms Claude Sonnet 4.5 and closely approaches Claude Opus 4.5.

    We also evaluated MiniMax-M2.1 on SWE-bench Verified across a variety of coding agent frameworks. The results highlight the model's exceptional framework generalization and robust stability.
    Furthermore, across specific benchmarks—including test case generation, code performance optimization, code review, and instruction following—MiniMax-M2.1 demonstrates comprehensive improvements over M2. In these specialized domains, it consistently matches or exceeds the performance of Claude Sonnet 4.5.

    To evaluate the model's full-stack capability to architect complete, functional applications "from zero to one," we established a novel benchmark: VIBE (Visual & Interactive Benchmark for Execution). This suite encompasses five core subsets: Web, Simulation, Android, iOS, and Backend. Distinguishing itself from traditional benchmarks, VIBE leverages an innovative Agent-as-a-Verifier (AaaV) paradigm to automatically assess the interactive logic and visual aesthetics of generated applications within a real runtime environment.
    MiniMax-M2.1 delivers outstanding performance on the VIBE aggregate benchmark, achieving an average score of 88.6—demonstrating robust full-stack development capabilities. It excels particularly in the VIBE-Web (91.5) and VIBE-Android (89.7) subsets.
    MiniMax-M2.1 also demonstrates steady improvements over M2 in both long-horizon tool use and comprehensive intelligence metrics.

    Showcases

    • Multilingual Coding

    • 3D Interactive Animation
      MiniMax M2.1 built a "3D Dreamy Christmas Tree" based on React Three Fiber and InstancedMesh, successfully rendering over 7,000 instances. It supports gesture interaction and complex particle animation, demonstrating advanced 3D rendering capabilities.
      Try it out: https://yuyl27wq92.space.minimax.io/

    • Avant-Garde Web UI Design
      M2.1 generated a minimalist photographer's personal homepage using an asymmetrical layout and a black-white-red contrasting color scheme. By combining immersive imagery with brutalist typography, it achieved a high-impact visual effect.
      Try it out: https://m6xkaf07udss.space.minimax.io/

    • Website - Skincare Brand
      M2.1 designed a landing page for a high-end organic skincare brand. Adopting a "Clean & Minimalist" style, it accurately presented the brand's premium identity and international visual appeal.
      Try it out: https://2drpfocv00n9.space.minimax.io/

    • Web 3D Lego Sandbox
      M2.1 developed a high-freedom 3D brick building application based on Three.js, implementing precise grid snapping algorithms and collision detection mechanisms. The project perfectly replicates the glossy texture of plastic bricks, supporting multi-angle rotation, drag-and-drop assembly, and instant color switching, providing users with an immersive 3D creative building experience.
      Try it out: https://8e6nunemyuzh.space.minimax.io/
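The grid snapping and collision detection named above are standard sandbox-building techniques. As a generic sketch of the idea (not M2.1's actual Three.js code), snapping rounds a free-floating drop position to the nearest cell center, and collision detection rejects placements into occupied cells:

```python
def snap_to_grid(x: float, y: float, z: float, cell: float = 1.0) -> tuple:
    """Snap a free-floating drop position to the nearest grid cell center."""
    def snap(v: float) -> float:
        return round(v / cell) * cell
    return (snap(x), snap(y), snap(z))


def can_place(position: tuple, occupied: set) -> bool:
    """A brick may be placed only if its snapped cell is not already taken."""
    return position not in occupied
```

In a real brick sandbox the occupied set would also account for brick footprints larger than one cell, but the snap-then-check pattern stays the same.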

    • Native App Development - Android
      M2.1 used Kotlin to develop a native Android gravity sensor simulator. Utilizing the gyroscope for a silky-smooth control experience, it features clever visual easter eggs that elegantly present the "MERRY XMAS MiniMax M2.1" message through natural UI transitions and collision effects.

    • Native App Development - iOS
      M2.1 wrote an interactive iOS Home Screen widget, designing a "Sleeping Santa" click-to-wake mechanism. The logic is complete with native-level animation effects—Santa lives in your widget; tap him ten times to wake him up for a surprise! 🎅🎁

    • Web Audio Simulation Development
      M2.1 developed a 16-step drum machine simulator based on the Web Audio API. It integrates synthesized drum sounds, non-linear rhythm algorithms, and real-time glitch sound effects, providing an avant-garde electronic music experience! (Turn on the sound in the video below to listen!)
      Try it out: https://21okxwno2u.space.minimax.io

    • Rust TUI
      M2.1 built a powerful Linux security audit tool with dual CLI + TUI modes using Rust, supporting one-click low-level scanning and intelligent risk rating for critical items such as processes, networks, and SSH.

    • Python Data Dashboard
      M2.1 created a Web3 cryptocurrency price dashboard in the style of The Matrix, using Python (backend for real-time price API fetching), HTML (structure), and CSS (Matrix aesthetic: green digital rain on black background, monospaced font, glowing neon green text, terminal-like UI).

    • C++ Image Rendering
      M2.1 utilized C++ and GLSL to implement complex light transport algorithms, accurately rendering the physical refraction of a crystal ball, detailed SDF modeling of a snowman, and shimmering snow effects in a real-time environment.

    • Java Real-time Danmaku
      M2.1 implemented a high-performance real-time Danmaku (bullet chat) system in Java, featuring a clean and intuitive user interface and millisecond-level response times.

    • SVG Generation
      M2.1 generated an interactive isometric SVG island map, constructing a detailed miniature world that supports one-click zooming to freely explore four major themed areas.
      Try it out: https://08tmc3aada59.space.minimax.io/

    • Agentic Tool Use
      Tool Use Capability: Excel Market Research
      M2.1 demonstrated its tool-use capabilities by autonomously invoking Excel and Yahoo Finance to complete an end-to-end task, ranging from market research data cleaning and analysis to chart generation.

    • Digital Employee
      The "Digital Employee" is a key feature of the MiniMax M2.1 model. M2.1 accepts web content presented in text form and controls mouse clicks and keyboard inputs via text-based commands. It can complete end-to-end tasks in daily office scenarios across administration, data science, finance, human resources, and software development. The following demo video is a screen recording of M2.1's behavioral trajectory in the Agent Company Benchmark.

    • End-to-End Office Automation
      Demo 1: Administrative tasks
      Task Requirements: Proactively collect employees' equipment requests on communication software, then search for relevant documents on the enterprise's internal server to obtain equipment prices, calculate the total cost and determine whether the department budget is sufficient, and then record equipment changes.

      Demo 2: Project management tasks
      Task Requirements: Search for blocked or backlogged issues on the project management software, then find relevant employees on the communication software and consult them for solutions, and update the status of the issues based on the employees' feedback.

      Demo 3: Software development tasks
      Task Requirements: A colleague wants to know which is the most recent Merge Request that modified a certain file. Search for the relevant Merge Request, find its number, and inform the colleague.

    How to Use

    Local Deployment Guide

    Download the model from the HuggingFace repository.
    We recommend using the following inference frameworks (listed alphabetically) to serve the model:

    • SGLang
      We recommend using SGLang to serve MiniMax-M2.1. Please refer to our SGLang Deployment Guide.

    • vLLM
      We recommend using vLLM to serve MiniMax-M2.1. Please refer to our vLLM Deployment Guide.

    • Other Inference Engines

      • MLX
      • KTransformers

    Inference Parameters

    We recommend using the following parameters for best performance:
    temperature=1.0, top_p=0.95, top_k=40
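With the model served locally, these parameters are passed per request. The sketch below assumes an OpenAI-compatible endpoint as exposed by vLLM or SGLang; the served model name and base URL are placeholders, and `top_k` is not part of the core OpenAI schema but is accepted by both engines as an extra request field.

```python
import requests

# Recommended sampling settings from the release notes
SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}


def chat(base_url: str, prompt: str, model: str = "MiniMax-M2.1") -> str:
    """One chat turn against a locally served OpenAI-compatible endpoint.

    base_url is e.g. "http://localhost:8000" for a default vLLM server;
    top_k rides along as an engine-specific extension field.
    """
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            **SAMPLING,
        },
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```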

    Tool Calling Guide

    Please refer to our Tool Calling Guide.

    Contact Us

    Original source Report a problem
  • Oct 31, 2025
    • Date parsed from source:
      Oct 31, 2025
    • First seen by Releasebot:
      Dec 23, 2025
    MiniMax logo

    MiniMax

    MiniMax Music 2.0

    MiniMax Music 2.0 launches with dynamic vocals, precise instrument control, and professional-grade audio. Create complete songs up to five minutes, perform versatile singing styles and duets, and turn prompts into polished, film-grade soundscapes.

    Today, we are officially launching our latest-generation music model—MiniMax Music 2.0. This version represents a true leap forward in the model's understanding and expression of music. It can accurately capture and reproduce everything from the subtle emotions of the human voice to the dynamic tension of musical instruments.

    It understands rhythm and emotion, weaving together vocals and instruments to become the ultimate "singing producer."

    From now on, expressing yourself through music is no longer a privilege for the few, but a joy accessible to everyone.

    Turn your inspiration into flowing melodies. Feel the rhythm, let the music belong to you.

    1. Dynamic Vocals with Mastery Over Diverse Singing Styles

    You don't need professional vocal training to sing the melody in your heart with the voice, technique, and style you desire.

    In terms of vocal texture, Music 2.0 produces a timbre that is incredibly close to the real human voice. The model performs like a seasoned "vocal powerhouse," capable of mastering a wide range of singing techniques and emotional styles. Its nuanced handling of phrasing, rhythm, and breath demonstrates a "musical intuition" comparable to a professional singer.

    The model supports precise control over vocal timbre. Using prompts, you can maintain a core vocal identity while switching between different singing styles, allowing one voice to have a thousand variations. The AI can transform into a "versatile vocal artist."

    The same female voice can switch effortlessly between Jump Blues, Rock, and Electronic styles.

    Beyond popular genres like Pop, Jazz, Blues, Rock, and Folk, the model also supports male-female duets, a cappella, and more.

    Achieve a dynamic duet with a conversational feel and varied intensity through seamless transitions between male and female lead vocals.

    Create rich melodies even without instrumental accompaniment.

    2. Catchy Melodies and Precise Instrument Control

    You don't have to be a music arranger to compose your own complete musical piece.

    Building on the strengths of its predecessor, Music 2.0 generates structurally complete songs with clear logic, including verses, choruses, and bridges, with a potential length of up to five minutes. Furthermore, the new model creates melodies that are more memorable and instantly captivating.

    The hook's melody is easy to remember, mirroring the melodic habits of human composers.

    The model can follow specific instructions to independently control and adjust various instruments in the accompaniment, creating layered, rich arrangements with a natural groove across different styles.

    Experience a live masterclass in jazz as the saxophone, trombone, trumpet, jazz drums, and piano enter in perfect sequence.

    3. Professional-Grade Audio Experience

    The new model delivers a comprehensive upgrade in audio quality. Both the texture of the vocal tracks and the spatial presence of the instruments are enhanced, providing you with an immersive listening experience.

    Step into a retro disco with vibrant vocal performances and classic 80s instrumentation that will transport you back to the golden age of dance.

    One More Thing

    While testing Music 2.0, we made a surprising discovery: you can use prompts to describe vocal emotions and soundscapes with precision to generate film-grade monologue soundtracks. The layered emotional progression and musical development create a vivid picture you can "hear."

    This exciting capability stems from the model's accurate semantic understanding combined with its precise control over vocal expression—a perfect fusion that gives sound a versatile emotional contour.

    Music 2.0 is now live.
    Start creating and discover your own sound:
    https://www.minimax.io/audio/music

    Intelligence with Everyone.

    Original source Report a problem
  • Oct 30, 2025
    • Date parsed from source:
      Oct 30, 2025
    • First seen by Releasebot:
      Dec 23, 2025
    MiniMax logo

    MiniMax

    MiniMax Speech 2.6: The Ultimate Voice Agent Has Arrived

    MiniMax Speech 2.6 launches with under 250 ms end-to-end latency, smarter handling of non standard formats, and Fluent LoRA for natural, multi language voices. It enables faster, more fluid voice interactions across real world platforms and devices.

    MiniMax Speech 2.6 Release Notes

    Today, we’re thrilled to introduce MiniMax Speech 2.6 — our latest speech model, bringing comprehensive upgrades with ultra-low latency, enhanced format handling, and a more natural, human-like voice for Voice Agent scenarios.

    Since its launch, MiniMax Speech has become a core piece of infrastructure in the global voice intelligence landscape, known for its outstanding speech technology and exceptional cost-effectiveness.

    LiveKit, which powers ChatGPT's advanced voice mode, the popular open-source framework Pipecat on GitHub, and the YC-incubated voice platform Vapi have all chosen MiniMax Speech as their underlying technology engine. In the smart hardware sector, innovative products like Haivivi Bubble Pal, Fuzozo, and Rokid Glasses are also powered by MiniMax Speech to deliver their natural voice interaction experiences.

    MiniMax continues to drive new forms of productivity through technological innovation, breaking down the barriers of language and culture to deliver natural, fluent interactions that connect every voice around the world.

    Ultra-Low Latency, More Responsive: For Smoother Overall Interaction

    We have completely optimized the audio generation pipeline, achieving an end-to-end latency of under 250 milliseconds—a top-tier industry standard. In scenarios with strict response time requirements, such as real-time conversations, audio generation is no longer the bottleneck, ensuring a smoother overall interaction.

    Listen to Speech 2.6 acting as an AI customer service agent:

    Seamless Handling of Specialized Formats, Smarter: For More Fluid Information Delivery

    Speech 2.6 now directly converts non-standard text formats in multiple languages, including URLs, email addresses, phone numbers, dates, and monetary amounts. Whether you are using it with a large language model or need to process dynamically changing entity information in your business, you no longer need to perform tedious text pre-processing. The input is read correctly from the start, enabling more fluid information delivery.

    For example, to correctly read the following passage, traditional TTS would require a series of conversions:

    • +1 415 415 9921 → “plus one, four one five, four one five, nine nine two one”
    • $1,234.56 → “one thousand two hundred thirty-four dollars and fifty-six cents”
    • 192.168.1.1 → “one nine two dot one six eight dot one dot one”
    • 2032-5-6 → “May sixth, twenty thirty-two”
    • support-vip@technet.com → “support dash vip at technet dot com”

    Original Text:
    "Hello Oliver Smith, I'm your intelligent virtual assistant Max! Thank you for your call. I've found your file. The outstanding balance for the phone number +1 415 415 9921 is $1,234.56. The associated IP address is 192.168.1.1. Your next payment is due on 2032-5-6. If you have any questions, please contact support-vip@technet.com."
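To make the conversions concrete, the two helpers below sketch what such pre-processing looks like for phone numbers and IPv4 addresses. They are hypothetical illustrations of the normalization step Speech 2.6 now performs internally, not part of any MiniMax SDK:

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def speak_phone(number: str) -> str:
    """Spell a phone number digit by digit, pausing between space-separated groups."""
    parts = []
    for group in number.split():
        if group.startswith("+"):
            parts.append("plus " + " ".join(DIGIT_WORDS[d] for d in group[1:]))
        else:
            parts.append(" ".join(DIGIT_WORDS[d] for d in group))
    return ", ".join(parts)


def speak_ip(address: str) -> str:
    """Read an IPv4 address with 'dot' spoken between octets."""
    return " dot ".join(
        " ".join(DIGIT_WORDS[d] for d in octet) for octet in address.split(".")
    )


speak_phone("+1 415 415 9921")  # "plus one, four one five, four one five, nine nine two one"
speak_ip("192.168.1.1")         # "one nine two dot one six eight dot one dot one"
```

Every entity type (dates, currency, emails) needs its own such rule, and the rules differ per language, which is exactly the maintenance burden that built-in format handling removes.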

    Greater Naturalness and Fluent LoRA: For More Fluent Vocal Expression

    In addition to further enhancing prosodic naturalness, Speech 2.6 also introduces Fluent LoRA.

    Speech 2.5 already offered a convenient, high-fidelity voice cloning feature that allowed users to preserve the unique characteristics of the original voice, such as accents and speech habits. This capability met the diverse voice needs of real-world application scenarios.

    Now, you no longer have to worry about imperfect source material when cloning a voice. Even with non-native recordings that may have an accent or be disfluent, Fluent LoRA can perfectly replicate the voice's timbre while generating fluent, natural speech that matches the target text, making your vocal expression more articulate.

    Besides the English example shown in the video, this feature enables one-click fluency for voice cloning across the 40+ languages the model supports. Here is an example in a Japanese scenario:

    Speech 2.6 is now fully live. Welcome to try it out:

    MiniMax Open Platform:
    https://www.minimax.io/platform_overview

    MiniMax Audio:
    https://www.minimax.io/audio

    Intelligence with Everyone.

    Original source Report a problem

Related vendors