Minimaxi Release Notes

Last updated: Jan 4, 2026

  • Jan 4, 2026
    • Parsed from source:
      Jan 4, 2026
    • Detected by Releasebot:
      Jan 4, 2026

    Minimaxi

    M2.1: Multilingual and Multi-Task Coding with Strong Generalization

    MiniMax-M2.1 delivers a multi‑language, multi‑task coding agent with strong scaffold generalization and top‑tier benchmarks, signaling a practical leap toward enterprise coding, testing, and collaboration. The release outlines scalable RL training, broader problem coverage, and a bold roadmap for future efficiency and scope.

    The Gap Between SWE-Bench and Real-World Coding

    In 2025, SWE-Bench has become the most authoritative evaluation standard for code generation scenarios. In this evaluation, LLMs must face bugs from real GitHub repositories and fix them through multiple rounds of code reading and testing. The core value of SWE-Bench lies in the fact that its tasks closely mirror a programmer's daily work, and the results can be objectively verified via test cases, a feature particularly crucial for reinforcement learning training. We can directly use the test pass rate as a reward signal, continuously optimizing the model in a real code environment without relying on noisy human labels or model-based judging.
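    The reward construction described above can be sketched in a few lines. This is an illustrative simplification, not MiniMax's actual reward function; the shaping choice (full credit only for a clean pass, partial credit otherwise) is an assumption:

```python
# Illustrative sketch, not MiniMax's actual reward function: map per-test
# outcomes to a scalar reward, giving full credit only for a clean pass.
def reward_from_tests(results: dict[str, bool]) -> float:
    """results maps test name -> passed; returns a reward in [0, 1]."""
    if not results:
        return 0.0
    passed = sum(results.values())
    if passed == len(results):
        return 1.0  # all tests pass: full reward
    return 0.5 * passed / len(results)  # partial credit, one shaping choice
```

    For example, a patch passing one of two tests earns 0.25, while a fully passing patch earns 1.0.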

    However, like all evaluation standards, SWE-Bench is not perfect. For a coding agent to be usable in real-world scenarios, there are more capability dimensions beyond SWE-Bench that need attention:

    • Limited Language Coverage: SWE-Bench currently only covers Python. In real development scenarios, developers need to handle multiple languages such as Java, Go, TypeScript, Rust, and C++, often collaborating across multiple languages within the same project.
    • Restricted Task Types: SWE-Bench only involves bug-fixing tasks. Other real-world capabilities, such as implementing new features, generating test cases, project refactoring, code review, performance optimization, and CI/CD configuration can't be evaluated.
    • Scaffold Binding: SWE-Bench usually only evaluates the model's performance on a specific scaffold, so the model's generalization on other scaffolds cannot be accurately observed. Meanwhile, different agent scaffolds design various context management strategies, and the model needs to be able to adapt to these differences.

    How to Fill These Gaps

    • Environment Scaling

      We often see developers complaining that current coding agents perform well on languages like Python/JavaScript but show lackluster results in more serious enterprise-level development scenarios. If the task involves complex project understanding, the performance degrades further.

      To solve this problem, during the training cycle of MiniMax-M2.1, we built a comprehensive data pipeline covering the top 10+ mainstream programming languages. We retrieved a massive number of Issues, PRs, and corresponding test cases from GitHub, and conducted strict filtering, cleaning, and rewriting of this raw data to ensure the quality of post-training data. A coding agent is naturally suited to mass-producing this kind of training environment. During this process, we found that for both the M2 model and other frontier models, the success rate of constructing multi-language environments was lower than that of Python. Several distinct factors are at play:

      • Environmental Complexity of Compiled Languages: Python, as an interpreted language, has relatively simple configuration. However, for compiled languages like Java, Go, Rust, and C++, we need to handle complex compilation toolchains, version compatibility, and cross-compilation issues. A Java project might depend on a specific version of the JDK, Maven/Gradle, and numerous third-party libraries; an error at any step can lead to build failure.
      • Diverse Test Frameworks: In the Python ecosystem, pytest dominates, but test frameworks in other languages are more fragmented. Java has JUnit and TestNG; JavaScript has Jest, Mocha, and Vitest; Go has the built-in testing package but also extensions like testify; Rust has built-in tests and criterion, etc. We need to design specialized test execution and result parsing logic for each framework.
      • Dependency Management & Project Structure: Package managers for different languages differ vastly in dependency resolution, version locking, and private repository support. The nested structure of npm's node_modules, Maven's central repository mechanism, and Cargo's semantic versioning all require targeted handling. Simultaneously, project structure standards vary: Python structures are flexible, but Java projects usually follow strict Maven/Gradle directory standards; Go projects have GOPATH and Go Modules modes; Rust projects have the concept of a workspace. Understanding these dependency management mechanisms and project structures is crucial for correctly locating code and running tests.
      • Difficulty in Parsing Error Messages: Error message formats produced by different languages and toolchains vary widely; compile errors, link errors, and runtime errors also manifest differently. We need to train the model to understand these diverse error messages and extract useful debugging clues from them.
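      To make the fragmentation above concrete, a training harness typically maps each framework to a run command and a result parser. The commands and the pytest-style summary parser below are illustrative assumptions, not MiniMax's internal tooling:

```python
# Illustrative harness: per-framework run commands plus a parser for
# pytest-style summaries. Commands and regexes are assumptions, not
# MiniMax's internal tooling.
import re

TEST_COMMANDS = {
    "pytest": ["pytest", "-q"],        # Python
    "junit":  ["mvn", "-q", "test"],   # Java / Maven
    "go":     ["go", "test", "./..."], # Go built-in testing package
    "cargo":  ["cargo", "test"],       # Rust
    "jest":   ["npx", "jest", "--ci"], # JavaScript / TypeScript
}

def parse_pytest_summary(output: str) -> tuple[int, int]:
    """Extract (passed, failed) counts from a pytest summary line."""
    passed = re.search(r"(\d+) passed", output)
    failed = re.search(r"(\d+) failed", output)
    return (int(passed.group(1)) if passed else 0,
            int(failed.group(1)) if failed else 0)
```

      Each of the other frameworks (JUnit, go test, cargo test, Jest) would need its own parser in the same spirit, which is exactly the per-framework logic described above.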

      Ultimately, we built a multi-language training system covering over ten languages including JS, TS, HTML, CSS, Python, Java, Go, C++, Kotlin, C, and Rust. We obtained over 100,000 environments usable for training and evaluation from real GitHub repositories, with each environment containing complete Issues, code, and test cases.

      To support such massive Environment Scaling and RL training, we built a high-concurrency sandbox infrastructure capable of launching over 5,000 isolated execution environments within 10 seconds, while supporting the concurrent operation of tens of thousands of environments.
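      A bounded-concurrency launcher is one common way to structure this kind of sandbox fan-out. The sketch below is a toy stand-in: the real infrastructure presumably talks to a container scheduler, and `create_sandbox` here is just a placeholder:

```python
# Toy sketch of bounded-concurrency sandbox launch. `create_sandbox` is a
# placeholder for real container/VM provisioning.
import asyncio

async def create_sandbox(env_id: int) -> str:
    await asyncio.sleep(0)  # stands in for container startup I/O
    return f"sandbox-{env_id}"

async def launch_all(n: int, max_concurrent: int = 512) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)  # cap in-flight launches

    async def launch_one(i: int) -> str:
        async with sem:
            return await create_sandbox(i)

    return list(await asyncio.gather(*(launch_one(i) for i in range(n))))
```

      Something like `asyncio.run(launch_all(5000))` would fan out all launches while never exceeding the concurrency cap.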

      This infrastructure allows us to efficiently conduct large-scale multi-language coding agent training.

    • Beyond Bug Fix: Multi-Task Capabilities

      Real software development is far more than just fixing bugs. A programmer's daily routine includes writing tests, code reviews, performance optimization, and other tasks. In the training of MiniMax-M2.1, we also conducted targeted optimization for these scenarios, including acquiring high-quality problems and designing corresponding Reward signals:

      • Test Generation Capability: Early in the R&D of M1, we discovered that the ability to write tests was a major bottleneck restricting the accuracy of model-generated code. In the Agentless framework, the model generates multiple fix solutions in parallel and then uses its own generated test code to select the final solution. However, due to unreasonable reward design in M1's RL process, the model consistently wrote overly simple test code, causing a large number of incorrect fix solutions to be selected. Generating high-quality test cases requires the model to deeply understand code logic, boundary conditions, and potential failure scenarios. MiniMax-M2.1 synthesized a large volume of training samples based on GitHub PRs and self-generated code patches to enhance testing ability, eventually tying with Claude Sonnet 4.5 on SWT-bench, which evaluates testing capabilities.
      • Code Performance Optimization: Besides implementation correctness, execution efficiency is also critical in actual development. The model needs to understand low-level knowledge like algorithm complexity, memory usage, and concurrency handling, while also mastering best practices for specific APIs in software development.
        During training, MiniMax-M2.1 was encouraged to write more efficient code, subsequently achieving significant progress on SWE-Perf, with an average performance boost of 3.1%.
        In the future, we will apply corresponding optimization methods to other performance-sensitive scenarios like Kernel optimization and database query optimization.
      • Code Review Capability: Based on the SWE framework, we built an internal Benchmark called SWE-Review, covering multiple languages and scenarios to evaluate the recall rate and hallucination rate of code defects.
        A review is judged as correct only if it accurately identifies the target defect without producing any false positives, imposing high requirements on the model's precision.
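        That scoring rule can be stated precisely in a few lines; the function below is an illustrative restatement with hypothetical defect identifiers, not the internal SWE-Review implementation:

```python
# Illustrative restatement of the SWE-Review scoring rule: a review counts
# as correct only if the target defect is flagged AND no false positives
# (flags outside the known-defect set) are raised.
def review_correct(flagged: set[str], target: str, known_defects: set[str]) -> bool:
    false_positives = flagged - known_defects
    return target in flagged and not false_positives
```

        Under this rule, flagging the target defect plus one bogus issue still scores zero, which is what imposes the precision requirement described above.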
    • Generalization on OOD Scaffolds

      Generalization on OOD scaffolds is vital for a coding agent. Developers use different scaffolds: some use Claude Code, some use Cursor, and others use proprietary agent frameworks. If a model is optimized only for a specific scaffold, its performance degrades severely in other environments, sharply limiting its usefulness in real development scenarios. In MiniMax-M2.1, we believe scaffold generalization primarily tests the model's long-range instruction following and its adaptability to context management strategies:

      • Long-Range Instruction Following: Complex development scenarios require the model to integrate and execute "composite instruction constraints" from multiple sources, including the System Prompt, User Query, Memory, Tool Schema, and various specification files (such as Agents.md, Claude.md, Skill.md, etc.). Developers strictly constrain the model's expected behavior by designing these specifications. If the agent fails to meet a requirement at any step during inference, end-to-end results can degrade severely.
      • Adaptability to Context Management: During the early release of M2, the community did not fully understand the design of Interleaved Thinking. When used in many scaffolds, the results were inconsistent with the model's inherent capabilities. At that time, we found that some popular scaffold designs would discard some historical thinking content in multi-turn conversations; this design caused M2's performance to drop by varying degrees across different evaluation sets. In MiniMax-M2.1, on one hand, we still recommend developers use the Interleaved Thinking feature to unleash the full potential of M2.1; on the other hand, we designed corresponding training methods to ensure the model's "IQ" remains online even when users employ all sorts of imaginative context management strategies.

      To verify MiniMax-M2.1's scaffold generalization, we directly tested SWE-Bench performance on different scaffolds and also constructed a test set closer to real-world usage to observe whether the model meets various scaffold instruction constraints. Ultimately, we found that MiniMax-M2.1 maintained an SWE-Bench score above 67 in mini-swe-agent, Droid, and Claude Code.

      Compared to M2, MiniMax-M2.1 shows significant improvement across different OOD scaffolds. On OctoCodingbench, M2.1 improved from M2's 13.3 to 26.1, demonstrating strong compliance with scaffold instruction constraints.

    2026 TODOs

    We believe the development of coding agents still has a long way to go. Therefore, this year we will explore several interesting directions:

    • Defining the Reward Signal for Developer Experience: Beyond the optimization directions mentioned above, we hope to further quantify and optimize developer experience. Current evaluation standards mainly focus on whether the task is ultimately completed, ignoring the user experience during the process. We plan to explore richer Reward dimensions: regarding code quality, including readability, modularity, and comment completeness; regarding interaction experience, including response latency, information transparency, and interpretability of intermediate states; regarding engineering standards, including commit message quality, PR description completeness, and code style consistency. Although these metrics are difficult to evaluate fully automatically, we are exploring hybrid solutions combining static analysis tools, Agent-as-a-Verifier, and human preference learning, hoping to make the coding agent not only complete tasks but also deliver high-quality code like an excellent human engineer.
    • Improving Problem-Solving Efficiency: MiniMax-M2.1 still has some issues with over-exploration, such as repeatedly reading the same file or executing redundant tests. We plan to optimize efficiency from multiple angles: reducing trial-and-error through better planning capabilities; reducing unnecessary file reads through more precise code localization; avoiding repetitive exploration through better memory mechanisms; and responding quickly to simple tasks through adaptive thinking depth.
    • RL Scaling: The Scaling Law of reinforcement learning still holds huge potential for coding agents. We have verified the positive correlation between environment count, training steps, and model capability, but we are far from reaching convergence. We plan to continue exploring in three dimensions: Compute dimension, increasing concurrent environment count and training iterations; Data dimension, building a larger-scale and more diverse training task pool; Algorithm dimension, exploring more efficient exploration strategies, more stable training objectives, and better reward shaping methods. Simultaneously, we are researching how to make the RL training process itself more efficient, including better curriculum learning designs, smarter sample reuse strategies, and cross-task knowledge transfer.
    • Coding World Model & User Simulator: As mentioned earlier, the training of this generation of coding agents (M2.1) relies heavily on execution in real environments, which brings massive computational overhead and environment construction costs. We are exploring building a World Model capable of predicting code execution results: given a piece of code and environment state, the model can predict whether tests pass, what error messages will be produced, and how the program will behave. This will enable us to perform large-scale rollout and policy optimization without actually executing code. Meanwhile, we are also building a user behavior simulator to model the patterns of interaction between real developers and the agent—including vague requirement descriptions, mid-stream requirement changes, and feedback on intermediate results—allowing the model to adapt to various user behavior patterns in real scenarios during the training phase.
    • Extremely Efficient Data Pipeline: Building a data pipeline capable of automatically discovering, filtering, and generating harder, longer-range tasks to continuously raise the model's ceiling. High-quality training data is a key bottleneck for coding agent progress. We are building an automated data flywheel: automatically discovering high-quality Issues and PRs from GitHub; using models to assess task difficulty and perform stratification; automatically augmenting tasks that the current model can easily solve to make them more challenging; and analyzing failure causes for failed cases to generate targeted training data. The ideal state here is to build an "inexhaustible" source of high-quality tasks, keeping training data difficulty slightly above the model's current capability to maintain optimal learning efficiency. We are also exploring how to automatically generate ultra-long-range tasks that require hours or even days to complete, pushing the model's capability boundaries in complex project understanding and long-term planning.
    • More Scenario Coverage: Expanding to more specialized fields such as GPU Kernel development, compiler development, smart contracts, and machine learning. Each field has its unique knowledge system, toolchain, and best practices, while possessing real application scenarios and commercial value. We plan to gradually build training environments and evaluation systems for these professional fields, enabling the coding agent to handle more specialized and high-value development tasks. Looking further ahead, we believe the paradigm of "Define Problem - Define Reward - Environment Construction - Model Training" demonstrated in coding agent training can be transferred to more scenarios requiring complex reasoning and execution feedback.
  • Dec 17, 2025
    • Parsed from source:
      Dec 17, 2025
    • Detected by Releasebot:
      Dec 18, 2025

    MiniMax x Retell AI: Your smarter TTS for real-time conversations

    MiniMax Speech now integrates with Retell AI, delivering ultra-human real-time TTS with 40+ languages and 20+ voices inside Retell AI. Expect sub-250 ms latency, smart text normalization, and seamless use for videos, podcasts, and interactive agents.

    MiniMax Speech: Ultra-Fast. Ultra-Human. Ultra-Smart.

    We’re excited to announce that MiniMax Speech is now integrated with Retell AI, bringing state-of-the-art text-to-speech directly to creators and developers.
    With this integration, you can generate ultra-human, ultra-fast, and ultra-smart speech across more languages and voices—seamlessly within Retell AI’s all-in-one platform.

    MiniMax Speech helps power videos, presentations, podcasts, and real-time interactive agents with professional-grade audio. Built for both real-time and production use, it delivers speed and realism at scale.

    Ultra-Fast Performance

    • < 250 ms latency, enabling real-time conversational and interactive use cases.

    Smart Text Normalization

    • Automatically handles URLs, emails, dates, numbers, and other structured text for natural pronunciation.

    Multilingual Support

    • Supports 40+ languages with seamless inline code switching within a single utterance.

    More Choices, More Authentic Voices

    • Access 20+ high-quality voices across different languages, genders, and accents, with continuous updates over time.

    How to Use MiniMax Speech in Retell AI

    1. Open Your Retell AI Project
      Log in and create a new project or open an existing one.

    2. Select MiniMax as Your Voice Engine
      Go to Global Settings → Voice & Language, then choose MiniMax.
      Pick from 20+ authentic voices across different languages, accents, and styles.

    3. Import or Write your prompts
      Type directly or import prompts via the Global Prompt dialog.

    4. Generate or Preview Audio
      Save your prompt and test your agent—MiniMax delivers lifelike speech in under 250 ms.

    Step into the Future of Audio Creation

    No more juggling tools or fragmented workflows. With MiniMax Speech fully integrated into Retell AI, everything you need to turn text into studio-quality speech is now in one place.
    Powered by MiniMax and Retell AI.

  • Nov 14, 2025
    • Parsed from source:
      Nov 14, 2025
    • Detected by Releasebot:
      Dec 12, 2025

    Hailuo AI x VideoTube: Smarter Video Creation for Everyone

    MiniMax and VideoTube unveil Hailuo-2.3-powered AI video creation, enabling fast 1080p text- or image-to-video generation with cinematic styles. A streamlined prompt-to-post workflow lets creators generate short videos in under a minute.

    We’re excited to announce that MiniMax (Hailuo AI) is partnering with VideoTube to make AI video creation faster and easier than ever. With Hailuo’s latest model, Hailuo-2.3, now built into AI Video Generator, you can turn your ideas into high-quality videos — no editing skills required.

    What Is VideoTube?

    VideoTube is an all-in-one AI video platform that helps you turn one simple idea into complete, ready-to-post shorts — for free.
    You can add images, audio, and story elements in just a few clicks, and create professional YouTube content that helps your channel grow faster.

    What's New with Hailuo-2.3

    This partnership brings the newest Hailuo-2.3 model to VideoTube, offering:

    • Text-to-video and image-to-video generation — just write a prompt or upload an image.
    • Professional-level videos (up to 1080p) that render in under a minute.
    • Realistic visuals — natural human motion, cinematic effects, and unique styles like Pixar or surreal waterlight.
    • Smooth workflow — upload/type, choose style, generate, edit, and share directly within VideoTube.

    Who Can Benefit

    • Creators and influencers making short-form videos.
    • Marketers needing quick, on-brand video content.
    • Teachers and trainers who want visual, easy-to-understand lessons.
    • Small teams or studios looking for big results without big budgets.

    How to Use Hailuo-2.3 in VideoTube

    • Visit videotube.ai/image-to-video
    • Upload an image or type a short text prompt
    • Choose your preferred resolution, clip length (e.g., 6 s or 10 s), and style
    • Click “Generate” — thanks to MiniMax's optimized engine, typical image-to-video clips render in under a minute
    • Use VideoTube's editing tools to add subtitles, logos, and transitions
  • Nov 3, 2025
    • Parsed from source:
      Nov 3, 2025
    • Detected by Releasebot:
      Dec 12, 2025

    Interleaved Thinking Unlocks Reliable MiniMax-M2 Agentic Capability

    MiniMax-M2 gains stronger interleaved thinking support across OpenAI and Anthropic APIs, preserving prior reasoning to boost reliability and planning. The update includes separate reasoning_details in the OpenAI-compatible API and guidance for Anthropic API, plus ecosystem partnerships.

    Since MiniMax-M2's launch last week, we have seen a surge in community adoption and usage. Yesterday M2 became one of the top 3 models by usage on OpenRouter. However, we have also observed incorrect implementations of M2, especially regarding interleaved thinking, which significantly reduce the model's performance.

    During the very early stage of developing M2, we discovered that interleaved thinking is important in both agentic and coding applications. Since most current models, apart from Anthropic's Claude, do not fully support interleaved thinking, we believe it hasn't yet become a universal convention. From users' feedback, we've also noticed that interleaved thinking is sometimes not applied correctly in practice. To address this, we'd like to share our understanding of how to use it effectively across different API interfaces to achieve better results.

    Why is Interleaved Thinking Important for M2?

    Interleaved thinking is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps. This process significantly enhances planning, self-correction, and reliability in long workflows (see Anthropic's guidance on interleaved thinking for more background). In practice, it transforms long, tool-heavy tasks into a stable plan-act-reflect loop, reducing state drift and repeated mistakes while keeping actions grounded in fresh evidence. Interleaved thinking also improves debuggability: reasoning snapshots make failures explainable and recoverable, and raise sample efficiency by reusing hypotheses, constraints, and partial conclusions instead of re-deriving them each step. For best results, interleave thinking with tool feedback rather than front-loading it, and persist the chain of thought so it compounds across turns.

    From community feedback, we've often observed failures to preserve prior-round thinking state across multi-turn interactions with M2. The root cause is that the widely-used OpenAI Chat Completion API does not support passing reasoning content back in subsequent requests. Although the Anthropic API natively supports this capability, the community has provided less support for models beyond Claude, and many applications still omit passing back the previous turns' thinking in their Anthropic API implementations. This situation has resulted in poor support for Interleaved Thinking for new models. To fully unlock M2's capabilities, preserving the reasoning process across multi-turn interactions is essential.

    In MiniMax-M2, interleaved CoT works most effectively when prior‑round reasoning is preserved and fed back across turns. The model reasons between tool calls and carries forward plans, hypotheses, constraints, and intermediate conclusions — this accumulated state is the backbone of reliability. When prior state is dropped, cumulative understanding breaks down, state drift increases, self‑correction weakens, and planning degrades — especially on long‑horizon toolchains and run‑and‑fix loops.

    Retaining prior-round thinking state improves performance significantly compared to discarding it, as evident across benchmarks: SWE-Bench Verified 69.4 vs. 67.2 (+2.2, a 3.3% gain), Tau^2 87 vs. 64 (+23, +35.9%), BrowseComp 44.0 vs. 31.4 (+12.6, +40.1%), GAIA 75.7 vs. 67.9 (+7.8, +11.5%), and xBench 72.0 vs. 66.0 (+6.0, +9.1%).

    Keeping the interleaved thinking state intact matters. Reliability isn't just about what the LLM thinks now; it's about whether it can revisit and revise what it thought before. Interleaved thinking operationalizes this: plan, act, reflect, with state preserved so reflection compounds and corrections propagate across turns.

    Interleaved Thinking Implemented Correctly

    Enabling Interleaved Thinking in MiniMax-M2

    We provide best-in-class interleaved thinking support for MiniMax-M2 on our open API platform: https://platform.minimax.io. For best performance and compatibility, we strongly recommend using our official API. In general, MiniMax offers two API interfaces:

    OpenAI-Compatible API

    Now, when calling the M2 model through the MiniMax OpenAI-Compatible API, you can experience:

    • A separate reasoning_details field: The model's reasoning process is returned in a separate reasoning_details field, no longer mixed with the content. This makes the API structure cleaner and easier to parse.
    • A complete chain of thought: Passing the reasoning_details field in subsequent requests ensures that the model maintains a complete chain of thought across multiple tool calls, leading to more accurate judgments and planning.

    Code examples are available in the official guide.
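    As a hedged sketch (not the official client code), the pass-back step can be isolated into a small helper that replays the assistant turn together with its reasoning_details field. The exact payload shape accepted by the endpoint may differ; consult the official guide:

```python
# Hedged sketch, not the official client code: replay the assistant turn
# together with its reasoning_details so the chain of thought survives
# across tool-calling turns. The real payload shape may differ.
def with_reasoning(history: list[dict], assistant_msg: dict) -> list[dict]:
    """Append an assistant turn, preserving reasoning_details if present."""
    turn = {"role": "assistant", "content": assistant_msg.get("content", "")}
    if assistant_msg.get("reasoning_details") is not None:
        turn["reasoning_details"] = assistant_msg["reasoning_details"]
    return history + [turn]

# In a real loop (client and method names assumed, see the official guide):
#   resp = client.chat.completions.create(model="MiniMax-M2", messages=history)
#   history = with_reasoning(history, resp.choices[0].message.model_dump())
```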

    Anthropic-Compatible API

    The Anthropic API natively supports Interleaved Thinking. Simply append the model's complete output from each round (including thinking_blocks) to the messages history and send it to the API in subsequent requests.

    For more details, please refer to the official guide.
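    A minimal sketch of that replay step, assuming Anthropic-style content blocks (the block contents here are invented for illustration):

```python
# Minimal sketch assuming Anthropic-style content blocks; the block
# contents are invented for illustration.
def append_turn(messages: list[dict], content_blocks: list[dict]) -> list[dict]:
    """Replay the assistant's complete output, thinking blocks included."""
    return messages + [{"role": "assistant", "content": content_blocks}]

history = [{"role": "user", "content": "Run the tests and fix any failures."}]
assistant_output = [
    {"type": "thinking", "thinking": "The stack trace points at utils.py..."},
    {"type": "text", "text": "I'll patch utils.py and re-run the tests."},
]
history = append_turn(history, assistant_output)  # send `history` next turn
```

    The key point is that the thinking block is replayed verbatim rather than stripped, so the model's prior reasoning stays in context.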

    Advancing Industry Standards for the Future of Agents

    In addition to our official API platform support of interleaved thinking, we are helping partners such as OpenRouter, Ollama, Droid, Vercel, and Cline to test and implement interleaved thinking correctly. Through helping our ecosystem partners, we aim to establish a unified protocol paradigm for widely supporting Interleaved Thinking among applications, OpenAI-Compatible APIs, and Anthropic-Compatible APIs — setting a foundation for the industry to build on. We believe that an open and unified standard will empower developers worldwide to easily build more capable, reliable AI agents, and foster a thriving AI ecosystem.

    For partnership and collaboration, please do not hesitate to contact us at [email protected].

    Links

    1. Anthropic's guidance on interleaved thinking: https://docs.claude.com/en/docs/build-with-claude/extended-thinking#interleaved-thinking
    2. OpenAI-Compatible API: https://platform.minimax.io/docs/guides/text-m2-function-call#openai-sdk
    3. Anthropic-Compatible API: https://platform.minimax.io/docs/guides/text-m2-function-call#anthropic-sdk
    4. MiniMax Official Open Platform: http://platform.minimax.io

    Intelligence with Everyone!

  • Oct 29, 2025
    • Parsed from source:
      Oct 29, 2025
    • Detected by Releasebot:
      Dec 12, 2025

    What makes good Reasoning Data

    MiniMax M2 debuts as a top open‑source model, showcasing advanced reasoning data, diverse CoT formats, and scalable data pipelines. It also shares insights on data quality, distribution, and plans for tool‑augmented reasoning and future work.

    Artificial Analysis is a comprehensive benchmark that reflects the diversity of models’ reasoning abilities. Our newly released model, MiniMax M2, ranks Top-1 among open-source models and Top-5 among all models.

    In the past, community discussions on improving reasoning abilities often focused on optimizing RL algorithms or constructing verifiable data in domains like Math and Code. In the M2 project, we conducted more "general" explorations. As a member of the Reasoning team, I'd like to share some of our findings and thoughts on data — what makes good reasoning data.

    Quality of CoT and Response

    The quality of CoT is reflected in its logical completeness without excessive redundancy. For instance, in instruction following tasks, overly brief CoT often leads to models skipping steps or being overconfident, causing significant harm to the model's final performance and capability generalization. For responses, we noticed that most open-source work overfits certain benchmark format patterns to achieve better leaderboard scores. While this is effective for a single data direction, it severely hinders capability generalization for a general-purpose model. Therefore, when synthesizing data, we introduced format diversity and observed significant gains in multi-directional fusion experiments. Meanwhile, for potential bad cases in CoT and responses, such as hallucinations, instruction-following failures, and logical errors, we performed data cleaning using rules + LLM-as-a-judge. By continuously iterating on this misalignment-elimination pipeline, we've become increasingly convinced that every bad case has its corresponding dirty training data, and improvements in data quality will inevitably be reflected in model performance.
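    The rules + LLM-as-a-judge pass can be pictured as a two-stage filter. The rule checks below (a length floor and a degenerate-repetition check) and the pluggable `llm_judge` callable are illustrative assumptions, not the team's actual criteria:

```python
# Illustrative two-stage cleaning pass: cheap rule filters first, then an
# LLM judge for the survivors. The rules and `llm_judge` are assumptions.
import re

def rule_filter(sample: str) -> bool:
    """Reject obviously defective CoT before spending judge calls."""
    too_short = len(sample.split()) < 5
    degenerate = re.search(r"(\b\w+\b)(\s+\1){4,}", sample)  # word looping
    return not too_short and not degenerate

def clean(samples: list[str], llm_judge) -> list[str]:
    """llm_judge: callable str -> bool (e.g. an LLM-as-a-judge wrapper)."""
    return [s for s in samples if rule_filter(s) and llm_judge(s)]
```

    Running the cheap rules first keeps the expensive judge focused on samples that could plausibly survive, which is the main point of layering the two stages.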

    Difficulty and Diversity of Data Distribution

    Like many discussions in the community, our experiments also found that math and code data are critical for improving reasoning capabilities. The reasoning abilities brought by these two types of data often benefit all tasks, such as STEM and IF. However, we also found that we still need sufficiently diverse data to cover more domains, such as logical reasoning, science, instruction following, and open-ended creative tasks. Tasks from different domains have different thinking paradigms, and the diversity of reasoning is the foundation for capability generalization. Additionally, we noticed in our experiments that harder and more complex queries are more effective for model training, so we adjusted data distribution based on pass rate (for verifiable tasks) or complexity scores (for non-verifiable tasks).
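    One simple way to realize pass-rate-based rebalancing is to weight each verifiable task by (1 - pass rate), with a floor so already-solved tasks are not dropped entirely. The weighting below is an assumption for illustration; the actual adjustment used for M2 is not specified:

```python
# Illustrative difficulty-aware weighting: sample verifiable tasks in
# proportion to (1 - pass_rate), floored so solved tasks are not dropped.
# The actual adjustment used for M2 is not public.
def sampling_weights(pass_rates: dict[str, float],
                     floor: float = 0.05) -> dict[str, float]:
    raw = {task: max(1.0 - rate, floor) for task, rate in pass_rates.items()}
    total = sum(raw.values())
    return {task: w / total for task, w in raw.items()}
```

    A task the model almost never solves thus appears far more often in training batches than one it passes 90% of the time.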

    Data Scaling

    Finally, an old but important topic: Scaling. When data quality and diversity meet the standards, increasing data scale consistently brings significant gains. Whether it's increasing the number of queries, doing 1Q-multiple-A, multi-epoch training, or even mixing data from different directions to add more training steps, the model steadily improves. In practice, data scaling is a highly engineering-oriented problem, so we consolidated all data based on task characteristics, dividing it into two data pipelines, Verifiable and Non-Verifiable, for automated data synthesis and processing. In fact, the Reasoning team is almost entirely composed of interns, and this data pipeline effectively ensured team collaboration efficiency and consistency in data output.

    Future Work

    Moving forward, we will continue to delve deeper into two directions. One is compound capabilities, such as knowledge + reasoning, and the enhancement of reasoning tasks by tools in Agent scenarios. The other is how to integrate Verifiable and Non-Verifiable tasks, such as the fusion of CoT across different domains, the generalization of reasoning capabilities, and the unification of training methods. Our team is continuously progressing and growing. We welcome interested colleagues to join the discussion. Happy to chat!

    Intelligence with Everyone!

  • Oct 28, 2025
    • Parsed from source:
      Oct 28, 2025
    • Detected by Releasebot:
      Dec 12, 2025

    MiniMax and VEED: Introducing Hailuo-2.3 to Bring AI Video to Production Level

    VEED is a day-one launch partner for Hailuo-2.3, bringing AI video generation directly into VEED’s AI Playground for prompt-to-production workflows. Two models, MiniMax-Hailuo-2.3 and MiniMax-Hailuo-2.3-Fast, deliver fast, high quality clips with strong visuals and VFX.

    We are excited to announce

    We are excited to announce that VEED has become a day-one launch partner for Hailuo-2.3, our latest breakthrough AI video generation model. Starting today, creators can access Hailuo-2.3 directly inside VEED's AI Playground, combining professional-grade AI generation with VEED's intuitive online editing experience.

    VEED is an online video editing platform that allows users to easily create, edit, and share videos directly from their browser. It offers features like adding subtitles, templates, screen recording, AI-powered editing tools, and supports team collaboration, making it popular among content creators, marketers, and businesses for quick and professional video production.

    Now, with the integration of Hailuo-2.3, creators, marketers, and businesses can go from prompt to production ready video in one seamless workflow.

    Models Available in VEED

    Two models are now available for immediate use inside VEED:

    • MiniMax-Hailuo-2.3

      • Supports both text and image inputs. Generates videos in 768p or 1080p, with 6- or 10-second durations. Designed for maximum visual quality and creative control.
    • MiniMax-Hailuo-2.3-Fast

      • Supports image input only, in exchange for speed. Produces 6-second clips at 768p in around 55 seconds, delivering one of the fastest AI video generation speeds in the industry.

    What Sets MiniMax-Hailuo-2.3 Apart

    • Exceptional human physics, enabling dynamic and fluid movements such as flips, dancing sequences (including belly dancing and waltz), and more.
    • Powerful VFX capabilities, delivering cinematic realism and immersive visual effects.
    • Seamless style transformation, allowing for versatile and creative aesthetic shifts.
    • Advanced stylization options, offering transformations like Pixar-style visuals and surrealist effects (e.g., waterlight).

    How to Use MiniMax-Hailuo-2.3 in VEED

    • Step 1: Navigate to VEED's AI playground.
    • Step 2: Select MiniMax-Hailuo-2.3 from the available models.
    • Step 3: Input your detailed text prompt or upload an image to animate. The more specific your prompt, the better your results.
    • Step 4: Choose your video duration, aspect ratio, and style preferences.
    • Step 5: Click Generate and wait as MiniMax-Hailuo-2.3 creates your video—typically in under a minute for image-to-video generation.
    • Step 6: Once generated, use VEED's full editing suite to add finishing touches, brand elements, or combine with other footage.

    Your project automatically saves to your workspace, so you can refine it anytime.

    A New Standard for AI Video

    Hailuo-2.3 marks a turning point: AI video is no longer experimental; it is now viable for professional and commercial workflows. With VEED as a day-one launch partner, this technology is available to creators everywhere.

