- Jan 4, 2026
Unban subscribers
What’s New
You can now unban subscribers directly from your subscribers page. If you've previously banned a subscriber and want to give them another chance, just find them in your list (filter by "Removed" status), click the overflow menu, and hit "Unban."
This also works as a bulk action—select multiple banned subscribers and unban them all at once. Unbanned subscribers are restored to their previous subscription status.
- Jan 4, 2026
Zod by colinhacks
v4.3.5
Commits
- 21afffd [Docs] Update migration guide docs for deprecation of message (#5595)
- e36743e Improve mini treeshaking
- 0cdc0b8 4.3.5
- Jan 4, 2026
v10.45.4
Fixes broken package.json file (#7078)
Full Changelog: v10.45.3...v10.45.4
- Jan 4, 2026
January 4, 2026
Site Speed Tips
The Site Speed Tips panel now automatically identifies unoptimized images on your site and enables one-click WebP compression for improved page rendering.
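For context, the underlying transformation is ordinary WebP re-encoding; here is a minimal sketch of what that looks like with Pillow. The directory, quality setting, and file handling are illustrative only, not how the panel itself is implemented.

```python
from pathlib import Path

from PIL import Image  # Pillow

def compress_to_webp(src: Path, quality: int = 80) -> Path:
    """Re-encode an image as WebP, which is usually much smaller than PNG/JPEG."""
    dst = src.with_suffix(".webp")
    with Image.open(src) as img:
        img.save(dst, "WEBP", quality=quality)
    return dst

# Convert every PNG/JPEG in an (illustrative) images directory.
for path in Path("static/images").iterdir():
    if path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
        compress_to_webp(path)
```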
- Jan 4, 2026
M2.1: Multilingual and Multi-Task Coding with Strong Generalization
MiniMax-M2.1 delivers a multi-language, multi-task coding agent with strong scaffold generalization and top-tier benchmark results, signaling a practical leap toward enterprise coding, testing, and collaboration. The release outlines scalable RL training, broader problem coverage, and a bold roadmap for future efficiency and scope.
The Gap Between SWE-Bench and Real-World Coding
In 2025, SWE-Bench became the most authoritative evaluation standard for code-generation scenarios. In this evaluation, LLMs face bugs from real GitHub repositories and must fix them through multiple rounds of code reading and testing. The core value of SWE-Bench lies in the fact that its tasks closely mirror a programmer's daily work, and the results can be objectively verified via test cases — a feature particularly crucial for reinforcement learning training. We can use the test pass rate directly as a reward signal, continuously optimizing the model in a real code environment without the noise introduced by human labeling or model-based judging.
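As an illustrative sketch of that reward setup (the environment and test-runner interfaces below are hypothetical placeholders, not the actual training code), the signal is simply the fraction of target tests a candidate patch makes pass:

```python
def test_pass_reward(env, patch) -> float:
    """Reward a candidate patch by the fraction of target tests it makes pass.

    `env` stands in for an executable repository snapshot exposing `apply_patch`
    and `run_tests`; both are hypothetical placeholders for illustration.
    """
    env.apply_patch(patch)
    results = env.run_tests()  # e.g. {"test_parse_empty": True, "test_tz": False}
    if not results:
        return 0.0
    return sum(results.values()) / len(results)  # 1.0 only if every target test passes
```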
However, like all evaluation standards, SWE-Bench is not perfect. For a coding agent to be usable in real-world scenarios, there are more capability dimensions beyond SWE-Bench that need attention:
- Limited Language Coverage: SWE-Bench currently only covers Python. In real development scenarios, developers need to handle multiple languages such as Java, Go, TypeScript, Rust, and C++, often collaborating across multiple languages within the same project.
- Restricted Task Types: SWE-Bench only involves bug-fixing tasks. Other real-world capabilities, such as implementing new features, generating test cases, refactoring projects, reviewing code, optimizing performance, and configuring CI/CD, go unevaluated.
- Scaffold Binding: SWE-Bench usually only evaluates the model's performance on a specific scaffold, so the model's generalization on other scaffolds cannot be accurately observed. Meanwhile, different agent scaffolds design various context management strategies, and the model needs to be able to adapt to these differences.
How to Fill These Gaps
Environment Scaling
We often see developers complain that current coding agents perform well in languages like Python and JavaScript but deliver lackluster results in more serious enterprise-level development scenarios. When a task involves complex project understanding, performance degrades further.
To solve this problem, during the training cycle of MiniMax-M2.1 we built a comprehensive data pipeline covering the 10+ most mainstream programming languages. We retrieved a massive number of Issues, PRs, and corresponding test cases from GitHub, then strictly filtered, cleaned, and rewrote this raw data to ensure the quality of the post-training data. A coding agent is naturally suited to mass-producing this kind of training environment. During this process, we found that for both the M2 model and other frontier models, the success rate of constructing multi-language environments was lower than for Python. Several distinct factors contribute:
- Environmental Complexity of Compiled Languages: Python, as an interpreted language, has relatively simple configuration. However, for compiled languages like Java, Go, Rust, and C++, we need to handle complex compilation toolchains, version compatibility, and cross-compilation issues. A Java project might depend on a specific version of JDK, Maven/Gradle, and numerous third-party libraries; an error in any link can lead to build failure.
- Diverse Test Frameworks: In the Python ecosystem, pytest dominates, but test frameworks in other languages are more fragmented. Java has JUnit and TestNG; JavaScript has Jest, Mocha, and Vitest; Go has the built-in testing package plus extensions like testify; Rust has built-in tests and criterion, etc. We need to design specialized test execution and result parsing logic for each framework (see the sketch after this list).
- Dependency Management & Project Structure: Package managers for different languages differ vastly in dependency resolution, version locking, and private repository support. The nested structure of npm's node_modules, Maven's central repository mechanism, and Cargo's semantic versioning all require targeted handling. Simultaneously, project structure standards vary: Python structures are flexible, but Java projects usually follow strict Maven/Gradle directory standards; Go projects have GOPATH and Go Modules modes; Rust projects have the concept of a workspace. Understanding these dependency management mechanisms and project structures is crucial for correctly locating code and running tests.
- Difficulty in Parsing Error Messages: Error message formats produced by different languages and toolchains vary widely; compile errors, link errors, and runtime errors also manifest differently. We need to train the model to understand these diverse error messages and extract useful debugging clues from them.
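One way to picture the per-framework execution and parsing logic mentioned above is a small dispatch table mapping each ecosystem to a test command and an output parser. The sketch below is a simplified illustration under assumed output formats, not the actual pipeline; real test output has far more variants than these parsers handle.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestFramework:
    command: list[str]                       # how to invoke the test suite
    parse: Callable[[str], dict[str, bool]]  # raw output -> {test_name: passed}

def parse_pytest(output: str) -> dict[str, bool]:
    # Simplified parser for `pytest -v` lines such as
    # "tests/test_dates.py::test_utc_offset PASSED" (real output has more variants).
    results = {}
    for line in output.splitlines():
        if " PASSED" in line or " FAILED" in line:
            name = line.split(" ")[0]
            results[name] = " PASSED" in line
    return results

def parse_go_test(output: str) -> dict[str, bool]:
    # Simplified parser for `go test -v` lines such as "--- PASS: TestFoo (0.01s)".
    results = {}
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith(("--- PASS:", "--- FAIL:")):
            name = stripped.split(":", 1)[1].split()[0]
            results[name] = stripped.startswith("--- PASS:")
    return results

# JUnit, Jest, cargo test, etc. would follow the same pattern with their own parsers.
FRAMEWORKS = {
    "pytest":  TestFramework(["pytest", "-v"], parse_pytest),
    "go-test": TestFramework(["go", "test", "-v", "./..."], parse_go_test),
}

def run_suite(framework: str, workdir: str) -> dict[str, bool]:
    fw = FRAMEWORKS[framework]
    proc = subprocess.run(fw.command, cwd=workdir, capture_output=True, text=True)
    return fw.parse(proc.stdout + proc.stderr)
```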
Ultimately, we built a multi-language training system covering over ten languages including JS, TS, HTML, CSS, Python, Java, Go, C++, Kotlin, C, and Rust. We obtained over 100,000 environments usable for training and evaluation from real GitHub repositories, with each environment containing complete Issues, code, and test cases.
To support such massive Environment Scaling and RL training, we built a high-concurrency sandbox infrastructure capable of launching over 5,000 isolated execution environments within 10 seconds, while supporting the concurrent operation of tens of thousands of environments.
This infrastructure allows us to efficiently conduct large-scale multi-language coding agent training.
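The concurrency pattern behind that kind of infrastructure can be sketched with a bounded semaphore. Everything below (sandbox creation, episode rollout) is a hypothetical stand-in for the real sandbox API; only the pattern of capping how many environments run at once is the point.

```python
import asyncio
import random

MAX_CONCURRENT = 20_000  # illustrative cap; the post cites tens of thousands of concurrent environments

async def create_sandbox(task_id: str) -> dict:
    """Stand-in for launching an isolated execution environment (hypothetical API)."""
    await asyncio.sleep(0)  # placeholder for container/VM startup
    return {"task_id": task_id}

async def run_episode(sandbox: dict) -> dict:
    """Stand-in for one agent rollout inside the sandbox (hypothetical API)."""
    await asyncio.sleep(0)
    return {"task_id": sandbox["task_id"], "reward": random.random()}

async def run_one(task_id: str, sem: asyncio.Semaphore) -> dict:
    # The semaphore bounds how many environments are active at once.
    async with sem:
        sandbox = await create_sandbox(task_id)
        return await run_episode(sandbox)

async def run_batch(task_ids: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(run_one(t, sem) for t in task_ids))

if __name__ == "__main__":
    rollouts = asyncio.run(run_batch([f"env-{i}" for i in range(50_000)]))
    print(len(rollouts), "rollouts collected")
```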
Beyond Bug Fixing: Multi-Task Capabilities
Real software development is far more than just fixing bugs. A programmer's daily routine includes writing tests, code reviews, performance optimization, and other tasks. In the training of MiniMax-M2.1, we also conducted targeted optimization for these scenarios, including acquiring high-quality problems and designing corresponding Reward signals:
- Test Generation Capability: Early in the R&D of M1, we discovered that the ability to write tests was a major bottleneck restricting the accuracy of code generated by language models. In the agentless framework, the model generates multiple fix candidates in parallel and then uses its own generated test code to select the final solution. However, due to flawed reward design in M1's RL process, the model consistently wrote overly simple test code, causing a large number of incorrect fixes to be selected. Generating high-quality test cases requires the model to deeply understand code logic, boundary conditions, and potential failure scenarios. For MiniMax-M2.1, we synthesized a large volume of training samples from GitHub PRs and self-generated code patches to enhance testing ability, eventually tying with Claude Sonnet 4.5 on SWT-bench, which evaluates testing capabilities.
- Code Performance Optimization: Besides implementation correctness, execution efficiency is also critical in real development. The model needs to understand low-level concerns like algorithm complexity, memory usage, and concurrency handling, while also mastering best practices for specific APIs. During training, MiniMax-M2.1 was encouraged to write more efficient code, subsequently achieving significant progress on SWE-Perf, with an average performance boost of 3.1%. In the future, we will apply similar optimization methods to other performance-sensitive scenarios like kernel optimization and database query optimization.
- Code Review Capability: Based on the SWE framework, we built an internal benchmark called SWE-Review, covering multiple languages and scenarios to evaluate the recall rate and hallucination rate on code defects. A review is judged as correct only if it accurately identifies the target defect without producing any false positives, imposing high requirements on the model's precision.
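Stated as code, the scoring rule above is strict: a single hallucinated finding fails the review. This is an illustrative reading of the criterion; the set-of-defect-identifiers representation is an assumption, not SWE-Review's actual data format.

```python
def review_is_correct(flagged: set[str], target_defect: str,
                      real_defects: set[str]) -> bool:
    """A review counts only if it flags the target defect and every flagged
    defect is real (no false positives). The set representation here is an
    assumption for illustration, not the benchmark's actual format."""
    return target_defect in flagged and flagged <= real_defects

# Example: flagging an extra, nonexistent defect fails the review.
print(review_is_correct({"off-by-one in paginate()"},
                        "off-by-one in paginate()",
                        {"off-by-one in paginate()"}))   # True
print(review_is_correct({"off-by-one in paginate()", "imaginary race condition"},
                        "off-by-one in paginate()",
                        {"off-by-one in paginate()"}))   # False
```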
Generalization on OOD Scaffolds
Generalization on OOD scaffolds is vital for a coding agent. Developers use different scaffolds — some use Claude Code, some use Cursor, and others use proprietary agent frameworks. If a model is optimized only for a specific scaffold, its performance will be severely degraded in other environments, sharply limiting its usefulness in real development scenarios. For MiniMax-M2.1, we believe scaffold generalization primarily tests the model's long-range instruction following and its adaptability to context management strategies:
- Long-Range Instruction Following: Complex development scenarios require the model to integrate and execute "composite instruction constraints" from multiple sources, including the System Prompt, User Query, Memory, Tool Schema, and various specification files (such as Agents.md, Claude.md, Skill.md, etc.). Developers strictly constrain the model's expected behavior by designing these specifications. If the agent fails to meet a requirement at any step during inference, end-to-end results may degrade severely.
- Adaptability to Context Management: During the early release of M2, the community did not fully understand the design of Interleaved Thinking, and in many scaffolds the results fell short of the model's inherent capabilities. At the time, we found that some popular scaffold designs discard part of the historical thinking content in multi-turn conversations; this caused M2's performance to drop to varying degrees across evaluation sets. In MiniMax-M2.1, we still recommend that developers use the Interleaved Thinking feature to unleash the model's full potential; at the same time, we designed training methods to ensure the model's "IQ" stays intact even when users employ all sorts of imaginative context management strategies.
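To make the context-management point concrete, here is a minimal sketch of two ways a scaffold might assemble multi-turn history. The message structure and the "reasoning" field are assumptions for illustration, not the actual API; the second function is the kind of pruning described above.

```python
# Two ways a scaffold might assemble multi-turn history for the model.
# The "reasoning" field is an assumed placeholder for interleaved thinking content.
history = [
    {"role": "user", "content": "Fix the failing test in utils/date.py"},
    {"role": "assistant",
     "reasoning": "The failure is probably a timezone offset being dropped...",
     "content": "I'll read utils/date.py first."},
    {"role": "tool", "content": "<contents of utils/date.py>"},
    {"role": "assistant",
     "reasoning": "Line 42 discards the UTC offset; patch it and rerun the test...",
     "content": "Applying a fix to line 42."},
]

def keep_interleaved_thinking(turns: list[dict]) -> list[dict]:
    """Recommended: pass prior turns through unchanged, reasoning included."""
    return turns

def strip_thinking(turns: list[dict]) -> list[dict]:
    """What some scaffolds do: drop historical reasoning to save context.
    This is the pruning that hurt M2 and that M2.1 is trained to tolerate."""
    return [{k: v for k, v in t.items() if k != "reasoning"} for t in turns]
```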
To verify MiniMax-M2.1's scaffold generalization, we directly tested SWE-Bench performance on different scaffolds and also constructed a test set closer to real-world usage to observe whether the model meets various scaffold instruction constraints. Ultimately, we found that MiniMax-M2.1 maintained an SWE-Bench score above 67 in mini-swe-agent, Droid, and Claude Code. Compared to M2, MiniMax-M2.1 shows significant improvement across different OOD scaffolds: on OctoCodingbench, M2.1 improved from M2's 13.3 to 26.1, demonstrating strong compliance with scaffold instruction constraints.
2026 TODOs
We believe the development of coding agents still has a long way to go. Therefore, this year we will explore several interesting directions:
- Defining the Reward Signal for Developer Experience: Beyond the optimization directions mentioned above, we hope to further quantify and optimize developer experience. Current evaluation standards mainly focus on whether the task is ultimately completed, ignoring the user experience during the process. We plan to explore richer Reward dimensions: regarding code quality, including readability, modularity, and comment completeness; regarding interaction experience, including response latency, information transparency, and interpretability of intermediate states; regarding engineering standards, including commit message quality, PR description completeness, and code style consistency. Although these metrics are difficult to evaluate fully automatically, we are exploring hybrid solutions combining static analysis tools, Agent-as-a-Verifier, and human preference learning, hoping to make the coding agent not only complete tasks but also deliver high-quality code like an excellent human engineer.
- Improving Problem-Solving Efficiency: MiniMax-M2.1 still has some issues with over-exploration, such as repeatedly reading the same file or executing redundant tests. We plan to optimize efficiency from multiple angles: reducing trial-and-error through better planning capabilities; reducing unnecessary file reads through more precise code localization; avoiding repetitive exploration through better memory mechanisms; and responding quickly to simple tasks through adaptive thinking depth.
- RL Scaling: The Scaling Law of reinforcement learning still holds huge potential for coding agents. We have verified the positive correlation between environment count, training steps, and model capability, but we are far from reaching convergence. We plan to continue exploring in three dimensions: Compute dimension, increasing concurrent environment count and training iterations; Data dimension, building a larger-scale and more diverse training task pool; Algorithm dimension, exploring more efficient exploration strategies, more stable training objectives, and better reward shaping methods. Simultaneously, we are researching how to make the RL training process itself more efficient, including better curriculum learning designs, smarter sample reuse strategies, and cross-task knowledge transfer.
- Coding World Model & User Simulator: As mentioned earlier, the training of this generation of coding agents (M2.1) relies heavily on execution in real environments, which brings massive computational overhead and environment construction costs. We are exploring building a World Model capable of predicting code execution results: given a piece of code and environment state, the model can predict whether tests pass, what error messages will be produced, and how the program will behave. This will enable us to perform large-scale rollout and policy optimization without actually executing code. Meanwhile, we are also building a user behavior simulator to model the patterns of interaction between real developers and the agent—including vague requirement descriptions, mid-stream requirement changes, and feedback on intermediate results—allowing the model to adapt to various user behavior patterns in real scenarios during the training phase.
- Extremely Efficient Data Pipeline: Building a data pipeline capable of automatically discovering, filtering, and generating harder, longer-range tasks to continuously raise the model's ceiling. High-quality training data is a key bottleneck for coding agent progress. We are building an automated data flywheel: automatically discovering high-quality Issues and PRs from GitHub; using models to assess task difficulty and perform stratification; automatically augmenting tasks that the current model can easily solve to make them more challenging; and analyzing failure causes for failed cases to generate targeted training data. The ideal state here is to build an "inexhaustible" source of high-quality tasks, keeping training data difficulty slightly above the model's current capability to maintain optimal learning efficiency. We are also exploring how to automatically generate ultra-long-range tasks that require hours or even days to complete, pushing the model's capability boundaries in complex project understanding and long-term planning.
- More Scenario Coverage: Expanding to more specialized fields such as GPU Kernel development, compiler development, smart contracts, and machine learning. Each field has its unique knowledge system, toolchain, and best practices, while possessing real application scenarios and commercial value. We plan to gradually build training environments and evaluation systems for these professional fields, enabling the coding agent to handle more specialized and high-value development tasks. Looking further ahead, we believe the paradigm of "Define Problem - Define Reward - Environment Construction - Model Training" demonstrated in coding agent training can be transferred to more scenarios requiring complex reasoning and execution feedback.
- Jan 3, 2026
AI Summaries, New Design and More
Telegram opens 2026 with AI-powered summaries for channel posts and Instant View, plus a Liquid Glass redesign on iOS. All AI runs via Cocoon for privacy, and there's a new Power Saving mode to extend battery life.
Telegram's first update of 2026 brings even more Liquid Glass interfaces on iOS and AI summaries for channel posts and Instant View pages — built to maximize privacy and protect user data.
AI Summaries
Long posts in channels can now be instantly summarized — to recap the latest news and stay productive. Instant View pages get an automatic AI summary at the top.
The new AI summaries are powered by open-source models running on Cocoon — a decentralized network designed to maximize privacy. In Cocoon, each request is securely encrypted to protect user data.
Cocoon can be integrated into any AI application — learn more in the detailed developer documentation.
New Design
Telegram for iOS now fully supports Liquid Glass — with transparent elements and beautiful refraction effects throughout the entire app.
You can control interface effects to maximize performance and extend battery life in Settings > Power Saving.
Happy New Year!
In 2025, Telegram launched over 75 new features across 13 major updates (this would have been the 14th if Apple reviewers took winter holidays less seriously). That's an average of 26 days between updates, and sometimes just 8 days — like the week we launched secure group calls and the gift marketplace.
We wish you a wonderful New Year filled with innovation, inspiration, and goals you'll be proud to achieve.
January 3, 2026
The Telegram Team
- Jan 2, 2026
Announcement bars for your archives
A new announcement bar lets you place a banner on your archive pages to share breaking news or offers. Customize the message with Markdown, pick a background color, and control exactly who sees it.
Announcement bar on archive pages
You can now add an announcement bar to the top of your archive pages — a simple, eye-catching banner for sharing breaking news, special offers, or anything else you want readers to see first.
demo.buttondown.com/settings/archives/announcement
This is a live demo. You can view this page on our live demo site, too.
Head to Settings > Archives > Announcement to set it up. You can customize the message (Markdown works, so feel free to add links), pick a background color, and control who sees it: everyone, just free subscribers, just paid subscribers, or only logged-out visitors.
It's a small thing, but it's a nice way to highlight something important without having to add it to every email you send. Maybe you're running a holiday sale, or you want to nudge free readers toward a paid subscription, or you just want to say "hey, I'm on vacation until January" — the announcement bar has you covered.
- Jan 2, 2026
v2.205.5
2.205.5 (2026-01-02)
Bug Fixes
- add throttle to flag analytics endpoint (#6454) (23d37ca)
- Jan 2, 2026
v2.205.6
2.205.6 (2026-01-02)
Bug Fixes
- add a batch size to the bulk_update of OrganisationSubscriptionInformationCache objects (#6456) (38ac162)
- Jan 2, 2026
Remove deprecated errorCode and errorMessage fields from SubscriptionBillingAttempt
API changes
Starting with API version 2026-04, the errorCode and errorMessage fields on SubscriptionBillingAttempt are now hidden. These fields were deprecated in version 2025-01 with the introduction of processingError. Developers should now use processingError.code and processingError.message instead. If you haven't updated your code to use these new fields, please do so to ensure compatibility with the latest API version.
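A minimal migration sketch: select processingError { code message } instead of the removed fields. The Python wrapper, shop domain, access token, and example ID below are placeholders, and the query shape is illustrative; only the processingError field selection reflects the documented replacement.

```python
import requests

SHOP = "your-shop.myshopify.com"   # placeholder
TOKEN = "shpat_..."                # placeholder Admin API access token
API_VERSION = "2026-04"

# Select processingError { code message } instead of the removed
# errorCode / errorMessage fields.
QUERY = """
query BillingAttempt($id: ID!) {
  subscriptionBillingAttempt(id: $id) {
    id
    ready
    processingError {
      code
      message
    }
  }
}
"""

resp = requests.post(
    f"https://{SHOP}/admin/api/{API_VERSION}/graphql.json",
    headers={"X-Shopify-Access-Token": TOKEN, "Content-Type": "application/json"},
    json={"query": QUERY,
          "variables": {"id": "gid://shopify/SubscriptionBillingAttempt/123"}},
)
attempt = resp.json()["data"]["subscriptionBillingAttempt"]
if attempt["processingError"]:
    print(attempt["processingError"]["code"], attempt["processingError"]["message"])
```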