Anthropic Release Notes

Last updated: Mar 13, 2026

  • Mar 12, 2026
    • Date parsed from source:
      Mar 12, 2026
    • First seen by Releasebot:
      Mar 13, 2026

    Anthropic

    Anthropic invests $100 million into the Claude Partner Network

    Anthropic unveils the Claude Partner Network, a new program delivering training, dedicated support, and co-marketing for enterprises adopting Claude. A $100 million commitment for 2026 funds partner training and certification (starting with Claude Certified Architect, Foundations), plus global go-to-market support.

    Claude Partner Network overview

    We’re launching the Claude Partner Network, a program for partner organizations helping enterprises adopt Claude. We’re committing an initial $100 million to support our partners with training courses, dedicated technical support, and joint market development. Partners who join from today will get immediate access to a new technical certification and be eligible for investment.

    Anthropic is focused on ensuring that our AI model, Claude, serves the needs of businesses. To do this, we’ve partnered with a number of other companies. Notably, Claude is the only frontier AI model available on all three leading cloud providers: AWS, Google Cloud, and Microsoft.

    We also work with large management consultancies, professional services firms, specialist AI firms, and similar agencies. These organizations help our enterprise customers identify where Claude can provide the most value to their work, and then help them get started with our AI tools. Our partners act as trusted guides in what can feel like uncharted territory: navigating the deployment requirements, compliance, and change management necessary inside large organizations.

    Now, we’re doubling down on our commitment to our partners, aiming to make it even easier for these organizations to support enterprises in adopting Claude.

    "Anthropic is the most committed AI company in the world to the partner ecosystem—and we're putting $100 million behind that this year to prove it. The certification, the co-investment, the dedicated team—this infrastructure is built so that any firm, at any scale, can build a Claude practice. Our partners are instrumental in getting enterprises from proof of concept to production with Claude, and we're making sure they have everything they need to do it."—Steve Corfield, Head of Global Business Development and Partnerships, Anthropic.

    Introducing the Claude Partner Network

    The Claude Partner Network provides training, technical support, and joint market development for our partners helping enterprises adopt Claude. We’re committing an initial $100 million to this network for 2026, and expect to invest even more over time.

    A significant proportion of our $100 million investment will go directly to our partners: funding for training and sales enablement, for market development (including work to make customer deployments successful), and for co-marketing across joint campaigns and events. We’re also scaling our partner-facing team fivefold, so that we can provide dedicated Applied AI engineers to partners working on live customer deals, technical architects to scope more complex implementations, and localized go-to-market support in international markets.

    Those who join the network will have access to our Partner Portal, where we’ll share our Anthropic Academy training materials, the sales playbooks used by our own go-to-market team, and other co-marketing documentation. Qualified partners will also be added to our Services Partner Directory, where enterprise buyers can find firms with Claude implementation experience.

    Alongside the network, we’re introducing the first Claude technical certification: Claude Certified Architect, Foundations. Available to partners today, it is a technical exam for solution architects building production applications with Claude. Later this year, we’ll introduce additional certifications for sellers, architects, and developers. Partners who join the network now will get priority access to new certifications as they roll out.

    Finally, we’re launching a Code Modernization starter kit, which gives our partners a straightforward starting point for migrating legacy codebases and remediating enterprises’ technical debt. This is one of the highest-demand enterprise workloads, and one where Claude’s agentic coding capabilities most directly translate into client outcomes.

    Any organization that is bringing Claude to market is eligible to join the Claude Partner Network. Membership is free of charge, and applications open today. You can find out more here.

    Below, our partners share more about their work with Claude:

    We're training 30,000 Accenture professionals on Claude because that's what it takes to meet the demand we're seeing. The Claude Partner Network gives us the structure to do that faster — the certification, the co-selling support, the shared investment. It matches how we actually build practices and deploy teams.

    Enterprise AI needs to be powerful. The Claude Partner Network helps formalize and scale the work underway: the training, industry-focused solutions, and practical guidance for deploying AI.

    Ranjit Bawa
    Global Technology and Ecosystems & Alliances Leader, Deloitte

    We've opened Claude access across our global workforce—supporting an organization of roughly 350,000 associates—and we're embedding it into how we help clients modernize and transform. The Claude Partner Network gives us the co-investment and technical support to move faster, so our clients can advance pilot initiatives toward production without the usual delays.

    Sandra Notardonato
    Head of Global Partnership Development and Influencer Relations, Cognizant

    We are enabling clients to scale AI with confidence—built on robust governance, security, and trust by design. Our dedicated Anthropic Center of Excellence accelerates readiness and capability-building, aligned with Infosys’ AI-first value approach. With teams applying Claude Code in real-world delivery, we are helping clients unlock AI value across industries.

    Anand Swaminathan
    Executive Vice President and Global Head – Communications, Media and Technology, Infosys

  • Feb 25, 2026
    • Date parsed from source:
      Feb 25, 2026
    • First seen by Releasebot:
      Feb 26, 2026

    Anthropic

    Anthropic acquires Vercept to advance Claude's computer use capabilities

    Anthropic announces the acquisition of Vercept to advance Claude’s computer use in live applications and multi-tool tasks, pairing human-like software interaction with AI. It also highlights Claude Sonnet 4.6’s strides toward near-human performance on complex tasks like navigating spreadsheets and web forms.

    Acquisition announcement and context

    People are using Claude for increasingly complex work—writing and running code across entire repositories, synthesizing research from dozens of sources, and managing workflows that span multiple tools and teams. Computer use enables Claude to do all of that inside live applications, the way a person at a keyboard would. That means Claude can take on multi-step tasks in live applications, and solve problems impossible with code alone. Today, we're announcing that Anthropic has acquired Vercept to help us push those capabilities further.

    Vercept background

    Vercept was built around a clear thesis: making AI genuinely useful for completing complex tasks requires solving hard perception and interaction problems. The Vercept team—including co-founders Kiana Ehsani, Luca Weihs, and Ross Girshick—has spent years thinking carefully about how AI systems can see and act within the same software humans use every day. That expertise maps directly onto some of the hardest problems we’re working on at Anthropic. Vercept will wind down its external product in the coming weeks and join Anthropic in pushing the frontiers of computer use.

    Claude Sonnet 4.6 context

    This acquisition follows the recent launch of Claude Sonnet 4.6, which shows a major improvement in computer use skills: on OSWorld, a widely-used evaluation for AI computer use, our Sonnet models went from under 15% in late 2024, when we first released computer use, to 72.5% today. Sonnet 4.6 is now approaching human-level performance on tasks like navigating complex spreadsheets and completing web forms across browser tabs.

    Acquisition history and philosophy

    Vercept is the latest team we’ve brought into Anthropic, following the acquisition of Bun. We look for teams whose technical ambitions match ours, whose work advances our capabilities, and whose approach to building AI is grounded in the same principles of safety and rigor that guide everything we do.

    Careers

    We’re also hiring individuals directly. If you’re interested in joining Anthropic’s engineering team, visit our careers page.

  • Feb 24, 2026
    • Date parsed from source:
      Feb 24, 2026
    • First seen by Releasebot:
      Feb 25, 2026

    Anthropic

    Anthropic’s Responsible Scaling Policy: Version 3.0

    Anthropic unveils the third update to its Responsible Scaling Policy, adding a Frontier Safety Roadmap, split mitigations, and regular Risk Reports with external review. It boosts transparency, industry coordination, and risk management, while outlining realistic unilateral and multilateral paths.

    The Responsible Scaling Policy — Third Version

    We’re releasing the third version of our Responsible Scaling Policy (RSP), the voluntary framework we use to mitigate catastrophic risks from AI systems.

    Anthropic has now had an RSP for more than two years, and we’ve learned a great deal about its benefits and its shortcomings. We’re therefore updating the policy to reinforce what has worked well to date, improve the policy where necessary, and implement new measures to increase the transparency and accountability of our decision-making.

    You can read the new RSP in full here. In this post, we’ll discuss some of the thinking behind the changes.

    The original RSP and our theory of change

    The RSP is our attempt to solve the problem of how to address AI risks that are not present at the time the policy is written, but which could emerge rapidly as a result of an exponentially advancing technology. When we wrote the original RSP in September 2023, large language models were essentially chat interfaces. Today they can browse the web, write and run code, use computers, and take autonomous, multi-step actions. As each of these new capabilities has emerged, so have new risks. We expect this pattern to continue.

    We focused the RSP on the principle of conditional, or if-then, commitments. If a model exceeded certain capability levels (for example, biological science capabilities that could assist in the creation of dangerous weapons), then the policy stated that we should introduce a new and stricter set of safeguards (for example, against model misuse and the theft of model weights).

    Each set of safeguards corresponded to an “AI Safety Level” (ASL): for example, ASL-2 referred to one set of required safeguards, whereas ASL-3 referred to a more stringent set of safeguards needed for more capable AI models.
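
      To make the if-then structure concrete, here is a schematic sketch in Python. The capability thresholds and safeguard names are illustrative placeholders of ours, not the actual definitions in the policy:

      ```python
      # Schematic illustration of the RSP's if-then structure. The thresholds
      # and safeguard sets below are placeholders, not the policy's actual
      # definitions.
      ASL_SAFEGUARDS = {
          2: {"baseline security", "misuse filtering", "model card reporting"},
          3: {"hardened weight security", "misuse classifiers", "expanded red-teaming"},
      }

      def required_safeguards(capability_evals: dict[str, bool]) -> set[str]:
          """Map capability evaluation outcomes to required safeguards."""
          # If any evaluation crosses a dangerous-capability threshold,
          # the stricter ASL-3 set applies; otherwise ASL-2 is the floor.
          asl = 3 if any(capability_evals.values()) else 2
          return ASL_SAFEGUARDS[asl]

      # Example: a model that crosses a biology capability threshold
      print(required_safeguards({"bio_uplift_threshold_exceeded": True}))
      ```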

    Early ASLs (ASL-2 and ASL-3) were defined in significant detail, but it was more difficult to specify the correct safeguards for models that were still several generations away. We therefore intentionally left the later ASLs (ASL-4 and beyond) largely undefined, and hoped to develop them in more detail once we had a better picture of what higher AI capability levels would entail.

    The following is a rough description of our “theory of change”—that is, the mechanisms whereby we hoped to affect the ecosystem with the RSP:

    • An internal forcing function. Within Anthropic, we hoped the RSP would compel us to treat important safeguards as requirements for launching (and training) new models. This made the importance of these safeguards clear to the large and growing organization, spurring us on to make faster progress.

    • A race to the top. We hoped that announcing our RSP would encourage other AI companies to introduce similar policies. This is the idea of a “race to the top” (the converse of a “race to the bottom”), in which different industry players are incentivized to improve, rather than weaken, their models’ safeguards and their overall safety posture. Over time, we hoped RSPs, or similar policies, would become voluntary industry standards or go on to inform AI laws aimed at encouraging safety and transparency in AI model development.

    • Creating more consensus about risks. We viewed the capability thresholds as potentially important moments for the industry. If we reached an important capability threshold (such as the ability of AI models to support the end-to-end production of bioweapons), we would institute the appropriate safeguards ourselves and use the evidence we’d obtained about AI capabilities to advocate to other companies and governments that they take action as well. In other words, we believed that the capability thresholds might be good points at which to go beyond unilateral action (Anthropic requiring safeguards for its own models) and encourage multilateral action (other AI companies, and/or governments also requiring such safeguards).

    • Looking to the future. We recognized that, at some of the later capability thresholds, the intensity of countermeasures we were envisioning (for example, achieving high robustness against misuse of AI models by state-level actors) would likely be difficult or impossible for Anthropic to accomplish unilaterally. We hoped that by the time we reached these higher capabilities, the world would clearly see the dangers, and that we’d be able to coordinate with governments worldwide in implementing safeguards that are difficult for one company to achieve alone.

    Assessing our theory of change

    Two and a half years later, our honest assessment is that some parts of this theory of change have played out as we hoped, but others have not. The following are the areas in which the RSP has been successful:

    • Our RSP did incentivize us to develop stronger safeguards. For example, in order to comply with our ASL-3 deployment standard (which is primarily about risks from chemical and biological weapons from threat actors with relatively modest resources and expertise), we developed increasingly sophisticated and accurate methods (specifically, input and output classifiers) to block content of concern.

    • More broadly, the overall implementation of the ASL-3 standard did prove feasible. We activated ASL-3 safeguards for relevant models in May 2025 and have been working to improve them ever since.

    • Our RSP did encourage other AI companies to adopt somewhat similar standards: within a few months of announcing our RSP, both OpenAI and Google DeepMind adopted broadly similar frameworks. Some companies have also implemented bioweapon-related classifiers in a similar vein to our ASL-3 defenses. The principles behind these voluntary standards, including those in the RSP, have helped to inform the development of early AI policy. We’ve seen governments around the world (for example in California with SB 53, in New York with the RAISE Act, and with the EU AI Act’s Codes of Practice) start to require frontier AI developers to create and publish frameworks for assessing and managing catastrophic risks—requirements Anthropic addresses through public documentation including its Frontier Compliance Framework. Encouraging these kinds of rigorous transparency frameworks for the industry was exactly what our RSP had set out to do.

    Nevertheless, other parts of our theory of change have not panned out as we’d hoped:

    • The idea of using the RSP thresholds to create more consensus about AI risks did not play out in practice—although there was some of this effect. We found pre-set capability levels to be far more ambiguous than we anticipated: in some cases, model capabilities have clearly approached the RSP thresholds, but we have had substantial uncertainty about whether they have definitively passed those thresholds. The science of model evaluation isn’t well-developed enough to provide dispositive answers. In such cases, we have taken a precautionary approach and implemented the relevant safeguards, but our internal uncertainty translates into a weak external case for taking multilateral action across the AI industry.
      ◦ Biological risks provide an example of this “zone of ambiguity”. Our models now show enough biological knowledge that they pass most tests we can run quickly and easily, so we can no longer make a strong argument that risks are low from a given model. But these tests alone aren’t sufficient for a strong argument that risks are high, either. We’ve sought additional evidence, such as supporting an extensive wet-lab trial, but results remain ambiguous, especially because the studies take long enough that more powerful models are available by the time they’re completed.

    • Despite rapid advances in AI capabilities over the past three years, government action on AI safety has moved slowly. The policy environment has shifted toward prioritizing AI competitiveness and economic growth, while safety-oriented discussions have yet to gain meaningful traction at the federal level. We remain convinced that effective government engagement on AI safety is both necessary and achievable, and we aim to continue advancing a conversation grounded in evidence, national security interests, economic competitiveness, and public trust. But this is proving to be a long-term project—not something that is happening organically as AI becomes more capable or crosses certain thresholds.

    As noted above, we were able to implement ASL-3 safeguards unilaterally and at reasonable costs to the operation of the company. However, this may not remain true for higher capability levels and higher ASLs. While our higher ASLs are largely undefined, the robust mitigations we laid out in the prior RSP might prove outright impossible to implement without collective action. As one illustration of the scale of the challenge, a RAND report on model weight security states that its “SL5” security standard, aimed at stopping top-priority operations by the most cyber-capable institutions, is “currently not possible” and “will likely require assistance from the national security community.”

    The combination of (a) the zone of ambiguity muddling the public case for risk, (b) an anti-regulatory political climate, and (c) requirements at the higher RSP levels that are very hard to meet unilaterally, creates a structural challenge for our current RSP. We could have tried to address this by defining ASL-4 and ASL-5 safeguards in ways that made compliance easy to achieve—but this would undermine the intended spirit of the RSP.

    Instead, we are choosing to acknowledge these challenges transparently and restructure the RSP before we reach these higher levels. The revised RSP aims to adopt more realistic unilateral commitments that are difficult but still achievable in the current environment, while continuing to comprehensively map the risks we believe the full industry needs to address multilaterally.

    Updating our Responsible Scaling Policy

    The new version of our RSP has three key elements.

    • Separating our plans as a company from our recommendations for the industry

      Our RSP now outlines two sets of mitigations: first, the mitigations that we plan to pursue regardless of what others do; and second, an ambitious capabilities-to-mitigations map that, we believe, would help adequately manage the risks from advanced AI if implemented across the AI industry.

      Read the full Responsible Scaling Policy.

    • Frontier Safety Roadmap

      Our new RSP introduces a requirement to develop and publish a Frontier Safety Roadmap, which will describe our concrete plans for risk mitigations across the areas of Security, Alignment, Safeguards, and Policy. Goals described in the Roadmaps are intended to be ambitious, yet achievable—providing the kind of forcing function that we consider to be a past success of our RSP.

      Rather than being hard commitments, these are public goals that we will openly grade our progress towards. This strategy of “nonbinding but publicly-declared” targets borrows from the transparency approach we’ve been championing for frontier AI legislation (although it provides the public with much more detail than is required under existing legislation), and from the successes of our previous RSP versions.

      Some example goals from our current Frontier Safety Roadmap include:

      • Launch “moonshot R&D” projects to investigate ambitious, possibly unconventional ways to achieve unprecedented levels of information security;
      • Develop a method for red-teaming our systems (likely involving significant automation) that surpasses the collective contributions from the hundreds of participants in our bug bounty;
      • Implement a number of systematic measures to ensure Claude behaves according to its constitution;
      • Establish comprehensive, centralized records of all our critical AI development activities, and use AI to analyze these records for issues including concerning behavior by insiders (both human and AI) and security threats;
      • Publish a policy roadmap with concrete proposals for a “regulatory ladder”—policies that scale with increasing risk and that could help guide government AI policy.

      Read the Frontier Safety Roadmap for more on these and our other goals.

    • Risk Reports and external review

      Risk Reports are another way in which we’re improving upon what worked well about our previous RSP. We found that producing a proto-Risk Report, our Safeguards Report from May 2025, was useful for our internal understanding and the public communication of the risks. Risk Reports extend this to a more systematic, comprehensive practice.

      Risk Reports will provide detailed information on the safety profile of our models at the time of publication. They will go beyond describing model capabilities to explain how capabilities, threat models (the specific ways that models might pose threats), and active risk mitigations fit together, and provide an assessment of the overall level of risk. Risk Reports will be published online (with some redactions[1]) every 3-6 months.

      The new RSP also requires external review of Risk Reports in certain circumstances. We will appoint expert third-party reviewers who are deeply familiar with AI safety research, are incentivized to be open and honest about Anthropic’s safety position, and are free of major conflicts of interest. They will have unredacted or minimally-redacted access to the Risk Report and will subject our reasoning, analysis, and decision-making to a comprehensive public review. Although our current models do not yet require external review, we are already running pilots and working toward this goal.

      Risk Reports will address any gaps between our current safety and security measures and our more ambitious recommendations for industry-wide safety. We are hopeful that describing and publicizing such gaps could help contribute to public awareness and thus to beneficial policy change in the future.

      Read the initial Risk Report.

    Conclusion

    The Responsible Scaling Policy was always planned to be a living document: a policy that had the flexibility to change as AI models become more capable. This third revision amplifies what worked about the previous RSP, commits us to more transparency about our plans and our risk considerations, and separates out our recommendations for the industry at large from what we can achieve as an individual company.

    In that same spirit of pragmatism, we will continue to revise and refine our RSP, and our methods of evaluating and mitigating risks, as the technology evolves.

    Footnotes

    1. As we discuss in the RSP, we will aim to minimize redactions to the public version of the Risk Report. Reasons we may nonetheless have to redact some of the text include legal compliance, intellectual property protection, public safety, and privacy.
  • Feb 20, 2026
    • Date parsed from source:
      Feb 20, 2026
    • First seen by Releasebot:
      Feb 21, 2026

    Anthropic

    Making frontier cybersecurity capabilities available to defenders

    Claude Code Security launches in a limited research preview for Enterprise and Team plans, delivering AI-driven code analysis that reasons like a defender, surfaces validated findings with suggested patches, and requires human approval for every fix. Open-source maintainers can get expedited access.

    Claude Code Security, a new capability built into Claude Code on the web, is now available in a limited research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix security issues that traditional methods often miss.

    Security teams face a common challenge: too many software vulnerabilities and not enough people to address them. Existing analysis tools help, but only to a point, as they usually look for known patterns. Finding the subtle, context-dependent vulnerabilities that are often exploited by attackers requires skilled human researchers, who are dealing with ever-expanding backlogs.

    AI is beginning to change that calculus. We’ve recently shown that Claude can detect novel, high-severity vulnerabilities. But the same capabilities that help defenders find and fix vulnerabilities could help attackers exploit them.

    Claude Code Security is intended to put this power squarely in the hands of defenders and protect code against this new category of AI-enabled attack. We’re releasing it as a limited research preview to Enterprise and Team customers, with expedited access for maintainers of open-source repositories, so we can work together to refine its capabilities and ensure it is deployed responsibly.

    How Claude Code Security works

    Static analysis—a widely deployed form of automated security testing—is typically rule-based, meaning it matches code against known vulnerability patterns. That catches common issues, like exposed passwords or outdated encryption, but often misses more complex vulnerabilities, like flaws in business logic or broken access control.
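
    To make “rule-based” concrete, here is a minimal sketch of the kind of pattern matching involved, assuming a toy regex rule for hardcoded credentials; real static analyzers are far more sophisticated, but the pattern-matching character is the same:

    ```python
    import re

    # A toy rule-based check: flag string literals assigned to names that
    # look like credentials. This catches the "exposed password" class of
    # issue, but it knows nothing about business logic or access control.
    HARDCODED_SECRET = re.compile(
        r"(?i)(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]"
    )

    def scan(source: str) -> list[tuple[int, str]]:
        """Return (line_number, line) pairs that match the rule."""
        return [
            (i, line.strip())
            for i, line in enumerate(source.splitlines(), start=1)
            if HARDCODED_SECRET.search(line)
        ]

    print(scan('db_password = "hunter2"\nrole = "admin"'))  # flags line 1 only
    ```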

    Rather than scanning for known patterns, Claude Code Security reads and reasons about your code the way a human security researcher would: understanding how components interact, tracing how data moves through your application, and catching complex vulnerabilities that rule-based tools miss.

    Every finding goes through a multi-stage verification process before it reaches an analyst. Claude re-examines each result, attempting to prove or disprove its own findings and filter out false positives. Findings are also assigned severity ratings so teams can focus on the most important fixes first.

    Validated findings appear in the Claude Code Security dashboard, where teams can review them, inspect the suggested patches, and approve fixes. Because these issues often involve nuances that are difficult to assess from source code alone, Claude also provides a confidence rating for each finding. Nothing is applied without human approval: Claude Code Security identifies problems and suggests solutions, but developers always make the call.
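
    As a rough schematic of that flow (the class and function names here are hypothetical; the product’s internals are not public):

    ```python
    from dataclasses import dataclass

    # Hypothetical sketch of the described review flow, not the product's
    # actual internals.
    @dataclass
    class Finding:
        description: str
        severity: str        # e.g. "low", "medium", "high", "critical"
        confidence: float    # 0.0-1.0
        suggested_patch: str
        verified: bool = False  # set by the multi-stage re-examination

    def triage(findings: list[Finding]) -> list[Finding]:
        """Keep only verified findings, ordered by severity then confidence."""
        order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
        kept = [f for f in findings if f.verified]
        return sorted(kept, key=lambda f: (order[f.severity], -f.confidence))

    # Applying a suggested_patch happens only after a human approves it in
    # the dashboard; nothing in this flow is applied automatically.
    ```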

    Using Claude for cybersecurity

    Claude Code Security builds on more than a year of research into Claude’s cybersecurity capabilities. Our Frontier Red Team has been stress-testing these abilities systematically: entering Claude in competitive Capture-the-Flag events, partnering with Pacific Northwest National Laboratory to experiment with using AI to defend critical infrastructure, and refining Claude’s ability to find and patch real vulnerabilities in code.

    Claude’s cyberdefensive abilities have improved substantially as a result. Using Claude Opus 4.6, released earlier this month, our team found over 500 vulnerabilities in production open-source codebases—bugs that had gone undetected for decades, despite years of expert review. We’re working through triage and responsible disclosure with maintainers now, and we plan to expand our security work with the open-source community.

    We also use Claude to review our own code, and we’ve found it to be extremely effective at securing Anthropic’s systems. We built Claude Code Security to make those same defensive capabilities more widely available. And since it’s built on Claude Code, teams can review findings and iterate on fixes within the tools they already use.

    The road ahead

    This is a pivotal time for cybersecurity. We expect that a significant share of the world’s code will be scanned by AI in the near future, given how effective models have become at finding long-hidden bugs and security issues.

    Attackers will use AI to find exploitable weaknesses faster than ever. But defenders who move quickly can find those same weaknesses, patch them, and reduce the risk of an attack. Claude Code Security is one step towards our goal of more secure codebases and a higher security baseline across the industry.

    Getting started

    We’re opening a limited research preview of Claude Code Security to Enterprise and Team customers today. Participants will get early access and collaborate directly with our team to hone the tool’s capabilities. We also encourage open-source maintainers to apply for free, expedited access.

    Apply for access here.

    To learn more, visit claude.com/solutions/claude-code-security.

  • Feb 17, 2026
    • Date parsed from source:
      Feb 17, 2026
    • First seen by Releasebot:
      Feb 18, 2026

    Anthropic

    Introducing Claude Sonnet 4.6

    Claude Sonnet 4.6 is the most capable Sonnet yet, with a 1M token context window, improved coding and reasoning, and better computer use. It’s now the default on Free and Pro plans and available across the Claude API and major platforms, with safety validated and pricing unchanged.

    Claude Sonnet 4.6

    Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

    For those on our Free and Pro plans, Claude Sonnet 4.6 is now the default model in claude.ai and Claude Cowork. Pricing remains the same as Sonnet 4.5, starting at $3/$15 per million tokens.

    Sonnet 4.6 brings much-improved coding skills to more of our users. Improvements in consistency, instruction following, and other areas have led developers with early access to prefer Sonnet 4.6 to its predecessor by a wide margin. They often even prefer it to our smartest model from November 2025, Claude Opus 4.5.

    Performance that would have previously required reaching for an Opus-class model—including on real-world, economically valuable office tasks—is now available with Sonnet 4.6. The model also shows a major improvement in computer use skills compared to prior Sonnet models.

    As with every new Claude model, we’ve run extensive safety evaluations of Sonnet 4.6, which overall showed it to be as safe as, or safer than, our other recent Claude models. Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment.”

    Computer use

    Almost every organization has software it can’t easily automate: specialized systems and tools built before modern interfaces like APIs existed. To have AI use such software, users would previously have had to build bespoke connectors. But a model that can use a computer the way a person does changes that equation.

    In October 2024, we were the first to introduce a general-purpose computer-using model. At the time, we wrote that it was “still experimental—at times cumbersome and error-prone,” but we expected rapid improvement.

    OSWorld, the standard benchmark for AI computer use, shows how far our models have come. It presents hundreds of tasks across real software (Chrome, LibreOffice, VS Code, and more) running on a simulated computer. There are no special APIs or purpose-built connectors; the model sees the computer and interacts with it in much the same way a person would: clicking a (virtual) mouse and typing on a (virtual) keyboard.
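
    For developers, invoking computer use looks roughly like the sketch below, based on the publicly documented computer-use beta of the Anthropic Python SDK. The tool and beta version strings shown are from an earlier release and may differ for newer models, so treat them as assumptions and check the current docs:

    ```python
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The model is given a virtual display and responds with actions (clicks,
    # keystrokes, screenshot requests) that a harness executes on its behalf.
    # NOTE: the tool/beta identifiers below are from an earlier computer-use
    # beta and are assumptions here; consult the docs for current values.
    response = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        betas=["computer-use-2025-01-24"],
        tools=[{
            "type": "computer_20250124",
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }],
        messages=[{"role": "user",
                   "content": "Open the budget spreadsheet and total column B."}],
    )
    print(response.content)
    ```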

    Across sixteen months, our Sonnet models have made steady gains on OSWorld. The improvements can also be seen beyond benchmarks: early Sonnet 4.6 users are seeing human-level capability in tasks like navigating a complex spreadsheet or filling out a multi-step web form, before pulling it all together across multiple browser tabs.

    The model certainly still lags behind the most skilled humans at using computers. But the rate of progress is remarkable nonetheless. It means that computer use is much more useful for a range of work tasks—and that substantially more capable models are within reach.

    At the same time, computer use poses risks: malicious actors can attempt to hijack the model by hiding instructions on websites in what’s known as a prompt injection attack. We’ve been working to improve our models’ resistance to prompt injections—our safety evaluations show that Sonnet 4.6 is a major improvement compared to its predecessor, Sonnet 4.5, and performs similarly to Opus 4.6. You can find out more about how to mitigate prompt injections and other safety concerns in our API docs.

    Our most capable Sonnet model yet (YouTube video)

    Evaluating Claude Sonnet 4.6

    Beyond computer use, Claude Sonnet 4.6 has improved on benchmarks across the board. It approaches Opus-level intelligence at a price point that makes it more practical for far more tasks. You can find a full discussion of Sonnet 4.6’s capabilities and its safety-related behaviors in our system card; a summary and comparison to other recent models is below.

    In Claude Code, our early testing found that users preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. Users reported that it more effectively read the context before modifying code and consolidated shared logic rather than duplicating it. This made it less frustrating to use over long sessions than earlier models.

    Users even preferred Sonnet 4.6 to Opus 4.5, our frontier model from November, 59% of the time. They rated Sonnet 4.6 as significantly less prone to overengineering and “laziness,” and meaningfully better at instruction following. They reported fewer false claims of success, fewer hallucinations, and more consistent follow-through on multi-step tasks.

    Sonnet 4.6’s 1M token context window is enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. More importantly, Sonnet 4.6 reasons effectively across all that context. This can make it much better at long-horizon planning. We saw this particularly clearly in the Vending-Bench Arena evaluation, which tests how well a model can run a (simulated) business over time—and which includes an element of competition, with different AI models facing off against each other to make the biggest profits.

    Sonnet 4.6 developed an interesting new strategy: it invested heavily in capacity for the first ten simulated months, spending significantly more than its competitors, and then pivoted sharply to focus on profitability in the final stretch. The timing of this pivot helped it finish well ahead of the competition.

    Sonnet 4.6 outperforms Sonnet 4.5 on Vending-Bench Arena by investing in capacity early, then pivoting to profitability in the final stretch.

    Early customers also reported broad improvements, with frontend code and financial analysis standing out. Customers independently described visual outputs from Sonnet 4.6 as notably more polished, with better layouts, animations, and design sensibility than those from previous models. Customers also needed fewer rounds of iteration to reach production-quality results.

    Customer testimonials highlight:

    • Databricks: Claude Sonnet 4.6 matches Opus 4.6 performance on OfficeQA, which measures how well a model can read enterprise documents (charts, PDFs, tables), pull the right facts, and reason from those facts. It’s a meaningful upgrade for document comprehension workloads.
    • Replit: The performance-to-cost ratio of Claude Sonnet 4.6 is extraordinary—it’s hard to overstate how fast Claude models have been evolving in recent months. Sonnet 4.6 outperforms on our orchestration evals, handles our most complex agentic workloads, and keeps improving the higher you push the effort settings. — Michele Catasta, President, Replit
    • Cursor: Claude Sonnet 4.6 is a notable improvement over Sonnet 4.5 across the board, including long-horizon tasks and more difficult problems. — Michael Truell, Co-founder and CEO, Cursor
    • GitHub: Out of the gate, Claude Sonnet 4.6 is already excelling at complex code fixes, especially when searching across large codebases is essential. For teams running agentic coding at scale, we’re seeing strong resolution rates and the kind of consistency developers need. — Joe Binder, VP of Product, GitHub
    • Cognition: Claude Sonnet 4.6 has meaningfully closed the gap with Opus on bug detection, letting us run more reviewers in parallel, catch a wider variety of bugs, and do it all without increasing cost. — Scott Wu, CEO, Cognition
    • Windsurf: For the first time, Sonnet brings frontier-level reasoning in a smaller and more cost-effective form factor. It provides a viable alternative if you are a heavy Opus user. — Jeff Wang, CEO, Windsurf
    • Hebbia: Claude Sonnet 4.6 meaningfully improves the answer retrieval behind our core product—we saw a significant jump in answer match rate compared to Sonnet 4.5 in our Financial Services Benchmark, with better recall on the specific workflows our customers depend on. — Aabhas Sharma, CTO, Hebbia
    • Box: Box evaluated how Claude Sonnet 4.6 performs when tested on deep reasoning and complex agentic tasks across real enterprise documents. It demonstrated significant improvements, outperforming Claude Sonnet 4.5 in heavy reasoning Q&A by 15 percentage points. — Ben Kus, CTO, Box
    • Pace: Claude Sonnet 4.6 hit 94% on our insurance benchmark, making it the highest-performing model we’ve tested for computer use. This kind of accuracy is mission-critical to workflows like submission intake and first notice of loss. — Jamie Cuffe, CEO, Pace
    • Bolt: Claude Sonnet 4.6 delivers frontier-level results on complex app builds and bug-fixing. It’s becoming our go-to for the kind of deep codebase work that used to require more expensive models. — Eric Simons, CEO, Bolt
    • Rakuten: Claude Sonnet 4.6 produced the best iOS code we’ve tested for Rakuten AI. Better spec compliance, better architecture, and it reached for modern tooling we didn’t ask for, all in one shot. The results genuinely surprised us. — Yusuke Kaji, General Manager, AI, Rakuten
    • Zapier: Sonnet 4.6 is a significant leap forward on reasoning through difficult tasks. We find it especially strong on branched and multi-step tasks like contract routing, conditional template selection, and CRM coordination—exactly where our customers need strong model sense and reliability. — Wade Foster, Co-founder and CEO, Zapier
    • Convey: We’ve been impressed by how accurately Claude Sonnet 4.6 handles complex computer use. It’s a clear improvement over anything else we’ve tested in our evals. — Will Harvey, Co-founder, Convey
    • Triple Whale: Claude Sonnet 4.6 has perfect design taste when building frontend pages and data reports, and it requires far less hand-holding to get there than anything we’ve tested before. — AJ Orbach, Co-founder, Triple Whale
    • Harvey: Claude Sonnet 4.6 was exceptionally responsive to direction — delivering precise figures and structured comparisons when asked, while also generating genuinely useful ideas on trial strategy and exhibit preparation. — Niko Grupen, Head of Applied Research, Harvey

    Product updates

    On the Claude Developer Platform, Sonnet 4.6 supports both adaptive thinking and extended thinking, as well as context compaction in beta, which automatically summarizes older context as conversations approach limits, increasing effective context length.

    On our API, Claude’s web search and fetch tools now automatically write and execute code to filter and process search results, keeping only relevant content in context—improving both response quality and token efficiency. Additionally, code execution, memory, programmatic tool calling, tool search, and tool use examples are now generally available.

    Sonnet 4.6 offers strong performance at any thinking effort, even with extended thinking off. As you migrate from Sonnet 4.5, we recommend exploring the full range of effort settings to find the ideal balance of speed and reliable performance for what you’re building.

    We find that Opus 4.6 remains the strongest option for tasks that demand the deepest reasoning, such as codebase refactoring, coordinating multiple agents in a workflow, and problems where getting it just right is paramount.

    For Claude in Excel users, our add-in now supports MCP connectors, letting Claude work with the other tools you use day-to-day, like S&P Global, LSEG, Daloopa, PitchBook, Moody’s, and FactSet. You can ask Claude to pull in context from outside your spreadsheet without ever leaving Excel. If you’ve already set up MCP connectors in Claude.ai, those same connections will work in Excel automatically. This is available on Pro, Max, Team, and Enterprise plans.

    How to use Claude Sonnet 4.6

    Claude Sonnet 4.6 is available now on all Claude plans, Claude Cowork, Claude Code, our API, and all major cloud platforms. We’ve also upgraded our free tier to Sonnet 4.6 by default—it now includes file creation, connectors, skills, and compaction.

    If you’re a developer, you can get started quickly by using claude-sonnet-4-6 via the Claude API.
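
    A minimal call, assuming the anthropic Python SDK (pip install anthropic) and an ANTHROPIC_API_KEY set in your environment:

    ```python
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-sonnet-4-6",  # model ID from this announcement
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": "Summarize this release in one sentence."}],
    )
    print(message.content[0].text)
    ```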

  • Feb 17, 2026
    • Date parsed from source:
      Feb 17, 2026
    • First seen by Releasebot:
      Feb 17, 2026

    Anthropic

    Anthropic and the Government of Rwanda sign MOU for AI in health and education

    Anthropic and Rwanda ink a three-year MOU to bring Claude to education, health, and government services across the country. The agreement includes 2,000 Claude Pro licenses for educators, AI literacy training for public servants, and hands-on capacity building for broad public sector use.

    Our collaboration spans three areas

    • Accelerating Rwanda’s health goals: Anthropic will support the Ministry of Health to tackle its ambitious national health goals, including its plan to eliminate cervical cancer and its ongoing efforts to reduce malaria and maternal mortality.
    • Enabling Rwanda’s public sector developers: Developer teams across government institutions will use Claude and Claude Code. Along with hands-on training, capacity building, and API credits, this access will support Rwanda’s broader efforts to integrate AI into other public sector areas.
    • Deepening our education partnership in Rwanda and throughout the region: The MOU formally codifies our fall 2025 education agreement, which included 2,000 Claude Pro licenses for educators across Rwanda, AI literacy training for public servants, and the deployment of a Claude-powered AI learning companion across eight African countries.

    Accelerating AI for health, education, and the public sector

    “This partnership with Anthropic is an important milestone in Rwanda’s AI journey. Our goal is to continue to design and deploy AI solutions that can be applied at a national level to strengthen education, advance health outcomes, and enhance governance with an emphasis on our context,” said Paula Ingabire, Minister of Information and Communications Technology (ICT) and Innovation in Rwanda.

    Anthropic’s Beneficial Deployments team has worked closely with the Ministry of ICT and Innovation and partners to design programs matched to Rwanda’s needs and priorities.

    “Technology is only as valuable as its reach. We’re investing in training, technical support, and capacity building to expand access so that AI can be used safely and independently by teachers, health workers, and public servants throughout Rwanda,” said Elizabeth Kelly, Head of Beneficial Deployments at Anthropic.

    A commitment to AI for the public good

    Today’s announcement builds on our education partnerships, which help students and educators interact with AI, and marks a significant expansion into the health sector.

    Together, these partnerships reflect a long-term collaboration that prioritizes capacity building, responsible deployment, and local autonomy over how new technologies are introduced. By investing in skills, infrastructure, and institutions, we hope to lay the groundwork for AI to deliver lasting value in the sectors that matter most to people’s lives.

  • Feb 16, 2026
    • Date parsed from source:
      Feb 16, 2026
    • First seen by Releasebot:
      Feb 17, 2026

    Anthropic

    Anthropic opens Bengaluru office and announces new partnerships across India

    Anthropic expands in India with a Bengaluru office and new partnerships across enterprise, education, and agriculture, advancing Indic-language AI and Claude Code. It also promotes open standards and public sector adoption, signaling concrete product and rollout momentum.

    Today, as we officially open our Bengaluru office, we’re announcing partnerships across enterprise, education, and agriculture that deepen our commitment to India across a range of sectors.

    India is the second-largest market for Claude.ai, home to a developer community doing some of the most technically intense AI work we see anywhere. Nearly half of Claude usage in India comprises computer and mathematical tasks: building applications, modernizing systems, and shipping production software.

    “India represents one of the world’s most promising opportunities to bring the benefits of responsible AI to vastly more people and enterprises,” said Irina Ghose, Managing Director of India, Anthropic. “Already, it’s home to extraordinary technical talent, digital infrastructure at scale, and a proven track record of using technology to improve people’s lives. That’s exactly the foundation you need to make sure this technology reaches the people who can benefit from it most.”

    Building language capabilities for a billion speakers

    More than a billion people in India speak one of over a dozen officially recognized languages, but AI models continue to perform better in English than they do in other languages. Six months ago, we launched a company-wide effort to narrow this gap by curating higher-quality, more representative training data in 10 of the most widely spoken languages throughout India: Hindi, Bengali, Marathi, Telugu, Tamil, Punjabi, Gujarati, Kannada, Malayalam, and Urdu. This resulted in improvements to our models, and we continue to work on enhancing their fluency.

    Now, Anthropic is working with Karya and the Collective Intelligence Project to build evaluations testing performance on locally relevant tasks across domains like agriculture and law, in partnership with domain experts from leading Indian nonprofits, including Digital Green and Adalat AI. This work will inform how we improve future models for speakers of Indic languages and for use cases important to India and the businesses that use Claude. We intend to make the evaluations publicly available for others to use.

    Partnering with enterprises, digital natives, and startups

    Our run-rate revenue in India has doubled since we announced our expansion in October 2025, and the range of organizations building on Claude reflects how broadly that growth is distributed—from large enterprises to digital-native companies to startups shipping their first products.

    To support this growing customer base, our India team will offer applied AI expertise to enterprise customers, digital natives, and startups, helping them design, build, and scale Claude-powered solutions tailored to their business needs.

    • Air India is using Claude Code to help developers ship custom software faster and at lower cost, as part of a broader push to use agentic AI across its operations.
    • CRED achieved 2x faster feature delivery and 10% better test coverage with Claude Code.
    • Cognizant is deploying Claude to 350,000 employees globally to modernize legacy systems, accelerate software development, and support AI adoption among its enterprise clients.

    Among India’s startups, the story is similar. At Razorpay, AI is integrated into risk systems, decision-making processes, and operations across the company. Rocket uses Claude to let non-technical teams across enterprises build production-ready apps and websites in minutes and hours rather than weeks. At Enterpret, Claude powers its AI assistant, the engineering team builds with Claude Code daily, and the startup has shipped an MCP integration that brings customer insights directly into Claude. And Emergent, an AI-powered platform that lets anyone build software by describing what they want in plain language, reached $25 million in annual recurring revenue and two million users in under five months, built entirely with Claude.

    Reaching students in low-income communities

    Educational and instructional tasks make up 12% of Claude.ai use in India. Pratham, one of India’s largest education nonprofits, chose Anthropic as its first strategic AI lab partner because of our shared focus on safety and educational rigor. Their Anytime Testing Machine, powered by Claude, is currently being piloted with 1,500 students across 20 schools, with plans to expand to 100 schools by the end of 2026. Adapted earlier this year for over 5,000 learners in Pratham’s Second Chance program, which supports women who have dropped out of formal schooling, the Anytime Testing Machine aims to create flexible, credible pathways for learning and certification by helping students practice for exams.

    Anthropic is collaborating with Central Square Foundation to use EdTech and AI more effectively to educate children from underserved communities. As part of this collaboration, Anthropic will provide technical expertise, mentorship, and API credits to organizations developing AI-enabled tools—including personalized tutors, teacher coaching solutions, and assessment-driven instruction—with the goal of reaching more primary school students across India.

    Incorporating AI into the public sector

    India has a track record of building interoperable digital public infrastructure that improves people’s lives. Anthropic is partnering with the EkStep Foundation to explore how AI can build on these efforts and deliver population-scale impact in the domains that matter most to India. Agriculture is one example: it makes up nearly a sixth of the Indian economy and employs nearly half of the labor force. Through the OpenAgriNet effort, we are working towards deployments of Claude that expand access to expert knowledge in this critical sector.

    We’re also demonstrating how Claude Code and Cowork can have an impact within nonprofits themselves—including Noora Health, which delivers accessible health coaching to millions of families, and Intelehealth, which connects patients in remote communities to quality medical care.

    India has 50 million pending court cases, and routine updates often take months to reach litigants. Accessing case information typically requires repeated court visits or intermediaries to navigate paper files and legal jargon. Anthropic is supporting Adalat AI to improve access to judicial services with a national WhatsApp helpline, launching today. Using Claude, it provides instant case updates as well as translation, document summarization, and interactive querying of legal documents in native Indian languages.

    Driving adoption through open-source standards

    Anthropic created the Model Context Protocol (MCP) as a universal open-source standard for connecting AI applications to external systems, and recently donated it to the Linux Foundation.

    The Indian Ministry of Statistics and Programme Implementation (MoSPI), with the support of nonprofit Bharat Digital, recently launched the first official Indian government MCP server, enabling users of AI systems to access and query authoritative national statistics in an open and interoperable manner. In the private sector, Swiggy uses MCP to let people order groceries and make dining reservations directly through Claude.
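
    For a sense of what standing up such a server involves, here is a minimal sketch using the FastMCP helper from the official MCP Python SDK; the tool and the data behind it are hypothetical stand-ins, not MoSPI’s actual server:

    ```python
    from mcp.server.fastmcp import FastMCP

    # Minimal MCP server sketch. The dataset and tool below are illustrative
    # placeholders, not the actual MoSPI service.
    mcp = FastMCP("national-statistics")

    POPULATION_ESTIMATES = {"2021": 1_407_000_000, "2022": 1_417_000_000}

    @mcp.tool()
    def population(year: str) -> str:
        """Return the recorded population estimate for a given year."""
        value = POPULATION_ESTIMATES.get(year)
        return f"{year}: {value:,}" if value else f"No data for {year}"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio so an MCP client like Claude can connect
    ```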

    Growing Anthropic’s presence in India

    These partnerships will grow in the coming months and years through our expanded presence in India. Our new Bengaluru office—Anthropic’s second in Asia after Tokyo—has officially opened. Led by Managing Director of India Irina Ghose, an enterprise and startup technology leader, the office will focus on hiring local talent across a wide array of roles.

    For information about career opportunities at our Bengaluru office, visit our careers page.

  • Feb 12, 2026
    • Date parsed from source:
      Feb 12, 2026
    • First seen by Releasebot:
      Feb 17, 2026

    Anthropic

    Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation

    Anthropic secures $30B in Series G funding to accelerate enterprise AI and Claude Code expansion, signaling strong market demand. Claude Code, now generally available, shows rapid revenue growth alongside the new Cowork suite, while Opus 4.6 powers broad enterprise work across AWS, Google Cloud, and Microsoft Azure.

    Series G funding

    We have raised $30 billion in Series G funding led by GIC and Coatue, valuing Anthropic at $380 billion post-money. The round was co-led by D. E. Shaw Ventures, Dragoneer, Founders Fund, ICONIQ, and MGX. The investment will fuel the frontier research, product development, and infrastructure expansions that have made Anthropic the market leader in enterprise AI and coding.

    Significant investors in this round include:

    • Accel
    • Addition
    • Alpha Wave Global
    • Altimeter
    • AMP PBC
    • Appaloosa LP
    • Baillie Gifford
    • Bessemer Venture Partners
    • affiliated funds of BlackRock
    • Blackstone
    • D1 Capital Partners
    • Fidelity Management & Research Company
    • General Catalyst
    • Greenoaks
    • Growth Equity at Goldman Sachs Alternatives
    • Insight Partners
    • Jane Street
    • JPMorganChase through its Security and Resiliency Initiative and Growth Equity Partners
    • Lightspeed Venture Partners
    • Menlo Ventures
    • Morgan Stanley Investment Management
    • NX1 Capital
    • Qatar Investment Authority (QIA)
    • Sands Capital
    • Sequoia Capital
    • Temasek
    • TowerBrook
    • TPG
    • Whale Rock Capital
    • XN

    This round also includes a portion of the previously announced investments from Microsoft and NVIDIA.

    “Whether it is entrepreneurs, startups, or the world’s largest enterprises, the message from our customers is the same: Claude is increasingly becoming critical to how businesses work,” said Krishna Rao, Anthropic’s Chief Financial Officer. “This fundraising reflects the incredible demand we are seeing from these customers, and we will use this investment to continue building the enterprise-grade products and models they have come to depend on.”

    It has been less than three years since Anthropic earned its first dollar in revenue. Today, our run-rate revenue is $14 billion, a figure that has grown more than 10x annually in each of the past three years.

    This growth has been driven by our position as the intelligence platform of choice for enterprises and developers. The number of customers spending over $100,000 annually on Claude (as represented by run-rate revenue) has grown 7x in the past year. And businesses that start with Claude for a single use case—API, Claude Code, or Claude for Work—are expanding their integrations across their organizations. Two years ago, a dozen customers spent over $1 million with us on an annualized basis. Today that number exceeds 500. Eight of the Fortune 10 are now Claude customers.

    Claude Code

    Claude Code represents a new era of agentic coding, fundamentally changing how teams build software. It was made available to the general public in May 2025. Today, Claude Code’s run-rate revenue has grown to over $2.5 billion; this figure has more than doubled since the beginning of 2026. The number of weekly active Claude Code users has also doubled since January 1. A recent analysis estimated that 4% of all GitHub public commits worldwide were being authored by Claude Code—double the percentage from just one month prior.

    Business subscriptions to Claude Code have quadrupled since the start of 2026, and enterprise use has grown to represent over half of all Claude Code revenue. The same capabilities that make Claude exceptional for coding are also unlocking other new categories of work: financial and data analysis, sales, cybersecurity, scientific discovery, and beyond.

    In January alone, we launched more than thirty products and features, including Cowork, which brings Claude Code’s powerful engineering capabilities to a broader scope of knowledge work tasks. Cowork includes eleven open-source plugins that let customers turn Claude into a specialist for specific roles or teams, like sales, legal, or finance. We also expanded our reach into healthcare and life sciences, with Claude for Enterprise now available to organizations operating under HIPAA.

    “Since our initial investment in 2025, Anthropic’s focus on agentic coding and enterprise-grade AI systems has accelerated its progress toward large-scale adoption,” said Philippe Laffont, Founder & Portfolio Manager of Coatue. “The team’s ability to rapidly scale its offerings further positions Anthropic as a leader in a highly competitive AI market.”

    Claude’s frontier-setting intelligence continues to advance. Our newest model—Opus 4.6, launched last week—can power agents that manage entire categories of real-world work, generating documents, spreadsheets, and presentations with professional polish. And Opus 4.6 is the world’s leading model on GDPval-AA, which measures performance on economically valuable knowledge work tasks in finance, legal, and other domains.

    “Anthropic is the clear category leader in enterprise AI, demonstrating breakthrough capabilities and setting a new standard for safety, performance, and scale that will drive their long-term success,” said Choo Yong Cheen, Chief Investment Officer, Private Equity, GIC.

    The Series G will also power our infrastructure expansion as we make Claude available everywhere our customers are. Claude remains the only frontier AI model available to customers on all three of the world's largest cloud platforms: Amazon Web Services (Bedrock), Google Cloud (Vertex AI), and Microsoft Azure (Foundry). We train and run Claude on a diversified range of AI hardware—AWS Trainium, Google TPUs, and NVIDIA GPUs—which means we can match workloads to the chips best suited for them. This diversity of platforms translates to better performance and greater resilience for the enterprise customers that depend on Claude for critical work.

    The demand we are seeing from enterprises and developers reflects the trust they place in Claude for the work that matters most. As AI moves toward scaled implementation, we will continue to build the models, products, and partnerships to lead that transition.

    Original source
  • Feb 5, 2026
    • Date parsed from source:
      Feb 5, 2026
    • First seen by Releasebot:
      Feb 9, 2026
    Anthropic logo

    Anthropic

    Introducing Claude Opus 4.6

    Claude Opus 4.6 launches with a 1M token context window and stronger coding, planning, and debugging capabilities. It adds adaptive thinking, extended context handling, and new office integrations, plus developer controls for longer-running tasks. Available now via claude.ai, the API, and major cloud platforms.

    First impressions

    We’re upgrading our smartest model.
    The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token context window in beta.
    Opus 4.6 can also apply its improved abilities to a range of everyday work tasks: running financial analyses, doing research, and using and creating documents, spreadsheets, and presentations. Within Cowork, where Claude can multitask autonomously, Opus 4.6 can put all these skills to work on your behalf.
    The model’s performance is state-of-the-art on several evaluations. For example, it achieves the highest score on the agentic coding evaluation Terminal-Bench 2.0 and leads all other frontier models on Humanity’s Last Exam, a complex multidisciplinary reasoning test. On GDPval-AA—an evaluation of performance on economically valuable knowledge work tasks in finance, legal, and other domains—Opus 4.6 outperforms the industry’s next-best model (OpenAI’s GPT-5.2) by around 144 Elo points, and its own predecessor (Claude Opus 4.5) by 190 points. Opus 4.6 also performs better than any other model on BrowseComp, which measures a model’s ability to locate hard-to-find information online.
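
    For intuition about what those gaps mean in practice, the standard logistic Elo formula converts a rating difference into an expected head-to-head win rate. The snippet below applies it to the gaps quoted above, assuming GDPval-AA follows the conventional 400-point logistic scale (this post does not state how its Elo is computed).

    ```python
    # Convert an Elo rating gap into an expected head-to-head win rate using
    # the standard logistic formula E = 1 / (1 + 10 ** (-gap / 400)).
    # Assumes GDPval-AA Elo follows the conventional 400-point scale.
    def elo_win_probability(gap: float) -> float:
        return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

    print(f"vs GPT-5.2 (+144 Elo):  {elo_win_probability(144):.1%}")  # ~69.6%
    print(f"vs Opus 4.5 (+190 Elo): {elo_win_probability(190):.1%}")  # ~74.9%
    ```
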
    As we show in our extensive system card, Opus 4.6 also shows an overall safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations.
    In Claude Code, you can now assemble agent teams to work on tasks together. On the API, Claude can use compaction to summarize its own context and perform longer-running tasks without bumping up against limits. We’re also introducing adaptive thinking, where the model can pick up on contextual clues about how much to use its extended thinking, and new effort controls to give developers more control over intelligence, speed, and cost.
    We’ve made substantial upgrades to Claude in Excel, and we’re releasing Claude in PowerPoint in a research preview. This makes Claude much more capable for everyday work.
    Claude Opus 4.6 is available today on claude.ai, our API, and all major cloud platforms. If you’re a developer, use claude-opus-4-6 via the Claude API. Pricing remains the same at $5/$25 per million tokens; for full details, see our pricing page.
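
    As a minimal sketch of getting started, a first request with the anthropic Python SDK and the model ID above might look like this (assumes the SDK is installed and an ANTHROPIC_API_KEY environment variable is set; the prompt is an arbitrary example):

    ```python
    # Minimal Claude Opus 4.6 request via the Anthropic Python SDK
    # (`pip install anthropic`; reads ANTHROPIC_API_KEY from the environment).
    import anthropic

    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-opus-4-6",  # model ID from this announcement
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Summarize the key risks in this contract: ..."}
        ],
    )
    print(message.content[0].text)
    ```
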
    We cover the model, our new product updates, our evaluations, and our extensive safety testing in depth below.

    Evaluating Claude Opus 4.6

    Across agentic coding, computer use, tool use, search, and finance, Opus 4.6 is an industry-leading model, often by a wide margin, outperforming both our previous models and other industry models across a variety of benchmarks.
    Opus 4.6 is much better at retrieving relevant information from large sets of documents. This extends to long-context tasks, where it holds and tracks information over hundreds of thousands of tokens with less drift, and picks up buried details that even Opus 4.5 would miss.
    A common complaint about AI models is “context rot,” where performance degrades as conversations exceed a certain number of tokens. Opus 4.6 performs markedly better than its predecessors: on the 8-needle 1M variant of MRCR v2—a needle-in-a-haystack benchmark that tests a model’s ability to retrieve information “hidden” in vast amounts of text—Opus 4.6 scores 76%, whereas Sonnet 4.5 scores just 18.5%. This is a qualitative shift in how much context a model can actually use while maintaining peak performance.
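
    To make the methodology concrete, a toy version of a needle-in-a-haystack test looks like the sketch below. This is a simplified illustration, not MRCR v2 itself: the filler text, needle format, and scoring described are placeholders.

    ```python
    # Toy needle-in-a-haystack harness illustrating the methodology behind
    # benchmarks like MRCR: bury short "needle" facts at random positions in
    # filler text, then ask the model to retrieve every one of them.
    import random

    FILLER = "The committee reviewed the quarterly agenda without incident. "

    def build_haystack(needles: dict[str, str], approx_sentences: int = 600) -> str:
        chunks = [FILLER] * approx_sentences
        for key, value in needles.items():
            pos = random.randrange(len(chunks))
            chunks.insert(pos, f"The secret value for {key} is {value}. ")
        return "".join(chunks)

    # Eight needles, mirroring the 8-needle variant described above.
    needles = {f"needle-{i}": f"token-{random.randint(1000, 9999)}" for i in range(8)}
    prompt = build_haystack(needles) + "\n\nReport the secret value for every needle."

    # Send `prompt` to the model under test, then score the fraction of needle
    # values reproduced exactly in its reply.
    ```
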
    All in all, Opus 4.6 is better at finding information across long contexts, better at reasoning after absorbing that information, and has substantially better expert-level reasoning abilities in general.

    A step forward on safety

    These intelligence gains do not come at the cost of safety. On our automated behavioral audit, Opus 4.6 showed a low rate of misaligned behaviors such as deception, sycophancy, encouragement of user delusions, and cooperation with misuse. Overall, it is just as well-aligned as its predecessor, Claude Opus 4.5, which was our most-aligned frontier model to date. Opus 4.6 also shows the lowest rate of over-refusals—where the model fails to answer benign queries—of any recent Claude model.
    For Claude Opus 4.6, we ran the most comprehensive set of safety evaluations of any model, applying many different tests for the first time and upgrading several that we’ve used before. We included new evaluations for user wellbeing, more complex tests of the model’s ability to refuse potentially dangerous requests, and updated evaluations of the model’s ability to surreptitiously perform harmful actions. We also experimented with new methods from interpretability, the science of the inner workings of AI models, to begin to understand why the model behaves in certain ways—and, ultimately, to catch problems that standard testing might miss.
    A detailed description of all capability and safety evaluations is available in the Claude Opus 4.6 system card.
    We’ve also applied new safeguards in areas where Opus 4.6 shows particular strengths that might be put to dangerous as well as beneficial uses. In particular, since the model shows enhanced cybersecurity abilities, we’ve developed six new cybersecurity probes—methods of detecting harmful responses—to help us track different forms of potential misuse.
    We’re also accelerating the cyberdefensive uses of the model, using it to help find and patch vulnerabilities in open-source software (as we describe in our new cybersecurity blog post). We think it’s critical that cyberdefenders use AI models like Claude to help level the playing field. Cybersecurity moves fast, and we’ll be adjusting and updating our safeguards as we learn more about potential threats; in the near future, we may institute real-time intervention to block abuse.

    Product and API updates

    We’ve made substantial updates across Claude, Claude Code, and the Claude Developer Platform to let Opus 4.6 perform at its best.

    Claude Developer Platform

    On the API, we’re giving developers better control over model effort and more flexibility for long-running agents. To do so, we’re introducing the following features, with a usage sketch after the list:

    • Adaptive thinking. Previously, developers had only a binary choice between enabling and disabling extended thinking. Now, with adaptive thinking, Claude can decide when deeper reasoning would be helpful. At the default effort level (high), the model uses extended thinking when useful, but developers can adjust the effort level to make it more or less selective.
    • Effort. There are now four effort levels to choose from: low, medium, high (default), and max. We encourage developers to experiment with different options to find what works best.
    • Context compaction (beta). Long-running conversations and agentic tasks often hit the context window. Context compaction automatically summarizes and replaces older context when the conversation approaches a configurable threshold, letting Claude perform longer tasks without hitting limits.
    • 1M token context (beta). Opus 4.6 is our first Opus-class model with 1M token context. Premium pricing applies for prompts exceeding 200k tokens ($10/$37.50 per million input/output tokens).
    • 128k output tokens. Opus 4.6 supports outputs of up to 128k tokens, which lets Claude complete larger-output tasks without breaking them into multiple requests.
    • US-only inference. For workloads that need to run in the United States, US-only inference is available at 1.1× token pricing.
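
    Taken together, a long-running request that leans on these controls might look like the sketch below. This announcement names the features but not their exact API fields, so the effort and context_compaction fields here are assumed placeholder names passed through the SDK’s extra_body escape hatch, not confirmed parameters.

    ```python
    # Hedged sketch of the new developer controls described above. The `effort`
    # and `context_compaction` fields are ASSUMED names based on this post, not
    # documented API parameters; extra_body forwards them with the request.
    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=128_000,  # Opus 4.6 supports outputs of up to 128k tokens
        messages=[{"role": "user", "content": "Refactor the billing module."}],
        extra_body={
            "effort": "max",              # assumed field: low | medium | high | max
            "context_compaction": {       # assumed field for the compaction beta:
                "enabled": True,          # summarize older context as the
                "trigger_tokens": 150_000,  # conversation nears this threshold
            },
        },
    )
    print(response.content[0].text)
    ```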

    Product updates

    Across Claude and Claude Code, we’ve added features that allow knowledge workers and developers to tackle harder tasks with more of the tools they use every day.
    We’ve introduced agent teams in Claude Code as a research preview. You can now spin up multiple agents that work in parallel as a team and coordinate autonomously—best for tasks that split into independent, read-heavy work like codebase reviews. You can take over any subagent directly using Shift+Up/Down or tmux.
    Claude now also works better with the office tools you already use. Claude in Excel handles long-running and harder tasks with improved performance, and can plan before acting, ingest unstructured data and infer the right structure without guidance, and handle multi-step changes in one pass. Pair that with Claude in PowerPoint, and you can first process and structure your data in Excel, then bring it to life visually in PowerPoint. Claude reads your layouts, fonts, and slide masters to stay on brand, whether you’re building from a template or generating a full deck from a description. Claude in PowerPoint is now available in research preview for Max, Team, and Enterprise plans.

    Original source
