AI Language Models Release Notes

Release notes for large language models, APIs and AI platforms


Latest AI Language Models Updates

  • Apr 17, 2026
    • Date parsed from source:
      Apr 17, 2026
    • First seen by Releasebot:
      Apr 18, 2026

    xAI

    Grok Speech to Text and Text to Speech APIs

    xAI releases standalone Grok Speech to Text and Text to Speech APIs, bringing low-latency transcription and natural voice generation to developers. The launch includes real-time and batch endpoints, multilingual support, speaker diarization, timestamps, and expressive speech tags.

    Today, we are excited to announce two powerful standalone audio APIs: Grok Speech to Text (STT) and Grok Text to Speech (TTS), both built on the same stack that powers Grok Voice, Tesla vehicles, and Starlink customer support.

    These standalone endpoints make it straightforward for developers to integrate high-quality speech features into any application, whether you're creating voice agents, real-time transcription tools, accessibility solutions, podcasts, or interactive audio experiences.

    Speech to Text

    High accuracy, low latency.

    • Generate transcripts from large audio files in milliseconds via our REST API
    • Transcribe speech in real time with our lowest latency WebSocket API

    We’ve added powerful features like word-level timestamps, speaker diarization, and multichannel support. The API also includes intelligent Inverse Text Normalization that correctly handles numbers, dates, currencies, and more.
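    As a minimal sketch, a batch transcription call might look like the Python below. The endpoint path, model id, and field names are illustrative assumptions, not the documented API surface; the real details live in the xAI API console.

      # Hypothetical sketch: endpoint path, model id, and field names are assumptions.
      import requests

      API_KEY = "your-xai-api-key"

      with open("meeting.wav", "rb") as f:
          resp = requests.post(
              "https://api.x.ai/v1/audio/transcriptions",  # assumed path
              headers={"Authorization": f"Bearer {API_KEY}"},
              files={"file": f},
              data={
                  "model": "grok-stt",    # assumed model id
                  "timestamps": "word",   # word-level timestamps
                  "diarize": "true",      # speaker diarization
                  "format": "true",       # Inverse Text Normalization
              },
          )
      resp.raise_for_status()
      print(resp.json()["text"])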

    Pricing

    We keep pricing straightforward and predictable: Speech to Text is $0.10 per hour for batch and $0.20 per hour for streaming. Full details and current rate limits are available in the xAI API console.
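    As a worked example of those rates (simple arithmetic on the prices above; the volumes are made up):

      BATCH_PER_HOUR = 0.10    # USD per audio hour, batch
      STREAM_PER_HOUR = 0.20   # USD per audio hour, streaming

      # e.g. 500 hours of recordings in batch plus 40 hours of live streaming
      cost = 500 * BATCH_PER_HOUR + 40 * STREAM_PER_HOUR
      print(f"${cost:.2f}")    # $58.00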

    Enterprise-Grade Transcription

    Grok STT is evaluated against the top commercial models on phone calls, meetings, video/podcasts, and telephony. It excels at entity recognition and at business use cases in medical, legal, and financial domains.

    Most transcription models give you raw spoken words. Grok Speech to Text goes further.

    When you enable formatting, the API performs advanced Inverse Text Normalization that intelligently converts spoken language into properly structured output.
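    For instance (illustrative input/output pairs, not taken from the announcement):

      "twenty five dollars and fifty cents"  ->  "$25.50"
      "march third twenty twenty six"  ->  "March 3, 2026"
      "call me at five five five one two one two"  ->  "Call me at 555-1212"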

    Multilingual fluency

    The Grok Speech to Text API offers strong multilingual support across 25+ languages, letting you switch languages seamlessly without missing a beat.

    Multichannel & Diarization (Speaker Identification)

    Transcribe multichannel audio files for perfect speaker separation with the same API.

    Detect speakers in both pre-recorded audio and real-time streams with word-level speaker IDs using Diarization.
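    A sketch of the shape a diarized, word-level result might take (field names are assumptions, not the documented schema):

      # Illustrative response shape; field names are assumed.
      result = {
          "text": "Hi there. Hello!",
          "words": [
              {"word": "Hi",    "start": 0.12, "end": 0.31, "speaker": 0},
              {"word": "there", "start": 0.33, "end": 0.58, "speaker": 0},
              {"word": "Hello", "start": 0.91, "end": 1.20, "speaker": 1},
          ],
      }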

    Text to Speech

    Fast, natural, and expressive voices with Speech Tags.

    • Turn long-form text into speech with our REST API
    • Generate speech in real time with our WebSocket API

    Fine-Grained Control

    Add natural prosody and emotion using simple inline and wrapping speech tags: [laugh], [sigh], [whisper], and many more. These controls let you create engaging, lifelike delivery without complex markup.
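    A minimal sketch of a synthesis request using those tags. As with the transcription sketch above, the endpoint path, model id, voice name, and closing-tag syntax are assumptions; only the inline tags come from this announcement.

      # Hypothetical sketch: endpoint, model id, voice, and closing-tag syntax are assumptions.
      import requests

      API_KEY = "your-xai-api-key"

      resp = requests.post(
          "https://api.x.ai/v1/audio/speech",  # assumed path
          headers={"Authorization": f"Bearer {API_KEY}"},
          json={
              "model": "grok-tts",             # assumed model id
              "voice": "default",              # assumed voice name
              "input": "[whisper]It's a secret[/whisper]... [laugh] fine, I'll tell you.",
          },
      )
      resp.raise_for_status()
      with open("speech.mp3", "wb") as f:      # assumed output format
          f.write(resp.content)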

    Pricing

    Text to Speech is priced at $4.20 per 1 million characters, with straightforward usage-based billing and no hidden fees.
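    At that rate (simple arithmetic on the published price), a 10,000-character script costs about $0.04, and a 500,000-character audiobook draft comes to $2.10.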

  • Apr 17, 2026
    • Date parsed from source:
      Apr 17, 2026
    • First seen by Releasebot:
      Apr 18, 2026

    Claude by Anthropic

    April 17, 2026

    Claude launches Claude Design, a new tool for collaborating with Claude to create designs, prototypes, slides, and one-pagers.

    Claude Design by Anthropic Labs

    With Opus 4.7, we also launched Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create visual outputs like designs, prototypes, slides, and one-pagers. For more information, see Get started with Claude Design.

  • Apr 17, 2026
    • Date parsed from source:
      Apr 17, 2026
    • First seen by Releasebot:
      Apr 18, 2026

    Anthropic

    Introducing Claude Design by Anthropic Labs

    Anthropic launches Claude Design, a new Anthropic Labs product for creating polished visual work with Claude, including prototypes, slides, one-pagers, and more. It rolls out in research preview for Pro, Max, Team, and Enterprise users and adds sharing, export, and Claude Code handoff support.

    Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more.

    Claude Design is powered by our most capable vision model, Claude Opus 4.7, and is available in research preview for Claude Pro, Max, Team, and Enterprise subscribers. We’re rolling out to users gradually throughout the day.

    Design with Claude

    Even experienced designers have to ration exploration—there's rarely time to prototype a dozen directions, so you limit yourself to a few. And for founders, product managers, and marketers with an idea but not a design background, creating and sharing those ideas can be daunting.

    Claude Design gives designers room to explore widely and everyone else a way to produce visual work. Describe what you need and Claude builds a first version. From there, you refine through conversation, inline comments, direct edits, or custom sliders (made by Claude) until it’s right. When given access, Claude can also apply your team’s design system to every project automatically, so the output is consistent with the rest of your company’s designs.

    Teams have been using Claude Design for:

    • Realistic prototypes: Designers can turn static mockups into easily shareable interactive prototypes to gather feedback and user-test, without code review or PRs.
    • Product wireframes and mockups: Product Managers can sketch out feature flows and hand them off to Claude Code for implementation, or share them with designers to refine further.
    • Design explorations: Designers can quickly create a wide range of directions to explore.
    • Pitch decks and presentations: Founders and Account Executives can go from a rough outline to a complete, on-brand deck in minutes, and then export as a PPTX or send to Canva.
    • Marketing collateral: Marketers can create landing pages, social media assets, and campaign visuals, then loop in designers to polish.
    • Frontier design: Anyone can build code-powered prototypes with voice, video, shaders, 3D, and built-in AI.

    How it works

    Claude Design follows a natural creative flow.

    Your brand, built in.

    During onboarding, Claude builds a design system for your team by reading your codebase and design files. Every project after that uses your colors, typography, and components automatically. You can refine the system over time, and teams can maintain more than one.

    Import from anywhere.

    Start from a text prompt, upload images and documents (DOCX, PPTX, XLSX), or point Claude at your codebase. You can also use the web capture tool to grab elements directly from your website so prototypes look like the real product.

    Refine with fine-grained controls.

    Comment inline on specific elements, edit text directly, or use adjustment knobs to tweak spacing, color, and layout live. Then ask Claude to apply your changes across the full design.

    Collaborate.

    Designs have organization-scoped sharing. You can keep a document private, share it so anyone in your organization with the link can view it, or grant edit access so colleagues can modify the design and chat with Claude together in a group conversation.

    Export anywhere.

    Share designs as an internal URL within your organization, save as a folder, or export to Canva, PDF, PPTX, or standalone HTML files.

    Handoff to Claude Code.

    When a design is ready to build, Claude packages everything into a handoff bundle that you can pass to Claude Code with a single instruction.

    Over the coming weeks, we'll make it easier to build integrations with Claude Design, so you can connect it to more of the tools your team already uses.

    We’ve loved collaborating with Anthropic over the past couple of years and share a deep focus on making complex things simple. At Canva, our mission has always been to empower the world to design, and that means bringing Canva to wherever ideas begin. We’re excited to build on our collaboration with Claude, making it seamless for people to bring ideas and drafts from Claude Design into Canva, where they instantly become fully editable and collaborative designs ready to refine, share, and publish.

    Melanie Perkins
    Co-Founder and CEO, Canva

    Brilliant's intricate interactivity and animations are historically painful to prototype, but Claude Design's ability to turn static designs into interactive prototypes has been a step change for us. Our most complex pages, which took 20+ prompts to recreate in other tools, only required 2 prompts in Claude Design. Including design intent in Claude Code handoffs has made the jump from prototype to production seamless.

    Olivia Xu
    Senior Product Designer, Brilliant

    Claude Design has made prototyping dramatically faster for our team, enabling live design during conversations. We've gone from a rough idea to a working prototype before anyone leaves the room, and the output stays true to our brand and design guidelines. What used to take a week of back-and-forth between briefs, mockups, and review rounds now happens in a single conversation.

    Aneesh Kethini
    Product Manager, Datadog

    Get started

    Claude Design is available for Claude Pro, Max, Team, and Enterprise subscribers. Access is included with your plan and uses your subscription limits, with the option to continue beyond those limits by enabling extra usage.

    For Enterprise organizations, Claude Design is off by default. Admins can enable it in Organization settings.

    Start designing at claude.ai/design.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 17, 2026

    Claude by Anthropic

    April 16, 2026

    Claude launches Opus 4.7, with stronger coding, better long-running software tasks and higher-resolution vision.

    Claude Opus 4.7 launch

    Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 shows improvements in software engineering and complex, long-running coding tasks, as well as better vision, allowing it to see images in higher resolution. For more information, see our blog post: Introducing Claude Opus 4.7.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 17, 2026

    OpenAI

    Codex for (almost) everything

    OpenAI releases a major Codex update that expands beyond coding into computer use, web workflows, image generation, memory, automations, and deeper developer tools for reviews, terminals, SSH devboxes, and in-app browsing.

    We’re releasing a major update to Codex, making it a more powerful partner for the more than 3 million developers who use it every week to accelerate work across the full software development lifecycle.

    Codex can now operate your computer alongside you, work with more of the tools and apps you use every day, generate images, remember your preferences, learn from previous actions, and take on ongoing and repeatable work. The Codex app also now includes deeper support for developer workflows, like reviewing PRs, viewing multiple files & terminals, connecting to remote devboxes via SSH, and an in-app browser to make it faster to iterate on frontend designs, apps, and games.

    Extending Codex beyond coding

    With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps. For developers, this is helpful for iterating on frontend changes, testing apps, or working in apps that don’t expose an API.

    Codex is also beginning to work natively with the web. The app now includes an in-app browser, where you can comment directly on pages to provide precise instructions to the agent. This is useful for frontend and game development today, and over time we plan to expand it so Codex can fully command the browser beyond web applications on localhost.

    Codex can now use gpt-image-1.5 to generate and iterate on images. Combined with screenshots and code, it is helpful for creating visuals for product concepts, frontend designs, mockups, and games inside the same workflow.

    We’re also releasing more than 90 additional plugins, which combine skills, app integrations, and MCP servers to give Codex more ways to gather context and take action across your tools. Some of the new plugins developers will find most useful include Atlassian Rovo to help manage JIRA, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon by Databricks, Remotion, Render, and Superpowers.

    Working across the software development lifecycle

    The app now includes support for addressing GitHub review comments, running multiple terminal tabs, and connecting to remote devboxes over SSH in alpha. It also lets you open files directly in the sidebar with rich previews for PDFs, spreadsheets, slides, and docs, and use a new summary pane to track agent plans, sources, and artifacts.

    Together, these improvements make it faster to move across all the stages of the software development lifecycle: writing code, checking outputs, reviewing changes, and collaborating with the agent in one workspace.

    Carry work forward over time

    We have expanded automations to allow re-using existing conversation threads, preserving context previously built up. Codex can now schedule future work for itself and wake up automatically to continue on a long-term task, potentially across days or weeks.

    Teams use automations for everything from landing open pull requests to following up on tasks and staying on top of fast-moving conversations across tools like Slack, Gmail, and Notion.

    We’re also releasing a preview of memory, which allows Codex to remember useful context from previous experience, including personal preferences, corrections, and information that took time to gather. This helps future tasks complete faster and to a level of quality previously only possible through extensive custom instructions.

    Codex now also proactively proposes useful work to continue where you have left off. Using context from projects, connected plugins, and memory, Codex can now suggest how to start your work day or where to pick up on a previous project. For example, Codex can identify open comments in Google Docs that require your attention, pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

    Availability

    Starting today, these updates are rolling out to Codex desktop app users who are signed in with ChatGPT.

    Personalization features including context-aware suggestions and memory will roll out to Enterprise, Edu, and EU and UK users soon. Computer use is initially available on macOS, and will roll out to EU and UK users soon.

    If you’ve been using Codex in the terminal or editor, try it across the rest of your workflow. If you haven’t tried Codex yet, download the app and get started.

    What’s next

    In just the year since Codex launched, the ways developers use Codex have expanded. Developers start with Codex to write code, then increasingly use it to understand systems, gather context, review work, debug issues, coordinate with teammates, and keep longer-running work moving.

    Our mission is to ensure that AGI benefits all of humanity. That includes narrowing the gap between what people can imagine and what they can build. This release brings Codex closer to the tools, workflows, and decisions involved in building software, with much more to come soon.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 17, 2026

    ChatGPT by OpenAI

    April 16, 2026

    ChatGPT rolls out ads for Free and Go users in Australia, New Zealand, and Canada.

    We're beginning to roll out ads for users on Free and Go plans in Australia, New Zealand, and Canada. Ads do not appear on Plus, Pro, Business, Enterprise, and Education plans.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 17, 2026

    Gemini by Google

    New ways to create personalized images in the Gemini app

    Gemini now uses personal context and Google Photos with Nano Banana 2 to create more personal images.

    Nano Banana 2 now uses your personal context and Google Photos to create images that reflect your unique life.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 17, 2026

    Gemini by Google

    New ways to create personalized images in the Gemini app

    Gemini introduces more personal image generation with Personal Intelligence, Nano Banana 2 and Google Photos, letting eligible subscribers create custom images with less prompting, no manual uploads and built-in controls to refine results while keeping privacy and opt-in settings intact.

    Use Personal Intelligence to create more relevant, personal images using Nano Banana and your own Google Photos library — no manual uploads or long prompts required.

    Personal Intelligence makes the Gemini app feel tailored to you, not just a generic tool that works the same for everyone. Today, we’re introducing new ways for Gemini to use your interests and preferences with Nano Banana 2 and Google Photos to make image generation — one of your favorite ways to use Gemini — feel deeply personal. This lets you create unique images more easily, so you can spend more time creating and less time explaining.

    Powering your imagination

    One of the biggest hurdles in AI image generation is finding the right prompt. Previously, to get a result that felt truly personal, you had to write long, detailed descriptions and manually upload a reference photo just to give Gemini the right context.

    Now, Personal Intelligence gives Gemini an inherent understanding of your preferences from the start. By integrating this context directly with Nano Banana 2, Gemini can automatically fill in the blanks, grounding every creation in the things you care about most. And since this is built into how you normally use the Gemini app, there’s no extra setup. If you’ve already linked your Google apps, that personal context is ready and waiting the moment you start creating images.

    This removes the heavy lifting. Instead of writing out the intricate details of your life, you can use simple prompts like "Design my dream house" or "Create a picture of my desert island essentials" and the results will automatically reflect your specific tastes and lifestyle, gleaned from the Google apps you’ve connected to.

    Starring you and your loved ones

    A lot of your most significant moments live in your Google Photos library. By connecting your Google Photos library to Personal Intelligence, Gemini goes a step further than just understanding your interests. It can use actual images of you and your loved ones to guide the image generation process.

    Since you can already organize and label groups of people and pets in your library, those labels provide the context that Gemini needs to make your images feel truly yours. Now your inner circle can become the stars of your images, whether you want a result that feels pulled straight from your life or one that takes your imagination a bit further.

    With those labels in place, you can simply ask Gemini to “create a claymation image of me and my family enjoying our favorite activity” and Gemini can generate that specific image for you automatically. You can also experiment with different styles like watercolors, charcoal sketches or oil paintings. You can turn a quick idea into a custom creation, saving you the trouble of searching for, downloading and re-uploading files just to see a concept come to life.

    Putting creative control in your hands

    Because this is a brand-new experience, Gemini might not always pick the exact photo or detail you had in mind on the first try. To keep you in the driver’s seat, we’ve built in ways to refine your results. If the result isn’t quite right, you can simply tell Gemini what was incorrect and try again. You can also click the ‘+’ icon and select a different reference photo from your Google Photos library to try a new perspective. If you’re ever curious about how your context was applied, click on the Sources button, and it’ll show you which image was auto-selected to guide the creation. You can even ask Gemini directly for information on the attribution and sources used for that specific image.

    Bringing personal details into your images shouldn't mean compromising on privacy, which is why our core commitments haven't changed. The Gemini app does not directly train its models on your private Google Photos library. We train on limited info, like specific prompts in Gemini and the model’s responses, to improve functionality over time. And connecting your Google apps to Gemini remains an opt-in experience that you can adjust in your settings at any time.

    This new personalized image creation experience in the Gemini app is rolling out over the next few days to eligible Google AI Plus, Pro, and Ultra subscribers in the U.S., and we plan to bring this to Gemini in Chrome on desktop and to more users soon.

    Give it a try when it hits your app — we’re looking forward to seeing how these tools help you spend less time prompting and more time creating.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 17, 2026

    OpenAI

    Introducing GPT‑Rosalind for life sciences research

    OpenAI introduces GPT-Rosalind, a research preview life sciences reasoning model for biology, drug discovery, and translational medicine, plus a new Codex research plugin that connects scientists to 50+ tools and data sources for faster research workflows.

    Today, we’re introducing GPT‑Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine. The life sciences model series is optimized for scientific workflows, combining improved tool use with deeper understanding across chemistry, protein engineering, and genomics.

    On average, it takes roughly 10 to 15 years to go from target discovery to regulatory approval for a new drug in the United States. Gains made at the earliest stages of discovery compound downstream in better target selection, stronger biological hypotheses and higher-quality experiments. Progress in the life sciences is constrained not only by the difficulty of the underlying science, but by the complexity of the research workflows themselves. Scientists must work across large volumes of literature, specialized databases, experimental data, and evolving hypotheses in order to generate and evaluate new ideas. These workflows are often time-intensive, fragmented, and difficult to scale.

    We believe advanced AI systems can help researchers move through these workflows faster—not just by making existing work more efficient, but by helping scientists explore more possibilities, surface connections that might otherwise be missed, and arrive at better hypotheses sooner. By supporting evidence synthesis, hypothesis generation, experimental planning, and other multi-step research tasks, this model is designed to help researchers accelerate the early stages of discovery. Over time, these systems could help life sciences organizations discover breakthroughs that wouldn’t otherwise be possible, with a much higher rate of success.

    GPT‑Rosalind is now available as a research preview in ChatGPT, Codex, and the API for qualified customers through our trusted access program. We’re also introducing a freely accessible Life Sciences research plugin for Codex, helping scientists connect models to over 50 scientific tools and data sources. We are working with customers like Amgen, Moderna, the Allen Institute, Thermo Fisher Scientific, and others to apply GPT‑Rosalind across workflows that accelerate research and discovery.

    The model is named after Rosalind Franklin, whose rigorous research helped reveal the structure of DNA and laid foundations for modern molecular biology.

    From raw data to grounded discovery decisions, see how our purpose-built model accelerates research workflows.

    Built for scientific workflows

    The GPT‑Rosalind life sciences model series is built for modern scientific work across published evidence, data, tools, and experiments. In our evaluations, it delivers the best performance on tasks that require reasoning over molecules, proteins, genes, pathways, and disease-relevant biology, and it is more effective at using scientific tools and databases in multi-step workflows such as literature review, sequence-to-function interpretation, experimental planning, and data analysis.

    This is the first release in our GPT‑Rosalind life sciences model series, and we will continue to expand the frontiers of the model’s biochemical reasoning capabilities across long-horizon, tool-heavy scientific workflows. OpenAI’s compute infrastructure gives us the ability to continue training, evaluating, and improving increasingly capable domain models against real scientific tasks—helping these systems become more useful as the workflows themselves become more complex.

    From evidence-based discovery insights to high-impact experiments, see how our suite of solutions translates into measurable improvements in your research workflows.

    Customers and ecosystem

    We are working with leading pharmaceutical, biotechnology, and research customers, as well as life sciences technology organizations, to apply GPT‑Rosalind across workflows that drive discovery.

    “The life sciences field demands precision at every step. The questions are highly complex, the data are highly unique, and the stakes are incredibly high. Our unique collaboration with OpenAI enables us to apply their most advanced capabilities and tools in new and innovative ways with the potential to accelerate how we deliver medicines to patients.”
    —Sean Bruich, Senior Vice President of Artificial Intelligence and Data, Amgen

    Performance and evaluation

    We evaluated GPT‑Rosalind across a range of capabilities fundamental to scientific discovery and industry research. These evaluations measure core reasoning across scientific subdomains, including chemical reaction mechanisms; protein structure, mutation effects, and interactions; and phylogenetic interpretation of DNA sequences. They also assess whether models can support real research workflows by interpreting experimental outputs, identifying expert-relevant patterns, and synthesizing external information to design follow-up experiments. Finally, they test whether models can select and use the right computational tools, databases, and domain-specific capabilities to augment their reasoning. Taken together, these evaluations show progress across the end-to-end process of scientific research and suggest a stronger ability to help researchers work through challenging discovery tasks.

    Prompt

    I am planning a base-promoted SNAr coupling of 1-(pyridin-3-yl)ethanol with 1-fluoro-2-nitrobenzene with the goal of synthesizing 1-(pyridin-3-yl)ethyl 2-nitrophenyl ether. I found several patents that describe room-temperature O-arylation of alcohols in DMF/Cs2CO3, but the reaction is taking longer than I would like. How can I improve this reaction? Help me find any relevant literature or patents as well.

    Industry evaluations

    We evaluated GPT‑Rosalind on a series of public benchmarks. On BixBench, a benchmark designed around real-world bioinformatics and data analysis, GPT‑Rosalind achieved leading performance among models with published scores.

    On LABBench2, a benchmark measuring performance on a range of research tasks such as literature retrieval, database access, sequence manipulation and protocol design, GPT‑Rosalind outperforms GPT‑5.4 on 6 out of 11 tasks. The most notable improvement comes from CloningQA, which requires end-to-end design of DNA and enzyme reagents for molecular cloning protocols.

    We also partnered with Dyno Therapeutics, a company pioneering AI-designed gene therapies, to evaluate the model on an RNA sequence-to-function prediction and generation task using unpublished, uncontaminated sequences. Performance was compared against 57 historical scores from human experts in the AI-bio field. When evaluated directly in the Codex app, best-of-ten model submissions ranked above the 95th percentile of human experts on the prediction task and around the 84th percentile of human experts on the sequence generation task.

    These evaluations provide a meaningful signal of performance on the kinds of workflows scientists rely on every day to generate evidence, analyze complex data, and move toward defensible biological conclusions.

    Connecting to the tools scientists use

    Scientists can use our new Life Sciences research plugin for Codex, available today on GitHub. This package includes a broad set of modular skills for the most common research workflows, designed to help users work across human genetics, functional genomics, protein structure, biochemistry, clinical evidence, and public study discovery.

    These skills act as an orchestration layer that helps scientists work through broad, ambiguous, and multi-step questions more effectively. They provide access to more than 50 public multi-omics databases, literature sources, and biology tools, and offer a flexible starting point for common repeatable workflows such as protein structure lookup, sequence search, literature review, and public dataset discovery.

    Eligible Enterprise users can leverage this plugin in research workflows with GPT‑Rosalind for deeper biological reasoning, while all users can use the plugin package with our mainline models.

    Trusted access

    We want to make these capabilities available to the scientists and research organizations best positioned to advance human health, while maintaining strong safeguards against biological misuse. The Life Sciences model is launching through a trusted-access deployment structure for qualified Enterprise customers in the U.S. to start, with controls around eligibility, access management, and organizational governance. At the same time, we are making a set of connectors and the Life Sciences Research Plugin available more broadly, so researchers can use our mainline models more effectively for life sciences research tasks.

    The Life Sciences model was developed with heightened enterprise-grade security controls and strengthened access management, enabling professional scientific use in governed research environments. We evaluate access based on three core principles: beneficial use, strong governance and safety oversight, and controlled access with enterprise-grade security. In practice, this means participating organizations must be conducting legitimate scientific research with clear public benefit; maintain appropriate governance, compliance, and misuse-prevention controls; and restrict access to approved users within secure, well-managed environments. Organizations must also agree to the life sciences research preview terms and comply with OpenAI’s usage policies, and we may request additional information as part of onboarding or continued participation.

    Getting started

    Organizations can request access through our qualification and safety review process.

    During the research preview, use of this model will not consume existing credits or tokens—subject to abuse guardrails. We’ll share more details on pricing and availability as the program expands.

    The Life Sciences model is built to help scientific organizations do higher-quality work, faster, in environments that require both technical capability and operational control. Our dedicated Life Sciences team—as well as advisory partners including McKinsey & Company, Boston Consulting Group (BCG), and Bain & Company—help organizations identify high-impact use cases, integrate the model into enterprise environments, and drive measurable outcomes. If you’d like to explore ways OpenAI Life Sciences can support your work, you can contact our Life Sciences team.

    What’s next

    This is the first release in our Life Sciences model series, and we view it as the beginning of a long-term commitment to building AI that can accelerate scientific discovery in areas that matter deeply to society, from human health to broader biological research. We will continue improving the model’s biological reasoning, expanding support for tool-heavy and long-horizon research workflows, and working closely with leading scientific institutions to evaluate real-world impact. That includes ongoing partnerships with national laboratories such as Los Alamos National Laboratory, where we are exploring AI-guided protein and catalyst design, including the ability of AI systems to modify biological structures while preserving or improving key functional properties.

    Over time, we expect these systems to become increasingly capable partners in discovery—helping scientists move faster from question to evidence, from evidence to insight, and from insight to new treatments for patients.

  • Apr 16, 2026
    • Date parsed from source:
      Apr 16, 2026
    • First seen by Releasebot:
      Apr 16, 2026

    Anthropic

    Introducing Claude Opus 4.7

    Anthropic releases Claude Opus 4.7 as a generally available upgrade with stronger software engineering, better vision, sharper instruction following, and more reliable long-running agent work. It also adds new effort controls, task budgets, and Claude Code review tools.

    Our latest model, Claude Opus 4.7, is now generally available.

    Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

    The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks:

    Last week we announced Project Glasswing, highlighting the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.

    Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

    Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.
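    As a minimal sketch, a request with the new model id through the official Python SDK is a standard Messages API call (the prompt is just an example):

      import anthropic

      client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
      message = client.messages.create(
          model="claude-opus-4-7",    # model id from this announcement
          max_tokens=1024,
          messages=[{"role": "user", "content": "Find the race condition in this function: ..."}],
      )
      print(message.content[0].text)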

    Testing Claude Opus 4.7

    Claude Opus 4.7 has garnered strong feedback from our early-access testers:

    In early testing, we’re seeing the potential for a significant leap for our developers with Claude Opus 4.7. It catches its own logical faults during the planning phase and accelerates execution, far beyond previous Claude models. As a financial technology platform serving millions of consumers and businesses at significant scale, this combination of speed and precision could be game-changing: accelerating development velocity for faster delivery of the trusted financial solutions our customers rely on every day.

    Anthropic has already set the standard for coding models, and Claude Opus 4.7 pushes that further in a meaningful way as the state-of-the-art model on the market. In our internal evals, it stands out not just for raw capability, but for how well it handles real-world async workflows—automations, CI/CD, and long-running tasks. It also thinks more deeply about problems and brings a more opinionated perspective, rather than simply agreeing with the user.

    Claude Opus 4.7 is the strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for. It’s a more intelligent, more efficient Opus 4.6: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.

    On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it’s particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building.

    Based on our internal research-agent benchmark, Claude Opus 4.7 has the strongest efficiency baseline we’ve seen for multi-step work. It tied for the top overall score across our six modules at 0.715 and delivered the most consistent long-context performance of any model we tested. On General Finance—our largest module—it improved meaningfully on Opus 4.6, scoring 0.813 versus 0.767, while also showing the best disclosure and data discipline in the group. And on deductive logic, an area where Opus 4.6 struggled, Opus 4.7 is solid.

    Claude Opus 4.7 extends the limit of what models can do to investigate and get tasks done. Anthropic has clearly optimized for sustained reasoning over long runs, and it shows with market-leading performance. As engineers shift from working 1:1 with agents to managing them in parallel, this is exactly the kind of frontier capability that unlocks new workflows.

    We’re seeing major improvements in Claude Opus 4.7’s multimodal understanding, from reading chemical structures to interpreting complex technical diagrams. The higher resolution support is helping Solve Intelligence build best-in-class tools for life sciences patent workflows, from drafting and prosecution to infringement detection and invalidity charting.

    Claude Opus 4.7 takes long-horizon autonomy to a new level in Devin. It works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn't reliably run before.

    For Replit, Claude Opus 4.7 was an easy upgrade decision. For the work our users do every day, we observed it achieving the same quality at lower cost—more efficient and precise at tasks like analyzing logs and traces, finding bugs, and proposing fixes. Personally, I love how it pushes back during technical discussions to help me make better decisions. It really feels like a better coworker.

    Claude Opus 4.7 demonstrates strong substantive accuracy on BigLaw Bench for Harvey, scoring 90.9% at high effort with better reasoning calibration on review tables and noticeably smarter handling of ambiguous document editing tasks. It correctly distinguishes assignment provisions from change-of-control provisions, a task that has historically challenged frontier models. Substance was consistently rated as a strength across our evaluations: correct, thorough, and well-cited.

    Claude Opus 4.7 is a very impressive coding model, particularly for its autonomy and more creative reasoning. On CursorBench, Opus 4.7 is a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%.

    For complex multi-step workflows, Claude Opus 4.7 is a clear step up: plus 14% over Opus 4.6 at fewer tokens and a third of the tool errors. It’s the first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold. This is the reliability jump that makes Notion Agent feel like a true teammate.

    In our evals, we saw a double-digit jump in accuracy of tool calls and planning in our core orchestrator agents. As users leverage Hebbia to plan and execute on use cases like retrieval, slide creation, or document generation, Claude Opus 4.7 shows the potential to improve agent decision-making in these workflows.

    On Rakuten-SWE-Bench, Claude Opus 4.7 resolves 3x more production tasks than Opus 4.6, with double-digit gains in Code Quality and Test Quality. This is a meaningful lift and a clear upgrade for the engineering work our teams are shipping every day.

    For CodeRabbit’s code review workloads, Claude Opus 4.7 is the sharpest model we’ve tested. Recall improved by over 10%, surfacing some of the most difficult-to-detect bugs in our most complex PRs, while precision remained stable despite the increased coverage. It’s a bit faster than GPT-5.4 xhigh on our harness, and we’re lining it up for our heaviest review work at launch.

    For Genspark’s Super Agent, Claude Opus 4.7 nails the three production differentiators that matter most: loop resistance, consistency, and graceful error recovery. Loop resistance is the most critical. A model that loops indefinitely on 1 in 18 queries wastes compute and blocks users. Lower variance means fewer surprises in prod. And Opus 4.7 achieves the highest quality-per-tool-call ratio we’ve measured.

    Claude Opus 4.7 is a meaningful step up for Warp. Opus 4.6 is one of the best models out there for developers, and this model is measurably more thorough on top of that. It passed Terminal Bench tasks that prior Claude models had failed, and worked through a tricky concurrency bug Opus 4.6 couldn't crack. For us, that’s the signal.

    Claude Opus 4.7 is the best model in the world for building dashboards and data-rich interfaces. The design taste is genuinely surprising—it makes choices I’d actually ship. It’s my default daily driver now.

    Claude Opus 4.7 is the most capable model we've tested at Quantium. Evaluated against leading AI models through our proprietary benchmarking solution, the biggest gains showed up where they matter most: reasoning depth, structured problem-framing, and complex technical work. Fewer corrections, faster iterations, and stronger outputs to solve the hardest problems our clients bring us.

    Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved, it’s cutting out the meaningless wrapper functions and fallback scaffolding that used to pile up, and fixes its own code as it goes. It’s the cleanest jump we’ve seen since the move from Sonnet 3.7 to the Claude 4 series.

    For the computer-use work that sits at the heart of XBOW’s autonomous penetration testing, the new Claude Opus 4.7 is a step change: 98.5% on our visual-acuity benchmark versus 54.5% for Opus 4.6. Our single biggest Opus pain point effectively disappeared, and that unlocks its use for a whole class of work where we couldn’t use it before.

    Claude Opus 4.7 is a solid upgrade with no regressions for Vercel. It’s phenomenal on one-shot coding tasks, more correct and complete than Opus 4.6, and noticeably more honest about its own limits. It even does proofs on systems code before starting work, which is new behavior we haven’t seen from earlier Claude models.

    Claude Opus 4.7 is very strong and outperforms Opus 4.6 with a 10% to 15% lift in task success for Factory Droids, with fewer tool errors and more reliable follow-through on validation steps. It carries work all the way through instead of stopping halfway, which is exactly what enterprise engineering teams need.

    Claude Opus 4.7 autonomously built a complete Rust text-to-speech engine from scratch—neural model, SIMD kernels, browser demo—then fed its own output through a speech recognizer to verify it matched the Python reference. Months of senior engineering, delivered autonomously. The step up from Opus 4.6 is clear, and the codebase is public.

    Claude Opus 4.7 passed three TBench tasks that prior Claude models couldn’t, and it’s landing fixes our previous best model missed, including a race condition. It demonstrates strong precision in identifying real issues, and surfaces important findings that other models either gave up on or didn’t resolve. In Qodo’s real-world code review benchmark, we observed top-tier precision.

    On Databricks’ OfficeQA Pro, Claude Opus 4.7 shows meaningfully stronger document reasoning, with 21% fewer errors than Opus 4.6 when working with source information. Across our agentic reasoning over data benchmarks, it is the best-performing Claude model for enterprise document analysis.

    For Ramp, Claude Opus 4.7 stands out in agent-team workflows. We’re seeing stronger role fidelity, instruction-following, coordination, and complex reasoning, especially on engineering tasks that span tools, codebases, and debugging context. Compared with Opus 4.6, it needs much less step-by-step guidance, helping us scale the internal agent workflows our engineering teams run.

    Claude Opus 4.7 is measurably better than Opus 4.6 for Bolt’s longer-running app-building work, up to 10% better in the best cases, without the regressions we’ve come to expect from very agentic models. It pushes the ceiling on what our users can ship in a single session.

    Below are some highlights and notes from our early testing of Opus 4.7:

    Instruction following.

    Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.

    Improved multimodal support.

    Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times the pixel count of prior Claude models. This opens up a wealth of multimodal uses that depend on fine visual detail: computer-use agents reading dense screenshots, data extraction from complex diagrams, and work that needs pixel-perfect references.

    Real-world work.

    As well as its state-of-the-art score on the Finance Agent evaluation (see table above), our internal testing showed Opus 4.7 to be a more effective finance analyst than Opus 4.6, producing rigorous analyses and models, more professional presentations, and tighter integration across tasks. Opus 4.7 is also state-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work across finance, legal, and other domains.

    Memory.

    Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context.

    Overall, Opus 4.7 shows a similar safety profile to Opus 4.6: our evaluations show low rates of concerning behavior such as deception, sycophancy, and cooperation with misuse. On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; in others (such as its tendency to give overly detailed harm-reduction advice on controlled substances), Opus 4.7 is modestly weaker. Our alignment assessment concluded that the model is “largely well-aligned and trustworthy, though not fully ideal in its behavior”. Note that Mythos Preview remains the best-aligned model we’ve trained according to our evaluations. Our safety evaluations are discussed in full in the Claude Opus 4.7 System Card.

    In addition to Claude Opus 4.7 itself, we’re launching the following updates:

    More effort control:

    Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we’ve raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.

    On the Claude Platform (API):

    As well as support for higher-resolution images, we’re also launching task budgets in public beta, giving developers a way to guide Claude’s token spend so it can prioritize work across longer runs.

    In Claude Code:

    The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out. In addition, we’ve extended auto mode to Max users. Auto mode is a new permissions option where Claude makes decisions on your behalf, meaning that you can run longer tasks with fewer interruptions—and with less risk than if you had chosen to skip all permissions.

    Opus 4.7 is a direct upgrade to Opus 4.6, but two changes are worth planning for because they affect token usage. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens.

    Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise. In our own testing, the net effect is favorable—token usage across all effort levels is improved on an internal coding evaluation, as shown below—but we recommend measuring the difference on real traffic. We’ve written a migration guide that provides further advice on upgrading from Opus 4.6 to Opus 4.7.
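    As a back-of-the-envelope illustration of the tokenizer change using the pricing above (1.35x is the top of the quoted range; real ratios depend on content type):

      old_input_tokens = 1_000_000
      new_input_tokens = old_input_tokens * 1.35   # top of the quoted 1.0-1.35x range
      price_per_mtok = 5.00                        # USD per million input tokens (from above)
      print(f"${new_input_tokens / 1e6 * price_per_mtok:.2f}")  # $6.75, vs $5.00 before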
