Cursor Release Notes

Last updated: Jan 17, 2026

  • Jan 16, 2026
    • Parsed from source:
      Jan 16, 2026
    • Detected by Releasebot:
      Jan 17, 2026

    Cursor

    CLI

    Cursor CLI gains editor vibes with Plan mode to design before code and Ask mode to explore without changes. Handoff to Cloud Agents and word-level inline diffs boost collaboration, plus MCP authentication and a refreshed interactive menu.

    Plan mode in CLI

    Use Plan mode to design your approach before coding. Cursor will ask clarifying questions to refine your plan. Get started with /plan or --mode=plan.

    Ask mode in CLI

    Use Ask mode to explore code without making changes, just like in the editor. Start asking questions with /ask or --mode=ask.

    Handoff to Cloud Agents

    Push your local conversation to a Cloud Agent and let it keep running while you're away. Prepend & to any message to send it to the cloud, then pick it back up on web or mobile at cursor.com/agents.

    Word-level inline diffs

    Show exactly what changed with precise word-level highlighting in the CLI.
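    The idea behind word-level diffing can be sketched with Python's difflib; this is an illustrative approximation, not Cursor's actual CLI implementation, and the `[-…-]`/`{+…+}` markers stand in for the CLI's highlighting:

    ```python
    import difflib

    def word_diff(old: str, new: str) -> str:
        """Mark word-level changes inline: deletions as [-w-], insertions as {+w+}."""
        a, b = old.split(), new.split()
        out = []
        for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
            if op == "equal":
                out.extend(a[i1:i2])
            else:
                if i2 > i1:
                    out.append("[-" + " ".join(a[i1:i2]) + "-]")
                if j2 > j1:
                    out.append("{+" + " ".join(b[j1:j2]) + "+}")
        return " ".join(out)

    print(word_diff("return total count", "return item count"))
    # return [-total-] {+item+} count
    ```

    Aligning on words rather than whole lines is what lets a one-word edit show up as a one-word highlight.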

    One-click MCP authentication

    Connect Cursor to external tools and data sources with a new login flow supporting automatic callback handling. The agent gets access to authenticated MCPs immediately.
    Use /mcp list for an updated interactive MCP menu to browse, enable, and configure MCP servers at a glance.

  • Jan 15, 2026
    • Parsed from source:
      Jan 15, 2026
    • Detected by Releasebot:
      Jan 16, 2026

    Cursor

    Building a better Bugbot

    Bugbot is a code review assistant that analyzes PRs for logic bugs, performance issues, and security vulnerabilities, now with an Autofix Beta. It has driven the resolution rate up from 52% to over 70% while reducing false positives across teams.

    As coding agents became more capable, we found ourselves spending more time on review. To solve this, we built Bugbot, a code review agent that analyzes pull requests for logic bugs, performance issues, and security vulnerabilities before they reach production. By last summer, it was working so well that we decided to release it to users.

    The process of building Bugbot began with qualitative assessments and gradually evolved into a more systematic approach, using a custom AI-driven metric to hill-climb on quality.

    Since launch, we have run 40 major experiments that have increased Bugbot's resolution rate from 52% to over 70%, while lifting the average number of bugs flagged per run from 0.4 to 0.7. This means that the number of resolved bugs per PR has more than doubled, from roughly 0.2 to about 0.5.
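    The arithmetic behind those figures: resolved bugs per PR is the product of bugs flagged per run and the resolution rate.

    ```python
    # Resolved bugs per PR = (bugs flagged per run) x (resolution rate)
    at_launch = 0.4 * 0.52  # ~0.21 resolved bugs per PR
    today = 0.7 * 0.70      # ~0.49 resolved bugs per PR
    print(round(at_launch, 2), round(today, 2))  # 0.21 0.49
    ```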

    We released Version 1 in July 2025 and Version 11 in January 2026. Newer versions caught more bugs without a comparable rise in false positives.

    Humble beginnings

    When we first tried to build a code review agent, the models weren't capable enough for the reviews to be helpful. But as the baseline models improved, we realized we had a number of ways to increase the quality of bug reporting.

    We experimented with different configurations of models, pipelines, filters, and clever context management strategies, polling engineers internally along the way. If it seemed one configuration had fewer false positives, we adopted it.

    One of the most effective quality improvements we found early on was running multiple bug-finding passes in parallel and combining their results with majority voting. Each pass received a different ordering of the diff, which nudged the model toward different lines of reasoning. When several passes independently flagged the same issue, we treated it as a stronger signal that the bug was real.

    After weeks of internal qualitative iterations, we landed on a version of Bugbot that outperformed other code review tools on the market and gave us confidence to launch. It used this flow:

    • Run eight parallel passes with randomized diff order
    • Combine similar bugs into one bucket
    • Majority voting to filter out bugs found during only one pass
    • Merge each bucket into a single clear description
    • Filter out unwanted categories (like compiler warnings or documentation errors)
    • Run results through a validator model to catch false positives
    • Dedupe against bugs posted from previous runs
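    The bucketing and majority-voting steps above can be sketched as follows. This is a minimal illustration: bucketing by exact (file, line) location stands in for the real similarity-based merge, and the vote threshold is an assumed value, not the shipped one.

    ```python
    from collections import Counter

    def majority_vote(passes, min_votes=2):
        """passes: one list of bugs per parallel run; each bug is a
        (file, line, description) tuple. Bugs are bucketed by location
        and kept only if at least `min_votes` passes flagged them."""
        votes = Counter()
        descriptions = {}
        for found in passes:
            seen = set()  # one vote per bucket per pass
            for file, line, desc in found:
                key = (file, line)
                if key in seen:
                    continue
                seen.add(key)
                votes[key] += 1
                descriptions.setdefault(key, desc)  # keep the first description
        return [(f, ln, descriptions[(f, ln)])
                for (f, ln), n in votes.items() if n >= min_votes]
    ```

    A bug flagged independently by several passes survives the vote; a bug seen in only one pass is filtered out as a likely false positive.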

    From prototype to production

    To make Bugbot usable in practice, we had to invest in a set of foundational systems alongside the core review logic. That included making repository access fast and reliable by rebuilding our Git integration in Rust and minimizing how much data we fetched, as well as adding rate-limit monitoring, request batching, and proxy-based infrastructure to operate within GitHub's constraints.

    As adoption grew, teams also needed a way to encode codebase-specific invariants like unsafe migrations or incorrect use of internal APIs. In response, we added Bugbot rules to support those checks without hardcoding them into the system.

    Together, these pieces made Bugbot practical to run and adaptable to real codebases. But they didn't tell us whether quality was actually improving. Without a metric to measure progress, we couldn't quantitatively assess Bugbot's performance in the wild, and that put a ceiling on how far we could push it.

    Measuring what matters

    To solve this problem, we devised a metric called the resolution rate. It uses AI to determine, at PR merge time, which bugs were actually resolved by the author in the final code. When developing this metric, we spot-checked every example internally with the PR author and found that the LLM correctly classified nearly all of them as resolved or not.

    Teams often ask us how to assess the impact Bugbot is having for them, so we surface this metric prominently in the dashboard. For teams evaluating effectiveness, it's a much clearer signal than anecdotal feedback or reactions on comments. Resolution rate directly answers whether Bugbot is finding real issues that engineers fix.

    The Bugbot dashboard showing a team's resolution rate over time and other key metrics.

    Hill-climbing

    Defining resolution rate changed how we built Bugbot. For the first time, we could hill-climb on the basis of real signal, rather than just feel. We began evaluating changes online using actual resolution rates and offline using BugBench, a curated benchmark of real code diffs with human-annotated bugs.

    We ran dozens of experiments across models, prompts, iteration counts, validators, context management, category filtering, and agentic designs. Many changes, surprisingly, regressed our metrics. It turned out that a lot of our initial judgments from the early qualitative analyses were correct.

    Agentic architecture

    We saw the largest gains when, this fall, we switched Bugbot to a fully agentic design. The agent could reason over the diff, call tools, and decide where to dig deeper instead of following a fixed sequence of passes.

    The agentic loop forced us to rethink prompting. With earlier versions of Bugbot we needed to restrain the models to minimize false positives. But with the agentic approach we encountered the opposite problem: it was too cautious. We shifted to aggressive prompts that encouraged the agent to investigate every suspicious pattern and err on the side of flagging potential issues.

    In addition, the agentic architecture opened up a richer surface for experimentation. We were able to shift more information out of static context and into dynamic context, varying how much upfront context the model received and observing how it adapted. The model consistently pulled in the additional context it needed at runtime, without requiring everything to be provided ahead of time.

    The same setup lets us iterate directly on the toolset itself. Because the model's behavior is shaped by the tools it can call, even small changes in tool design or availability had an outsized impact on outcomes. Through multiple rounds of iteration, we adjusted and refined that interface until the model's behavior consistently aligned with our expectations.

    What's next

    Today, Bugbot reviews more than two million PRs per month for customers like Rippling, Discord, Samsara, Airtable, and Sierra AI. We also run Bugbot on all internal code at Cursor.

    Looking forward, we expect new models to arrive on a regular basis with different strengths and weaknesses, both from other providers and from our own training efforts. Continued progress requires finding the right combination of models, harness design, and review structure. Bugbot today is multiples better than Bugbot at launch. In a few months we expect it will be significantly better again.

    We're already building toward that future. We just launched Bugbot Autofix in Beta, which automatically spawns a Cloud Agent to fix bugs found during PR reviews. The next major capabilities include letting Bugbot run code to verify its own bug reports and enabling deep research when it encounters complex issues. We're also experimenting with an always-on version that continuously scans your codebase rather than waiting for pull requests.

    We've made great strides so far that would not have been possible without the contributions of some key teammates, including Lee Danilek, Vincent Marti, Rohan Varma, Yuri Volkov, Jack Pertschuk, Michiel De Jong, Federico Cassano, Ravi Rahman, and Josh Ma. Together, our goal continues to be to help your teams maintain code quality as your AI development workflows scale.

    Read the docs or try Bugbot today.

  • Jan 8, 2026
    • Parsed from source:
      Jan 8, 2026
    • Detected by Releasebot:
      Jan 9, 2026
    • Modified by Releasebot:
      Jan 17, 2026

    Cursor

    CLI

    Release notes

    This release introduces new CLI controls for models, MCP management, rules and commands, alongside major hooks performance improvements and bug fixes.

    Model list and selection

    Use the new agent models command, --list-models flag, or /models slash command to list all available models and quickly switch between them.

    Rules generation and management

    Create new rules and edit existing ones directly from the CLI with the /rules command.

    Enabling MCP servers

    Enable and disable MCP servers on the fly with /mcp enable and /mcp disable commands.

  • Dec 22, 2025
    • Parsed from source:
      Dec 22, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Cursor

    Hooks for security and platform teams

    Cursor unveils hooks to observe, block, and extend the agent loop, enabling enterprise control and security workflows. A partner ecosystem adds MCP governance, code security, dependency checks, agent safety, and secrets management. Enterprise onboarding available.

    Earlier this year, we released hooks for organizations to observe, control, and extend Cursor's agent loop using custom scripts. Hooks run before or after defined stages of the agent loop and can observe, block, or modify behavior.
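    For illustration, a hook that gates shell commands might look like the sketch below. The event and verdict schema here is an assumption for illustration (the hooks documentation defines the real contract); the general shape is that a hook script receives a JSON event on stdin and prints a JSON verdict.

    ```python
    # Hedged sketch of a shell-command hook. The "command" field and the
    # {"permission": ...} verdict are assumed shapes, not the documented schema.
    import json
    import sys

    BLOCKED_FRAGMENTS = ("curl ", "wget ", "rm -rf")

    def decide(event: dict) -> dict:
        """Deny shell commands matching a simple blocklist; allow the rest."""
        command = event.get("command", "")
        if any(frag in command for frag in BLOCKED_FRAGMENTS):
            return {"permission": "deny", "userMessage": "Blocked by security hook"}
        return {"permission": "allow"}

    # When invoked as a hook, the wiring would be roughly:
    #   print(json.dumps(decide(json.load(sys.stdin))))
    ```

    Because the verdict can allow, deny, or modify, the same mechanism supports the observation, blocking, and audit-trail use cases the partners below build on.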

    We've seen many of our customers use hooks to connect Cursor to their security tooling, observability platforms, secrets managers, and internal compliance systems.

    To make it easier to get started, we're partnering with ecosystem vendors who have built hooks support with Cursor.

    Hooks partners

    Our partners cover MCP governance, code security, dependency scanning, agent safety, and secrets management.

    MCP governance and visibility

    MintMCP uses beforeMCPExecution and afterMCPExecution hooks to build a complete inventory of MCP servers, monitor tool usage patterns, and scan responses for sensitive data before it reaches the AI model.

    Oasis Security extends their Agentic Access Management platform to Cursor, using hooks to enforce least-privilege policies on AI agent actions and maintain full audit trails across enterprise systems.

    Runlayer uses hooks to wrap MCP tools and integrate with their MCP broker, giving organizations centralized control and visibility over all agent-to-tool interactions.

    Code security and best practices

    Corridor provides real-time feedback to the agent on code implementation and security design decisions as code is being written.

    Semgrep automatically scans AI-generated code for vulnerabilities using hooks, giving the agent real-time feedback to regenerate code until security issues are resolved.

    Dependency security

    Endor Labs uses hooks to intercept package installations and scan for malicious dependencies, preventing supply chain attacks like typosquatting and dependency confusion before they enter your codebase.

    Agent security and safety

    Snyk integrates Evo Agent Guard with hooks to review agent actions in real-time, detecting and preventing issues like prompt injection and dangerous tool calls.

    Secrets management

    1Password uses hooks to validate that all required environment files from 1Password Environments are properly mounted before shell commands execute, enabling just-in-time secrets access without writing credentials to disk.

    To deploy Cursor with enterprise features and priority support, talk to our team.

  • Dec 22, 2025
    • Parsed from source:
      Dec 22, 2025
    • Detected by Releasebot:
      Dec 23, 2025

    Cursor

    2.3

    Holiday release brings stability and bug fixes across the core agent and layout controls. Users get four default layouts—agent, editor, zen, browser—with a quick switch shortcut and macOS style navigation for faster workspace changes.

    Stability improvements

    For this holiday release, we've focused entirely on fixing bugs and improving stability.
    This includes the core agent, layout controls, viewing code diffs, and more. We will be slowly rolling these updates out over the week, ensuring there are no regressions during your holiday coding.

    Layout customization

    It's now easier to customize your default layout across workspaces.
    We've included four default layouts: agent, editor, zen, and browser. You can use Command (⌘) + Option (⌥) + Tab (⇥) to switch between layouts, or easily jump between different workspaces. Additionally, you can move backwards in this list by including Shift (⇧), similar to macOS.

  • Dec 18, 2025
    • Parsed from source:
      Dec 18, 2025
    • Detected by Releasebot:
      Dec 23, 2025
    • Modified by Releasebot:
      Jan 9, 2026

    Cursor

    Dec 18, 2025

    Cursor unveils Enterprise features like conversation insights with work categorization and shareable transcripts for PRs. Billing groups, Linux sandboxing, and service accounts boost security, governance, and automated workflows.

    Enterprise Insights, Billing Groups, Service Accounts, and Improved Security Controls

    Many of the largest software companies in the world have adopted Cursor for Enterprise. Here are some of the new features we're releasing today:

    Conversation insights

    Cursor can now analyze the code and context in each agent session to understand the type of work that is being done, including:

    • Category: Bug fixes, refactoring, explanation
    • Work Type: Maintenance, bug fixing, new features
    • Complexity: Difficulty and specificity of prompts

    Enterprise customers can also extend these categories across their organization and teams. We protect your privacy by ensuring no PII or sensitive data is collected as part of these insights.

    Shared agent transcripts

    You can now share agent conversations with your team.
    Generate a read-only transcript of any agent conversation to include in your PRs or internal documentation. Transcripts can be forked so others can start new agent conversations from the same context.

    Billing groups

    Cursor now supports billing groups for fine-grained visibility into where usage occurs.
    Map usage and spend to the structure of your organization. Track spend by group, set budget alerts, and keep an eye on outliers. Understand which teams have the highest adoption of Cursor.

    Linux sandboxing for agents

    Sandboxing for agents supports Linux in addition to macOS.
    This allows agents to work effectively within appropriate boundaries. Access is scoped to your workspace and can be configured to block unauthorized network and filesystem access.
    Learn more about LLM safety and controls.

    Service accounts

    Service accounts are non-human accounts (and their API keys) that can configure Cursor, call APIs, and invoke cloud agents.
    With service accounts, teams can securely automate Cursor-powered workflows without tying integrations to individual developers' accounts. This makes it easier to manage access, rotate credentials, and keep automations running even as people and roles change.
    Service accounts will roll out to Enterprise accounts starting the week of 12/22.
    Learn more about Cursor for Enterprise and talk to our team to learn more.

  • Dec 11, 2025
    • Parsed from source:
      Dec 11, 2025
    • Detected by Releasebot:
      Dec 12, 2025

    Cursor

    A visual editor for the Cursor Browser

    Cursor Browser introduces a visual editor that blends design and code in one window. Drag and drop layout, live prop editing, and describe-and-prompt workflows let you adjust UI and apply changes instantly.

    We're excited to release a visual editor for the Cursor Browser. It brings together your web app, codebase, and powerful visual editing tools, all in the same window. You can drag elements around, inspect components and props directly, and describe changes while pointing and clicking. Now, interfaces in your product are more immediate and intuitive, closing the gap between design and code.

    Rearrange with drag-and-drop

    The visual editor lets you manipulate a site's layout and structure directly by dragging and dropping rendered elements across the DOM tree.

    This unifies visual design with coding. You can swap the order of buttons, rotate sections, and test different grid configurations without ever context-switching. Once the visual design matches what you had in mind, tell the agent to apply it. The agent will locate the relevant components and update the underlying code for you.

    Test component states directly

    Many modern apps are built in React, where components have properties to control different component states. The visual editor makes it easy to surface these props in the sidebar so you can make changes across different variants of a component.

    Adjust properties with visual controls

    The visual editor sidebar lets you fine-tune styles with sliders, palettes, and your own color tokens and design system. Every tweak is fully interactive: live color pickers that preview your choices, as well as controls to rearrange grids, flexbox layouts, and typography with precision.

    Point and prompt

    The visual editor also lets you click on anything in your interface and describe what you want to change. You could click on one element and say, "make this bigger," while on another you prompt, "turn this red," and on a third you type, "swap their order." The agents run in parallel, and within seconds your changes are live.

    Up the abstraction hierarchy

    Cursor's new visual editor unifies your work across design and code, helping you better articulate what you want so that execution isn't limited by mechanics.

    We see a future where agents are even more deeply connected to building apps on the web, and humans express their ideas through interfaces that connect thought to code more directly. These features are a step in that direction.

    Read the Browser docs. Learn about all the new features in Cursor 2.2.

  • Dec 10, 2025
    • Parsed from source:
      Dec 10, 2025
    • Detected by Releasebot:
      Dec 11, 2025
    • Modified by Releasebot:
      Dec 12, 2025

    Cursor

    2.2

    New debugging and design features roll out: Debug Mode adds runtime logs across stacks for root-cause analysis. Browser layout and style editor lets you design in real time with agent-driven updates. Plan Mode adds Mermaid diagrams, parallel agent reviews, and pinned chats for quick reference.

    Debug Mode

    Debug Mode helps you reproduce and fix the trickiest bugs.
    Cursor instruments your app with runtime logs to find the root cause. It works across stacks, languages, and models.
    Read more in our announcement.

    Browser layout and style editor

    Design and code simultaneously with a brand new browser sidebar and component tree.
    Move elements, update colors, test layouts, and experiment with CSS in real time, then instantly apply changes to your codebase using the agent. You can also click on multiple elements and describe changes in text to kick off an agent that makes the visual changes.
    Read more in our announcement.

    Plan Mode and Parallel Agents

    Plan Mode now supports inline Mermaid diagrams, allowing the agent to automatically generate and stream visuals into your plans. You also have more control over how you build them, with the option to send selected to-dos to new agents.

    When running multiple agents in parallel, Cursor will now automatically evaluate all runs and give a recommendation for the best solution.
    The selected agent will have a comment explaining why it was picked; judging happens only after all parallel agents have finished.

    In the agent sidebar

    In the agent sidebar, pin chats at the top for future reference.

  • Dec 10, 2025
    • Parsed from source:
      Dec 10, 2025
    • Detected by Releasebot:
      Dec 11, 2025

    Cursor

    Introducing Debug Mode: Agents with runtime logs

    Cursor 2.2 adds Debug Mode, a human-in-the-loop debugging workflow that builds hypotheses, instruments runtime logs, reproduces bugs, and verifies fixes with you. This turns tough issues into targeted changes and clean, ship-ready code.


    Coding agents are great at lots of things, but some bugs consistently stump them. That's why we're introducing Debug Mode, an entirely new agent loop built around runtime information and human verification.

    To build it, we examined the practices of the best debuggers on our team. We rolled their workflows into an agent mode, equipping it with tools to instrument code with runtime logs, prompts that generate multiple hypotheses about what's going wrong, and the ability to call back to you to reproduce the issue and verify fixes.

    The result is an interactive process that reliably fixes bugs that were previously beyond the reach of even the smartest models working alone, or that would otherwise take significant developer time to address.


    Describe the bug

    To get started, select Debug Mode from the dropdown menu and describe the bug in as much detail as you can.

    Instead of immediately trying to generate a fix, the agent reads through your codebase and generates multiple hypotheses about what could be wrong. Some will be ideas you would have thought of on your own, but others will likely be approaches you wouldn't have considered.

    The agent then instruments your code with logging statements designed to test these hypotheses. This prepares the agent to receive concrete data about what's actually happening when the bug occurs.
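    As an illustration (not Cursor's actual output, and with a hypothetical function), hypothesis-driven instrumentation ties each log line to a specific hypothesis so the runtime trace can confirm or eliminate it:

    ```python
    import logging

    logging.basicConfig(level=logging.DEBUG, format="%(name)s %(message)s")
    log = logging.getLogger("debug-mode")

    def apply_discount(price: float, discount: float) -> float:
        # H1: discount arrives as a percentage (e.g. 10) instead of a fraction (0.10)
        log.debug("H1 discount=%r (fraction expected)", discount)
        # H2: price was already discounted upstream
        log.debug("H2 price=%r", price)
        return price * (1 - discount)

    apply_discount(100.0, 0.1)
    ```

    When you reproduce the bug, the logged values either match a hypothesis or rule it out, which is the concrete data the agent uses in the next step.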


    Reproduce the bug

    Next, go to your application and reproduce the bug while the agent collects the runtime logs.

    The agent can see exactly what's happening in your code when the bug occurs: variable states, execution paths, timing information. With this data, it can pinpoint the root cause and generate a targeted fix. Often that's a precise two- or three-line modification instead of the hundreds of lines of speculative code you'd have received from a standard agent interaction.

    Verify the fix

    At this point, Debug Mode asks you to reproduce the bug one more time with the proposed fix in place. If the bug is gone, you mark it as fixed and the agent removes all the instrumentation, leaving you with a clean, minimal change you can ship.

    This human-in-the-loop verification is critical. Sometimes bugs are obvious, but other times they fall into a gray area where the fix might work technically but not feel right. The agent can't make that call on its own. If you don't think the bug is fixed, the agent adds more logging, you reproduce again, and it refines its approach until the problem is actually solved.

    This kind of tight back-and-forth is one way we think AI coding works best. The agent handles the tedious work while you make the quick decisions that need human judgment. The result with Debug Mode is that tricky bugs that used to be out of reach are now reliably fixed.

    Read the Debug Mode docs. Learn about all the new features in Cursor 2.2.

  • Dec 4, 2025
    • Parsed from source:
      Dec 4, 2025
    • Detected by Releasebot:
      Dec 5, 2025

    Cursor

    Improving Cursor’s agent for OpenAI Codex models

    Cursor updates its agent harness to support OpenAI’s latest frontier Codex model, GPT-5.1-Codex-Max, with improved tool calling and shell-oriented workflows. The update preserves reasoning traces, enhances lint checks, and tightens safety so code edits are faster and more reliable.

    Building a robust agent harness

    Cursor integrates with all frontier AI models for coding.
    Each model requires specific instructions and tweaks to our agent harness to improve output quality, prevent laziness, efficiently call tools, and more.
    We’ve been partnering with OpenAI to make their models available to developers with Cursor’s agent. This post will cover how we’ve updated our agent harness to support their latest frontier coding model GPT-5.1-Codex-Max.

    Every model in Cursor’s agent harness has specific instructions and tools made available to optimize that model inside the Cursor environment.
    AI labs train new models on a variety of different instructions and tools. In specific domains like coding, models may favor patterns that are more similar to what they’ve already seen in training. When adding new models into Cursor, our job is to integrate familiar instructions and tools alongside Cursor-specific ones, and then tune them based on Cursor Bench, our internal suite of evals.
    We measure the quality and robustness of models based on their success rate, ability to call tools, and overall adoption across users. Here are some of the updates we made to our agent harness for Codex.

    Updating for the latest Codex model

    OpenAI’s Codex models are versions of their latest frontier model, trained specifically for agentic coding.
    The OpenAI team collaborated closely with us to align the tools and prompts with the Codex CLI harness. Here are some of the changes we’ve made:

    A more shell-forward approach

    OpenAI’s Codex CLI is focused on shell-oriented workflows. As a result, the Codex model receives a limited set of tools during training and learns instead to use the shell to search, read files, and make edits.
    If the model is struggling with a difficult edit, it sometimes falls back to writing files using an inline Python script. These scripts are powerful, but tool calling is both safer and a better user experience for edits in Cursor.
    To encourage tool calling, we made the names and definitions of tools in Cursor closer to their shell equivalents like rg (ripgrep). We made this change for all models in our harness. We also added instructions like:
    If a tool exists for an action, prefer to use the tool
    instead of shell commands (e.g. read_file over cat).
    Sandboxing in Cursor, which prevents unauthorized file access and network activity without requiring users to manually approve every command, also helps improve security here if the model does still choose to run a shell command.

    Preambles

    Unlike the mainline GPT-5 series of models, the Codex model family currently uses reasoning summaries to communicate user updates as it’s working. These can be in the form of one-line headings or a full message.
    For these reasoning summaries, we wanted to strike a balance that would let users follow along with the agent’s progress and identify bad trajectories early, without spamming them to the point that they tune out. We gave the model guidelines to limit reasoning summaries to 1 or 2 sentences, note when discovering new information or initiating a new tactic, and to avoid commenting on its own communication (“I’m explaining to the user…”).
    Since Codex models cannot “talk” normally until the end of an agent turn, we removed all language in the prompt related to communicating with the user mid-turn. We found that this improved the performance of the model’s final code output.

    Reading lints

    Cursor makes tools available to all models in our harness for reading linter errors (e.g. ESLint, Biome) and allowing the agent to automatically fix them.
    We found that providing Codex with the tool definition alone is not enough to make it inclined to call our read_lints tool. Instead, Codex benefits significantly from clear and literal instructions for when to call it:
    After substantive edits, use the read_lints tool to check
    recently edited files for linter errors. If you've introduced
    any, fix them if you can easily figure out how.

    Preserving reasoning traces

    OpenAI’s reasoning models emit internal reasoning traces between tool calls, which are effectively a “chain of thought” explaining why the model chooses each action. The Responses API is designed to capture and pass along these reasoning items (or encrypted reasoning items in sensitive contexts) so the model can maintain continuity across turns rather than having to reconstruct its plan from scratch.
    Codex is especially dependent on this continuity. When reasoning traces are dropped, the model has to infer its previous thought process, which often leads to lost subgoals, degraded planning, misordered tool calls, or repeatedly re-deriving earlier steps. In our Cursor Bench experiments, removing reasoning traces from GPT-5-Codex caused a 30% performance drop. In comparison, OpenAI observed a smaller 3% degradation for GPT-5 on SWE-bench when reasoning traces were omitted.
    Given the scale of that impact, we added alerting to ensure that reasoning traces are always preserved and forwarded correctly. This keeps the agent’s internal plan intact and prevents the performance regressions that occur when models are forced to “fill in the blanks” between tool calls.
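    A minimal sketch of the idea, with item shapes loosely modeled on the Responses API (`reasoning`, `function_call`, `function_call_output` types) and hypothetical helper names: every item from the previous turn, reasoning items included, is carried forward before appending tool results.

    ```python
    def build_next_input(prior_output, tool_results):
        """Carry every item from the previous turn forward in order,
        including reasoning items, then append the tool results."""
        next_input = list(prior_output)
        for call_id, result in tool_results.items():
            next_input.append({
                "type": "function_call_output",
                "call_id": call_id,
                "output": result,
            })
        return next_input

    def drop_reasoning(items):
        """The failure mode described above: stripping reasoning items
        forces the model to reconstruct its plan between tool calls."""
        return [i for i in items if i.get("type") != "reasoning"]
    ```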

    Biasing the model to take action

    In Cursor's default agent mode, you want the agent to autonomously read and edit files based on the user request. It can be frustrating when you tab away only to find that the agent was waiting to ask for your permission to proceed.
    We’ve been experimenting with more specific instructions to help guide Codex:
    Unless the user explicitly asks for a plan or some other intent that
    makes it clear that code should not be written, assume the user wants
    you to make code changes or run tools to solve the user's problem. In
    these cases, it's bad to output your proposed solution in a message, you
    should go ahead and actually implement the change. If you encounter
    challenges or blockers, you should attempt to resolve them yourself.
    In Cloud Agents, our async remote workflow, we make this language even stronger.

    Message ordering

    OpenAI models are trained to respect and prioritize message ordering. For example, the system prompt always takes precedence over user messages and tool results.
    While this is helpful, it means we need to tune our harnesses to ensure the Cursor-provided prompt does not include instructions which could accidentally contradict user messages. Otherwise, Codex could get into a state where it does not want to comply with the user request.
    For example, at one point we told Codex that it should take care to preserve tokens and not be wasteful. But we noticed that this message was impacting the model’s willingness to perform more ambitious tasks or large-scale explorations. Sometimes it would stop and stubbornly say, “I’m not supposed to waste tokens, and I don’t think it’s worth continuing with this task!”

    Looking forward

    The pace of model releases is increasing. Our goal is to get the most out of every frontier model inside the Cursor agent harness. There’s more work to be done, and we’ll continue to share improvements we’re making to Cursor.

