Cursor Release Notes
Last updated: Apr 3, 2026
- Apr 2, 2026
- Date parsed from source: Apr 2, 2026
- First seen by Releasebot: Apr 3, 2026
3.0
Cursor 3 now brings the new Agents Window, letting users run many agents in parallel across local, cloud, worktree, and remote SSH environments. It also adds Design Mode for more precise browser feedback, plus Agent Tabs in the editor and other improvements and bug fixes.
Cursor 3 is now available.
Agents Window
The new Cursor interface allows you to run many agents in parallel across repos and environments: locally, in worktrees, in the cloud, and on remote SSH.
It's simpler, more powerful, and centered around agents, while keeping the depth of a development environment.
To try the Agents Window, upgrade Cursor and type Cmd+Shift+P -> Agents Window.
You can switch back to the IDE anytime, or have both open simultaneously.
Read more in our announcement.
Design Mode
In the Agents Window, you can use Design Mode to annotate and target UI elements directly in the browser.
This allows you to give more precise feedback and iterate faster by pointing the agent to exactly the part of the interface you're referring to.
Keyboard shortcuts include:
- ⌘ + Shift + D to toggle to Design Mode
- Shift + drag to select an area
- ⌘ + L to add element to chat
- ⌥ + click to add element to input
Agent Tabs in the Editor
Agent Tabs allow you to view multiple chats at once, side-by-side or in a grid.
Editor (4)
Plugins & MCP (2)
Enterprise & Teams (3)
Other Improvements (8)
Bug Fixes (8)
- Apr 2, 2026
- Date parsed from source: Apr 2, 2026
- First seen by Releasebot: Apr 3, 2026
Meet the new Cursor
Cursor releases Cursor 3, a unified agent-first workspace for building software. It brings a cleaner, faster interface with multi-repo support, one place for local and cloud agents, seamless handoff between environments, improved diffs, and integrated tools for browsing, plugins, and PR workflow.
Software development is changing, and so is Cursor.
In the last year, we moved from manually editing files to working with agents that write most of our code. How we create software will continue to evolve as we enter the third era of software development, where fleets of agents work autonomously to ship improvements.
We're building toward this future, but there is a lot of work left to make it happen. Engineers are still micromanaging individual agents, trying to keep track of different conversations, and jumping between multiple terminals, tools, and windows.
We're introducing Cursor 3, a unified workspace for building software with agents. The new Cursor interface brings clarity to the work agents produce, pulling you up to a higher level of abstraction, with the ability to dig deeper when you want. It's faster, cleaner, and more powerful, with a multi-repo layout, seamless handoff between local and cloud agents, and the option to switch back to the Cursor IDE at any time.
What's new in Cursor 3
When we started building Cursor, we forked VS Code instead of building an extension so we could shape our own surface. With Cursor 3, we took that a step further by building this new interface from scratch, centered around agents.
All your agents in one place
The new interface is inherently multi-workspace, allowing humans and agents to work across different repos.
Run many agents in parallel
Working with agents is now much easier. All local and cloud agents appear in the sidebar, including the ones you kick off from mobile, web, desktop, Slack, GitHub, and Linear.
Cloud agents produce demos and screenshots of their work for you to verify. This is the same experience you get at cursor.com/agents, now integrated into the desktop app.
New UX for handoff between local and cloud
We made moving agents between environments really fast.
Move an agent session from cloud to local when you want to make edits and test it on your own desktop.
Composer 2, our own frontier coding model with high usage limits, is great for iterating quickly.
In the reverse direction, you can move an agent session from local to cloud to keep it running while you're offline, or so that you can move on to the next task. This is especially useful for longer-running tasks that would otherwise get interrupted when you close your laptop.
Go from commit to merged PR
The new diffs view allows you to edit and review changes faster with a simpler UI. When you're ready, you can stage, commit, and manage PRs.
Building on the best features of Cursor
Alpha users told us that a lot of what they like about Cursor 3 is the way it combines the best parts of the IDE with more recent capabilities we've shipped in an agent-first interface.
Files for understanding code
Dive deeper anytime by viewing files, and go to definition in the editor with full LSP support.
Integrated browser
Cursor can use the built-in browser to open, navigate, and prompt against local websites.
Plugins on the Cursor Marketplace
Browse hundreds of plugins that extend agents with MCPs, skills, subagents, and more. Install with one click, or set up your own team marketplace of private plugins.
The best way to code with AI
With Cursor 3, we have the foundational pieces in place—model, product, and runtime—to build more autonomous agents and better collaboration across teams. We will also continue to invest in the IDE until codebases are self-driving.
This won't be the last time the interface for building software changes. More powerful coding models will unlock new interaction patterns. We are excited to continue to build, simplify, and transform Cursor to be the best way to code with AI.
Upgrade Cursor, and type Cmd+Shift+P -> Agents Window to try the new interface. Or learn more in our docs.
- Mar 26, 2026
- Date parsed from source: Mar 26, 2026
- First seen by Releasebot: Mar 27, 2026
Improving Composer through real-time RL
Cursor introduces real-time RL for Composer, using real user interactions to train and deploy improved checkpoints as often as every five hours. It highlights better performance behind Auto, lower latency, and ongoing work to reduce reward hacking and adapt to longer, more specialized tasks.
We are observing unprecedented growth in the usefulness and adoption of coding models in the real world. In the face of 10–100x increases in inference volume, we consider the question: how can we take these trillions of tokens and extract from them a training signal to improve the model?
We call our approach of using real inference tokens for training "real-time RL." We first used this technique to train Tab and we found it was highly effective. Now we're applying a similar approach to Composer. We serve model checkpoints to production, observe user responses, and aggregate those responses as reward signals. This approach lets us ship an improved version of Composer behind Auto as often as every five hours.
The train-test mismatch
The primary way coding models like Composer are trained is by creating simulated coding environments, intended to be maximally faithful reproductions of the environments and problems that the model will encounter in real-world use. This has worked very well. One reason why coding is such an effective domain for RL is that, compared to other natural applications for RL such as robotics, it is much easier to create a high-fidelity simulation of the environment in which the model will operate when deployed.
Nonetheless, there is still some train-test mismatch incurred by the process of reconstructing a simulated environment. The greatest difficulty lies in modeling the user. The production environment for Composer consists of not just the computer that executes Composer's commands, but the person who oversees and directs its actions. It's much easier to simulate the computer than the person using it.
While there is promising research in creating models that simulate users, this approach unavoidably introduces modeling error. The attraction of using inference tokens for training signal is that it lets us use real environments and real users, eliminating this source of modeling uncertainty and train-test mismatch.
A new checkpoint every five hours
The infrastructure for real-time RL depends on many distinct layers of the Cursor stack. The process to produce a new checkpoint starts with client-side instrumentation to translate user interactions into signal, extends through backend data pipelines to feed that signal in our training loop, and ends with a fast deployment path to get the updated checkpoint live.
At a more granular level, each real-time RL cycle starts by collecting billions of tokens from user interactions with the current checkpoint and distilling them into reward signals. Next, we calculate how to adjust the model weights based on the implied user feedback and apply the update.
At this point there's still a chance our updated version is worse than the previous one in unexpected ways, so we run it against our eval suites, including CursorBench, to make sure there are no significant regressions. If the results are good, we deploy the checkpoint.
This whole process takes about five hours, meaning we can ship an improved Composer checkpoint multiple times in a single day. This is important because it allows us to keep the data fully or almost-fully on-policy (such that the model being trained is the same model that generated the data). Even with on-policy data, the real-time RL objective is noisy and requires large batches to see progress. Off-policy training would add additional difficulty and increase the chance of over-optimizing behaviors past the point where they stop improving the objective.
We were able to improve Composer 1.5 via A/B testing behind Auto:
- Agent edit persists in codebase: +2.28%
- User sends dissatisfied follow-up: −3.13%
- Latency: −10.3%

Real-time RL and reward hacking
Models are adept at reward hacking. If there's an easy way to forestall a bad reward or cheat their way to a good one, they'll find it — learning, for example, to split code into artificially small functions to game a complexity metric.
This problem is especially acute in real-time RL, where the model is optimizing its behavior against the full production stack described above. Each seam in the stack — from the way data is collected to how it's converted into signal to the reward logic — becomes a surface the model can learn to exploit.
Reward hacking is a bigger risk in real-time RL, but it's also harder for the model to get away with. In simulated RL, a model that cheats simply posts a higher score. There's no reference beyond the benchmark to call it out. In real-time RL, real users trying to get things done are less forgiving. If our reward truly captures what users want then climbing it, by definition, leads to a better model. Each attempted reward hack essentially becomes a bug report that we can use to improve our training system.
Here are two examples that illustrate the challenge and how we adapted Composer's training in response.
When Composer responds to a user, it often needs to call tools like reading files or running terminal commands. Originally, we discarded examples where the tool call was invalid, and Composer figured out that if it deliberately emitted a broken tool call on a task it was likely to fail at, it would never receive a negative reward. We fixed this by correctly including broken tool calls as negative examples.
A subtler version of this shows up in editing behavior, where part of our reward is derived from the edits the model makes. At one point, Composer learned to defer risky edits by asking clarifying questions, recognizing that it wouldn't get punished for code it didn't write. In general, we want Composer to clarify prompts when they're ambiguous and avoid over-eager editing, but due to a particular quirk in our reward function, the incentive never reverses. Left unchecked, editing rates decrease precipitously. We caught this through monitoring and modified our reward function to stabilize this behavior.
Next up: learning from longer loops and specialization
Most interactions today are still relatively short, so Composer receives user feedback within an hour of suggesting an edit. As agents become more capable, though, we expect they will work on longer tasks in the background and might only return to the user for input every few hours or less.
This changes the kind of feedback we have to train on, making it less frequent but also crisper, because the user is evaluating a complete outcome rather than a single edit in isolation. We're working to adapt our real-time RL loop to these lower frequency, higher fidelity interactions.
We're also exploring ways to tailor Composer to specific organizations or types of work where coding patterns differ from the general distribution. Because real-time RL trains on real interactions from specific populations, rather than generic benchmarks, it naturally supports this kind of specialization in ways simulated RL does not.
- Mar 25, 2026
- Date parsed from source: Mar 25, 2026
- First seen by Releasebot: Mar 26, 2026
Self-hosted Cloud Agents
Cursor adds self-hosted cloud agents that keep code and tool execution in your own network.
Cursor now supports self-hosted cloud agents that keep your code and tool execution entirely in your own network.
Your codebase, build outputs, and secrets all stay on internal machines running in your infrastructure, while the agent handles tool calls locally.
Self-hosted cloud agents offer the same capabilities as Cursor-hosted cloud agents, including isolated VMs, full development environments, multi-model harnesses, plugins, and more.
Try it out today by enabling self-hosted cloud agents in your Cursor Dashboard. Read more in our announcement.
- Mar 25, 2026
- Date parsed from source: Mar 25, 2026
- First seen by Releasebot: Mar 26, 2026
Run cloud agents in your own infrastructure
Cursor now supports generally available self-hosted cloud agents, bringing enterprise-ready coding automation into your own network. Teams can keep code, tool execution, and build artifacts in-house while Cursor handles orchestration, parallel task execution, and the agent experience.
Cursor now supports self-hosted cloud agents that keep your code and tool execution entirely in your own network.
For agents to autonomously handle many software tasks in parallel, they need their own development environment. Cursor cloud agents run in isolated virtual machines, each with a terminal, browser, and full desktop. They clone your repo, set up the development environment, write and test code, push changes for review, and keep working whether or not you're online.
Today, we're making self-hosted cloud agents generally available. Self-hosted agents offer all the benefits of cloud agents with tighter security control: your codebase, tool execution, and build artifacts never leave your environment. For teams with complex development environments, self-hosted agents have access to your caches, dependencies, and network endpoints—just like an engineer or service account would.
Use Cursor's agent experience with workers that run inside your own infrastructure
"Cursor cloud agents are great at writing code within the context of our codebase. Now with self-hosted cloud agents, we can give them access to the infrastructure needed to run our test suites and validate changes with our internal tools. This self-hosted solution will allow us to delegate end-to-end software builds entirely to Cursor's cloud agents."
Graham Fuller
Senior Software Engineer, Brex
Why self-hosted
Many enterprises in highly-regulated spaces cannot let code, secrets, or build artifacts leave their environment due to security and compliance requirements. Some companies have stood up mature environments where critical inputs like caches, dependencies, and certain network endpoints can only be accessed through internal machines with strict configurations.
To meet these needs, some teams have diverted engineering resources towards building and maintaining their own background agents for coding. Customers like Brex, Money Forward, and Notion are using Cursor's self-hosted cloud agents instead.
With self-hosted cloud agents, teams can keep their existing security model, build environment, and internal network setup, while Cursor handles orchestration, model access, and the user experience. That allows engineering teams to spend less time maintaining agent infrastructure and more time using it.
"Given our strict security requirements as a financial services provider, self-hosted support is something we've been eagerly awaiting. We're now building a workflow that enables nearly 1,000 engineers to create pull requests directly from Slack using Cursor's self-hosted cloud agents."
Yokoyama Tatsuo
Deputy Manager of SRE & MEPAR, Money Forward
Same product, your infrastructure
Self-hosted cloud agents offer the same capabilities as Cursor-hosted cloud agents:
- Isolated remote environments: each agent gets its own dedicated machine with no sharing, allowing for better parallelization.
- Multi-model: use Composer 2 or any frontier model inside our custom agent harness.
- Plugins: extend agents with skills, MCPs, subagents, rules, and hooks.
- Team permissions: control who can access and manage cloud agent runs across your org.
Self-hosted cloud agents will soon be able to demo their work by producing videos, screenshots, and logs for your review. You'll also be able to take over their remote desktop and use them to run automations.
"Self-hosted cloud agents are a meaningful step toward making coding agents enterprise ready. In large codebases like Notion's, running agent workloads in our own cloud environment allows agents to access more tools more securely and saves our team from needing to maintain multiple stacks."
Ben Kraft
Software Engineer, Notion
How it works
A worker is a process that connects outbound via HTTPS to Cursor's cloud—no inbound ports, firewall changes, or VPN tunnels required. When users kick off an agent session, Cursor's agent harness handles inference and planning, then sends tool calls to the worker for execution on your machine. Results flow back to Cursor for the next round of inference.
Each agent session gets its own dedicated worker, which is initiated with a single command:
agent worker start

Workers can be long-lived or single-use, handling sessions indefinitely or tearing down as soon as a task is complete.
For organizations scaling to thousands of workers, we provide a Helm chart and Kubernetes operator. You can define a WorkerDeployment resource with your desired pool size, and the controller handles scaling, rolling updates, and lifecycle management automatically. For non-Kubernetes environments, a fleet management API allows you to monitor utilization and build autoscaling on any infrastructure.
Try it out today by enabling self-hosted cloud agents in your Cursor Dashboard, and learn more in our docs. For larger company-wide deployments, reach out to our team.
- Mar 23, 2026
- Date parsed from source: Mar 23, 2026
- First seen by Releasebot: Mar 23, 2026
Fast regex search: indexing text for agent tools
Cursor improves Agent code search with fast, local text indexes built for large repos. The new approach speeds up regular expression matching, reduces ripgrep bottlenecks, and keeps searches fresh on the user’s machine for smoother enterprise workflows.
Time is a flat circle. When the first version of grep was released in 1973, it was a basic utility for matching regular expressions over text files in a filesystem. Over the years, as developer tools became more advanced, it was gradually superseded by more specialized tools. First, by roughly syntactic indexes such as ctags. Later on, many developers moved to specialized IDEs for specific programming languages that allowed them to navigate codebases very efficiently by parsing and building syntactical indexes, often augmented with type-level information. Eventually this was standardized in the Language Server Protocol (LSP), which brought these indexes to all text editors, new and old. Then, just when LSP was becoming a standard, Agentic coding arrived, and what do you know: the agents just love to use grep.
There are other state-of-the-art techniques to gather context for Agents. We've talked in the past about how much you can improve Agent performance by using semantic indexes for many tasks, but there are specific queries which the model can only resolve by searching with regular expressions. This means going back to 1973, even though the field has advanced a little bit since then.
Most Agent harnesses, including ours, default to using ripgrep when providing a search tool. It's a standalone executable developed by Andrew Gallant which provides an alternative to the classic grep but with more sensible defaults (e.g. when it comes to ignoring files), and with much better performance. ripgrep is notoriously fast because Andrew has spent a lot of time thinking about speed when matching regular expressions.
No matter how fast ripgrep can match on the contents of a file, it has one serious limitation: it needs to match on the contents of all files. This is fine when working in a small project, but many of Cursor's users, particularly large Enterprise customers, work out of very large monorepos. Painstakingly large. We routinely see rg invocations that take more than 15 seconds, and that really stalls the workflow of anybody who's actively interacting with the Agent to guide it as it writes code.
Matching regular expressions is now a critical part of Agentic development, and we believe it's crucial to target it explicitly: much like a traditional IDE creates syntactic indexes locally for operations like Go To Definition, we're creating indexes for the core operation that modern Agents perform when looking up text.
The classic algorithm
The idea of indexing textual data for speeding up regular expression matches is far from new. It was first published in 1993 by Zobel, Moffat and Sacks-Davis in a paper called "Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files". They present an approach using n-grams (segments of a string with a width of n characters) for creating an inverted index, and heuristics for decomposing regular expressions into a tree of n-grams that can be looked up in the index.
If you've heard of this concept before, it's probably not from that paper, but from a blog post that Russ Cox published in 2012, shortly after the shutdown of Google Code Search. Let's do a quick refresher of the building blocks for these indexes, because they apply to basically every other approach to indexing that has been developed since.
Inverted Indexes
An inverted index is the fundamental data structure behind a search engine. Working off a set of documents to be indexed, you construct an inverted index by splitting each document into tokens. This is called tokenization, and there are many different ways to do it — for this example, we'll use the simplest possible approach, individual words as tokens. The tokens then become the keys on a dictionary-like data structure, while the values are, for each token, the list of all documents where it appears. This list is commonly known as a posting list, because each document is uniquely identified by a numeric value or "posting". When you search for one or more tokens, we load their posting lists; if there is more than one posting list, we intersect them to find the documents that appear in all of them.
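The steps above can be made concrete with a minimal Python sketch, using whole words as tokens as in the example; the document set and queries are invented for illustration:

```python
from collections import defaultdict

# Toy document set; each document is identified by a numeric "posting".
docs = {
    0: "the quick brown fox",
    1: "the lazy dog",
    2: "quick brown dogs bark",
}

# Tokenize each document into words and record which documents
# contain each token: token -> posting list (here, a set of doc ids).
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def search(*tokens):
    """Intersect the posting lists of all query tokens."""
    postings = [index.get(t, set()) for t in tokens]
    return sorted(set.intersection(*postings))

print(search("quick", "brown"))  # documents containing both -> [0, 2]
```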
This design (with a lot of complexity bolted on top) is the basis for most search engines available today. But these are search engines for natural language, and we're trying to match regular expressions over source code. This doesn't quite work.
You can try to build something useful here by thinking very hard about tokenization — being aware of the syntax of each programming language, breaking up the identifiers in source code, and so on. This is very hard to get right. Back in the early days of GitHub, their Code Search feature worked like that: with a very complex tokenizer for programming languages, and a very large ElasticSearch cluster. The results were not good, and people had very poor opinions of the feature. You could search for identifiers (kind of), but not match regular expressions. You need a better way to tokenize in order to do that.
Trigram Decomposition
Naive tokenization on source code is not useful for matching regular expressions. We need to split the documents into more fundamental chunks. The classic algorithm chooses trigrams: a token is every overlapping segment of three characters in the input string.
Why three? We're going to store these trigrams as the keys in our inverted index. If we were to choose bigrams (chunks of 2), we would have very few keys in our index, up to 64k, but the posting lists on each key would be massive — too large to work with efficiently. If we went with quadgrams (chunks of 4), the posting lists would be tiny, which is a very good thing, but we would have billions of keys in our inverted index, and that's also hard to work with.
Trigrams are hence a pretty good middle ground. This makes tokenization when indexing documents very simple: extract every overlapping sequence of 3 characters from the document being indexed and use that as your tokens in the inverted index.
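The extraction step is a one-liner; a small Python sketch:

```python
def trigrams(text):
    # Every overlapping 3-character window of the input string.
    return {text[i:i + 3] for i in range(len(text) - 2)}

print(sorted(trigrams("search")))  # ['arc', 'ear', 'rch', 'sea']
```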
The actual complexity comes when tokenizing a regular expression so that it can be matched against the index. Regular expressions have syntax, so you need to parse them and use heuristics to figure out what trigrams can be extracted from the segments of the expression that actually represent text.
Decomposing a literal string into trigrams is straightforward, as it is the same algorithm as when you index a document. Extract every overlapping trigram contained in the string; a document that contains all these trigrams will probably contain the literal (but not necessarily!). Alternations are decomposed separately, resulting in two branches where either must be contained in a document for it to match. We query this on the inverted index by joining the posting lists instead of intersecting them. Character classes can be decomposed into many trigrams. Small classes like [rbc]at result in one trigram for each element of the class. When using broader character classes, we simply skip extracting those trigrams across those boundaries.
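A toy version of this query planning, in Python, handling only literals and a single top-level alternation (a deliberate simplification; real decomposers parse the full regex syntax and apply many more heuristics):

```python
def trigrams(text):
    # Every overlapping 3-character window of the input string.
    return {text[i:i + 3] for i in range(len(text) - 2)}

def plan(regex):
    """Decompose a (very restricted) regex into a tree of trigram queries."""
    if "|" in regex:
        # Alternation: a document may match if EITHER branch's trigrams
        # appear, so the branches' posting lists are joined, not intersected.
        return ("or", [plan(branch) for branch in regex.split("|")])
    # Literal: a document may match only if ALL its trigrams appear.
    return ("and", sorted(trigrams(regex)))

print(plan("rat|bat"))  # ('or', [('and', ['rat']), ('and', ['bat'])])
```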
Putting it all together
We know that trigrams are the right way to tokenize these documents, we know how to tokenize documents when building the index, and how to tokenize queries when searching. We can put all this together into an actual search index that can match regular expressions very efficiently. By decomposing any regular expression into a set of trigrams and loading all the relevant posting lists from the inverted index, we end up with a list of documents that can potentially match our regular expression. This is important! The final result set will only be obtained by actually loading all the potential documents and matching the regular expression "the old fashioned way". But having this sub-set of documents is always faster than having to scan and match the whole codebase, file by file.
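Here is an end-to-end sketch of that pipeline in Python: decompose a literal query into trigrams, intersect posting lists to get candidate documents, then confirm each candidate with an actual regex match. The documents are invented for illustration, and literals are assumed to be at least three characters long:

```python
import re
from collections import defaultdict

def trigrams(text):
    # Every overlapping 3-character window of the input string.
    return {text[i:i + 3] for i in range(len(text) - 2)}

docs = {
    0: "fn read_file(path: &str)",
    1: "fn write_file(path: &str)",
    2: "struct Config { path: String }",
}

# Index every overlapping trigram of every document.
index = defaultdict(set)
for doc_id, text in docs.items():
    for t in trigrams(text):
        index[t].add(doc_id)

def grep(literal):
    """Find documents containing `literal` (assumed >= 3 chars)."""
    # 1. Candidate set: documents containing every trigram of the literal.
    #    These can still be false positives -- the trigrams may not be adjacent.
    candidates = set.intersection(*(index.get(t, set()) for t in trigrams(literal)))
    # 2. Confirm each candidate "the old fashioned way".
    return sorted(d for d in candidates if re.search(re.escape(literal), docs[d]))

print(grep("read_file"))  # -> [0]
```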
This design is, by all means, fully functional. Projects like google/codesearch and sourcegraph/zoekt provide good performance for large indexes using an inverted index of trigrams (and, like all search engines, they bolt a lot more complexity on top). But there are clear shortcomings here: the index sizes are not small, and decomposition at query time must make a trade-off. If you use simple heuristics, you'll decompose queries into a few trigrams, and that will result in a lot of potential documents to match. If you use complex heuristics, you may end up with dozens, perhaps hundreds, of trigrams, and loading all those from the inverted index may become as slow as simply searching everything from scratch.
We can do better than that.
Suffix Arrays: a detour
Since we're covering the history of indexing textual data for regular expression searches, I'd like to take a detour and discuss this implementation that Nelson Elhage developed in 2015 for his livegrep web service. Compared to other large industry efforts, livegrep is tiny —it only indexes the most recent version of the Linux Kernel— but because of its reduced scope, its implementation is very much unlike anything else out there, and that makes it very interesting and worth talking about.
Nelson attacked the problem from first principles: there's no inverted index powering this search engine. Instead, all the source code is indexed inside a suffix array.
The concept of a suffix array is self-descriptive: a sorted array of all the suffixes of a string. If you try constructing an array for a larger string, you'll see that the data structure grows quickly. It may seem a particularly expensive index, and in many ways it is, but its storage can be compressed very well if you have access to the original string: you can just store the offsets of the start of every suffix.
Once we have constructed a suffix array for the corpus to be searched, regular expression searches can be performed efficiently by decomposing the regular expression into literals. Every potential match position for a regular expression can then be found by performing a binary search over the suffix array.
More complex structures in the regular expression syntax can be matched by exploiting the same properties of the suffix array. For instance, if you're matching a character range such as [a-z], you can scope down the array by binary searching the start and the end of the range. Content between those two endpoints will necessarily match the range.
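A minimal Python sketch of the idea, storing only suffix offsets and binary-searching for a literal. (Materializing the prefix list here is for clarity only; a real implementation compares against the original string in place.)

```python
import bisect

corpus = "the quick brown fox"

# Suffix array: offsets of every suffix of the corpus, sorted by suffix text.
sa = sorted(range(len(corpus)), key=lambda i: corpus[i:])

def find(literal):
    """All offsets where `literal` occurs, via binary search on the array."""
    # Because the array is sorted, all suffixes starting with `literal`
    # form one contiguous run; bisect locates its two endpoints.
    prefixes = [corpus[i:i + len(literal)] for i in sa]
    lo = bisect.bisect_left(prefixes, literal)
    hi = bisect.bisect_right(prefixes, literal)
    return sorted(sa[lo:hi])

print(find("qu"))   # -> [4]
print(find("own"))  # -> [12]
```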
What are the shortcomings here? A suffix array must be constructed out of an input string. That is a big limitation. If you're trying to index a large codebase (or perhaps many different codebases), you'll first need to concatenate all the content into a single string, and construct the suffix array out of that. When matching inside the suffix array, you'll also need an auxiliary data structure to map the match position to the original file that contains it. It is not insurmountable complexity, but it makes dynamically updating the index very expensive. This is a solution that is very hard to scale.
Trigram Queries with Probabilistic Masks
Jumping back to some more traditional designs: here's an approach that was originally developed at GitHub for Project Blackbird. This was a research project aiming to replace the old Code Search feature. As we've discussed earlier, the old search was implemented by tokenizing source code and couldn't match regular expressions. The goal for this new implementation was developing something that could.
The first iterations attempted to use the classic inverted index with trigrams as keys, but quickly ran into capacity issues. There is a lot of code on GitHub, and using trigrams to index it resulted in posting lists that were just too large to search.
As trigrams were not quite working out, the next step was finding a better size for the n-grams that would be indexed. We've seen that bigrams are too broad, because their posting lists become unmanageably large, and that quadgrams are too specific, because we end up with too many keys in our index. Trigrams are a sweet spot between the two, but in practice, the ideal size is more like... 3.5-grams. Yet we can't split a character in two, can we?
We can, in fact, do something quite close to that: this design proposes using trigrams as the key for the inverted index, and augmenting the posting lists with extra information about the "fourth character" that would follow the trigram in that specific document. To do that, we could simply store that fourth character as an extra byte, but that turns our index into a quadgram index, and we've seen those are just too large to store. What we store instead is a bloom filter that contains all the characters that follow that specific trigram.
You may think of a bloom filter as a large, complex data structure, but it needn't be so: a bloom filter can be squeezed into very few bits, and a lot of information can fit in 8 of them if you're careful when encoding it. With just two bytes of masks per posting, we can work around the two biggest issues in a classic trigram index.
By having a mask that contains the characters following each trigram, our inverted index can be constructed using trigram keys, but we can query it using quadgrams! This already scopes down the potential documents much more than a simple trigram index could.
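Here is a toy sketch of the idea. The 1-hash, 8-bit bloom mask and all names are our own simplifications for illustration, not the Blackbird implementation:

```python
from collections import defaultdict

# Each posting carries an 8-bit bloom mask of the characters that follow the
# trigram in that document. The index is keyed by trigrams, but it can be
# queried with quadgrams.
def char_bit(c):
    return 1 << (ord(c) % 8)    # crude single-hash bloom over 8 bits

def build(docs):
    index = defaultdict(dict)   # trigram -> {doc_id: next-char bloom mask}
    for doc_id, text in docs.items():
        for i in range(len(text) - 2):
            tri = text[i:i + 3]
            mask = index[tri].get(doc_id, 0)
            if i + 3 < len(text):                # the "fourth character"
                mask |= char_bit(text[i + 3])
            index[tri][doc_id] = mask
    return index

def candidates(index, quadgram):
    """Docs that may contain the quadgram: trigram hit + bloom-mask check."""
    tri, fourth = quadgram[:3], quadgram[3]
    bit = char_bit(fourth)
    return {doc for doc, mask in index.get(tri, {}).items() if mask & bit}

idx = build({1: "hello world", 2: "help"})
# candidates(idx, "hell") -> {1}; candidates(idx, "help") -> {2}.
# But candidates(idx, "held") -> {1} too: 'd' and 'l' collide mod 8, a false
# positive the final deterministic scan of doc 1 will discard.
```

With real bit budgets the hash would be better distributed, but the shape is the same: collisions only ever widen the candidate set, never shrink it.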
A second augmented mask, containing the offsets where the trigram appears in the document, solves the trigram ambiguity issue: just because a document contains two trigrams doesn't mean that they're actually next to each other, which is what we need to match our query. By shifting the position mask of our second trigram one bit to the left and comparing it with the mask for the first trigram, we can ensure that they are indeed adjacent. With particularly common trigrams, this is invaluable for scoping down even further the list of candidate documents.
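The position mask can be sketched the same way. The mod-16 folding and the shift direction are our own conventions, chosen for illustration:

```python
# Second augmented mask: a small bitmask of the offsets where a trigram
# occurs in a document. Positions are folded mod 16 so the mask fits in two
# bytes; folding makes the adjacency test probabilistic.
MASK_BITS = 16

def position_mask(text, tri):
    """Bitmask of the (folded) offsets where `tri` occurs in `text`."""
    mask = 0
    for i in range(len(text) - 2):
        if text[i:i + 3] == tri:
            mask |= 1 << (i % MASK_BITS)
    return mask

def may_be_adjacent(text, t1, t2):
    """Could t2 start exactly one character after t1 somewhere in `text`?
    Rotate t1's mask by one position and intersect it with t2's mask."""
    m1 = position_mask(text, t1)
    m2 = position_mask(text, t2)
    shifted = ((m1 << 1) | (m1 >> (MASK_BITS - 1))) & ((1 << MASK_BITS) - 1)
    return (shifted & m2) != 0   # False -> definitely not adjacent
```

A `False` here lets us discard a candidate document outright; a `True` may still be a false positive from the position folding, which the final scan resolves.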
All this information is, of course, probabilistic: like anything stored in a bloom filter, it can yield false positives. But false positives are always acceptable here, because the final matching is performed deterministically on the text itself. The goal is using our index to minimize the number of candidate documents we need to scan.
The resulting indexes are extremely efficient, but they have a major shortcoming. Bloom filters can become saturated. That is an unfortunate property of bloom filters; they can be updated, but if you add too much data to them, eventually all the bits in the filter are set. And once the bloom filter is saturated, it matches everything, so we're back to the performance of the very first index we talked about.
This is an index that minimizes storage, but it becomes painful when you need to update it in-place.
Sparse N-grams: Smarter Trigram Selection
Here's another very smart idea. You may have seen it used in ClickHouse for their regular expression operator, and also at GitHub, in the new Code Search feature that shipped a couple of years ago and which does allow matching regular expressions. It's called Sparse N-grams, and it is the sweetest of the middle grounds.
A traditional trigram index extracts every consecutive 3-character sequence, but you can see how this creates a lot of redundancy: the characters in every trigram are duplicated in the adjacent ones! In this algorithm, we instead extract a random number of n-grams, with each n-gram having a random length.
Of course random here cannot be truly random, because then the index couldn't be queried. We are assigning a "weight" to every pair of characters in the document. This weight could be anything, as long as it's deterministic (ClickHouse uses the crc32 hash of the two characters). Then, our sparse n-grams are all substrings where the weights at both ends are strictly greater than all the weights contained inside.
Crucially, this means that sparse n-grams can have any length; they are not uniform. It also means that we can end up generating a lot of them, more than if we were simply extracting trigrams. But because the n-grams are generated deterministically, we can do some very important optimizations at query time. Let's see how.
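A simplified sketch of the extraction rule, using crc32 over character pairs as the deterministic weight. The real ClickHouse and GitHub implementations differ in their details:

```python
import zlib

def weight(a, b):
    # Deterministic weight for an adjacent character pair.
    return zlib.crc32((a + b).encode())

def build_all(s):
    """Extract every substring whose two end-pair weights are strictly
    greater than all pair weights contained inside it."""
    w = [weight(s[i], s[i + 1]) for i in range(len(s) - 1)]
    grams = set()
    for a in range(len(w)):
        for b in range(a + 1, len(w)):
            top = max(w[a + 1:b], default=-1)  # largest interior weight
            if top >= w[a]:
                break          # no longer n-gram starting at a can qualify
            if w[b] > top:
                grams.add(s[a:b + 2])          # substring spanning pairs a..b
    return grams
```

When a candidate has no interior pairs (two adjacent pairs, i.e. a trigram) the condition holds trivially, so every trigram is extracted plus the longer dominated substrings, which is why `build_all` produces more n-grams than plain trigram extraction.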
This is not an easy algorithm to understand, so we'll have to play with it. You can use the back and forward arrows in the visualization to step through it.
Above the character breakdown for the input, you can see the random weight given to each character pair. These weights are what determine the segments that will be extracted as n-grams.
In the bottommost section, you can see a breakdown of how many sparse n-grams are extracted for the input string, and how many would be extracted if we were doing bigrams, trigrams, or quadgrams. Note the stark difference: we're actually extracting a lot of sparse n-grams!
So what's the deal here? Are we simply doing something silly? Not quite. We're paying a high upfront cost when indexing so that we can have very fast lookups at query time. The build_all algorithm you're watching right now is what we use when indexing documents: it extracts all the possible sparse n-grams from the input. Note, however, that we don't have to do that when querying. Because the weights are random but deterministic, at query time we can use a covering algorithm that only generates the minimal number of n-grams required to match in the index.
We know that the n-grams are minimal because at index time, we only generate them when all the weights contained inside are smaller than the ones at the edges. Hence, we only need to extract the sparse n-grams at the edges (way fewer than if we were to extract all trigrams), and we'll be able to select our candidate documents with very high specificity.
Can we do better than this? Yes! Much better, in fact. We've been using crc32 as our weight function in the algorithm as an example. However, any hash function would work here, as long as it's deterministic. Let's pick something very smart: a hash function that gives a high weight to every pair of characters that is actually very rare, and a low weight to every pair that is very frequent.
This hash function is easy to compute. Since we're going to be indexing source code, we can pick up a couple of terabytes of open-source code from the internet and build a frequency table for all the character pairs we find in it. That frequency table is our hash function. See what happens when we apply it to our algorithm: the highest weights now appear under the least frequent pairs of characters, and because of this, the covering mode results in even fewer n-grams to look up, and fewer documents that can possibly match.
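The frequency-table weight can be sketched like so. The tiny corpus below is a hypothetical stand-in for the terabytes of open-source code described above:

```python
from collections import Counter

# Stand-in corpus; in practice this frequency table would be built from a
# very large crawl of open-source code.
corpus = "def foo():\n    return bar\ndef baz():\n    return qux\n"

pair_counts = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
total = sum(pair_counts.values())

def weight(a, b):
    """Rare or unseen pairs score near 1.0; frequent pairs score near 0.0."""
    return 1.0 - pair_counts.get(a + b, 0) / total
```

Plugged in as the deterministic weight function, rare pairs like `zq` dominate their neighborhoods, so the extracted n-grams end at the most selective character pairs.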
This approach, which minimizes the number of posting lookups, will serve as the perfect starting point for constructing indexes that can be efficiently queried on users' machines.
All this, on your machine
Indexes for speeding up regular expression search need to live somewhere. All the designs we've seen so far have been deployed on the server side, and the semantic indexes we've talked about are also managed and queried on the server. And yet, we're choosing to go in a different direction here: we're building and querying the indexes on users' machines.
There are several reasons why keeping these indexes local makes sense. First, the indexes are just one part of what is required to match a regular expression. They provide a scoped-down subset of documents where the regular expression could match, but you still need to individually scan each file. Doing that on the server would mean either synchronizing all the files or performing expensive roundtrips back and forth to the client. Doing it on the client is trivial, and also sidesteps a lot of security and privacy concerns around data storage.
Latency also matters a lot for this functionality. Our Composer model delivers one of the highest tokens-per-second (TPS) rates in the industry, and we're working hard to make it both smarter and faster. Adding network roundtrips to such a critical operation, one the model uses constantly (oftentimes in parallel), just adds friction and stalls, and takes us in the opposite direction of our goal for interacting with Agents.
Unlike with semantic indexes, an index for regular expression search also needs to be very fresh, particularly when it comes to the model reading its own writes. We don't have to continuously update our semantic index because re-computing the embeddings for a modified file does not significantly displace them in the multi-dimensional space; the nearest-neighbor search we perform will still send the Agent in the right direction. However, if the agent is searching for specific text and does not find it, it'll often go on a wild goose chase, waste tokens, and defeat the purpose of our performance optimization in the first place.
Bringing these indexes to the client does come with its own set of challenges. Synchronizing disk data can be complex and expensive, but we make it very efficient in practice: we control the state of the index by basing it off a commit in the underlying Git repository. User and agent changes are stored as a layer on top of it. This makes it very quick to update, and very fast to load and synchronize on startup.
To ensure that memory usage in the editor remains minimal, we store our indexes in two separate files. The first file contains all the posting lists for the index, one after the other; we flush this directly to disk during construction. The other file contains a sorted table with the hashes for all n-grams and the offset of their corresponding posting list in the postings file. Storing hashes here without storing the full n-grams is always safe: it can cause a posting list to become broader when two hashes collide (extremely unlikely in practice), but it cannot give incorrect results. It also gives us a very tight layout for the lookup table. We then mmap this table, and only this table, in the editor process, and use it to serve queries with a binary search. The search returns an offset, and we read directly at that offset in the postings file.
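The two-file layout can be sketched in a few lines. The record format (8-byte hash, 4-byte offset, 4-byte length) and the hash function are hypothetical choices for illustration; Cursor's actual on-disk layout is not public:

```python
import struct, hashlib

# Fixed-width lookup-table record: n-gram hash, postings offset, length.
REC = struct.Struct(">QII")

def ngram_hash(g):
    return int.from_bytes(
        hashlib.blake2b(g.encode(), digest_size=8).digest(), "big")

def write_index(postings):
    """postings: {ngram: [doc ids]} -> (lookup table bytes, postings bytes)."""
    table, blob = [], b""
    for g, docs in postings.items():
        data = b"".join(d.to_bytes(4, "big") for d in docs)
        table.append((ngram_hash(g), len(blob), len(data)))
        blob += data                     # posting lists, one after the other
    table.sort()                         # sorted by hash for binary search
    return b"".join(REC.pack(*r) for r in table), blob

def lookup(table, blob, g):
    """Binary-search the (mmap-able) table, then one read into the postings."""
    h, lo, hi = ngram_hash(g), 0, len(table) // REC.size
    while lo < hi:
        mid = (lo + hi) // 2
        rh, off, length = REC.unpack_from(table, mid * REC.size)
        if rh < h:
            lo = mid + 1
        elif rh > h:
            hi = mid
        else:
            data = blob[off:off + length]
            return [int.from_bytes(data[i:i + 4], "big")
                    for i in range(0, length, 4)]
    return None                          # n-gram not in the index
```

Because the table records are fixed-width and sorted, the editor process only needs the mmapped table resident; each query touches O(log n) table records and exactly one contiguous range of the postings file.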
Conclusions
We've found that providing text search indexes to fast models, such as our own Composer 2, creates a qualitative difference for Agentic workflows. The impact is much more pronounced in larger Enterprise repositories, because grep is one of the few Agent operations whose latency scales with the size and complexity of the code being worked on. Take a look at these example workflows running with Composer 2: removing altogether the time spent searching the codebase provides meaningful time savings, particularly when the Agent investigates bugs, and allows for much more effective iteration.
As for what's next, who knows! There are many exciting developments around providing context for Agents, and a lot of researchers working in the space, including ours. We're going to continue optimizing the performance of current approaches, including semantic indexes, and we're hoping to bring forward brand-new ways of improving the performance of Agents even further, while always ensuring that they work where they really matter: in the largest repositories in the world, where the future of Agentic development is really gaining traction.
Original source Report a problem - Mar 19, 2026
- Date parsed from source:Mar 19, 2026
- First seen by Releasebot:Mar 20, 2026
Composer 2
Cursor adds Composer 2, bringing frontier-level coding performance for challenging tasks with Standard and Fast pricing.
Composer 2 is now available in Cursor: frontier-level coding performance with strong results on challenging coding tasks.
- Standard: $0.50/M input, $2.50/M output tokens
- Fast (default): $1.50/M input, $7.50/M output tokens
Read more in our announcement.
Original source Report a problem - Mar 19, 2026
- Date parsed from source:Mar 19, 2026
- First seen by Releasebot:Mar 19, 2026
Introducing Composer 2
Cursor releases Composer 2, a frontier-level coding assistant now available at $0.50/M input and $2.50/M output tokens, plus a faster variant at $1.50/$7.50. It cites strong benchmark gains, continued pretraining, and long-horizon task solving, with usage in standalone pools.
Composer 2 in Cursor
Composer 2 is now available in Cursor.
It's frontier-level at coding and priced at $0.50/M input and $2.50/M output tokens, making it a new, optimal combination of intelligence and cost.
Frontier-level coding intelligence
We're rapidly improving the quality of our model. Composer 2 delivers large improvements on all benchmarks we measure, including Terminal-Bench 2.0¹ and SWE-bench Multilingual:
These quality improvements come from our first continued pretraining run, which provides a far stronger base to scale our reinforcement learning.
From this base, we train on long-horizon coding tasks through reinforcement learning. Composer 2 is able to solve challenging tasks requiring hundreds of actions.
Try Composer 2
Composer 2 is priced at $0.50/M input and $2.50/M output tokens.
There is also a strong, faster variant with the same intelligence at $1.50/M input and $7.50/M output tokens, which has a lower cost than other fast models². We're making fast the default option. See our model docs for full details.

On individual plans, Composer usage is part of a standalone usage pool with generous usage included. Try Composer 2 today in Cursor.
¹ Terminal-Bench 2.0 is an agent evaluation benchmark for terminal use maintained by the Laude Institute. Anthropic model scores use the Claude Code harness and OpenAI model scores use the Simple Codex harness. Our Cursor score was computed using the official Harbor evaluation framework (the designated harness for Terminal-Bench 2.0) with default benchmark settings. We ran 5 iterations per model-agent pair and report the average. More details on the benchmark can be found at the official Terminal Bench website. For other models besides Composer 2, we took the max score between the official leaderboard score and the score recorded running in our infrastructure.
² Tokens per second (TPS) for all models are from a snapshot of Cursor traffic on March 18th, 2026. Token sizing for Composer and GPT models is similar. Anthropic tokens are ~15% smaller and the TPS number is normalized to reflect that. Similarly, output token price for non-Anthropic models was scaled to match the same ~15% change. Speed may vary depending on provider capacity and improvements over time.
- Mar 18, 2026
- Date parsed from source:Mar 18, 2026
- First seen by Releasebot:Mar 18, 2026
Money Forward brings Cursor’s coding agents to product, design, and QA
Cursor details a company-wide rollout at Money Forward, expanding AI coding agents from engineering to product, design, and QA. Engineers save 15–20 hours weekly, QA cuts test-generation time 70%, designers prototype against live frontends, and PMs extract better requirements. Broad adoption signals real product impact.
Money Forward and Cursor
Money Forward set out to bring coding agents to every team that touches how software is built. It started with engineering, where Cursor quickly started saving developers 15–20 hours a week, then expanded to product, design, and quality assurance (QA).
Today, over 1,000 employees at Money Forward use Cursor daily. QA engineers are generating test cases 70% faster. Product managers are analyzing production code to write better specifications. Designers are prototyping directly against live frontends and analyzing user data to refine designs.
Proving value in engineering first
Money Forward’s engineering team was initially using other external vendors for code autocompletion and basic AI chat functionality. Adoption had largely stalled as developers were not seeing meaningful time savings on software tasks.
After introducing Cursor, the number of engineers using coding agents increased by 30% within just a week.
We held an engineering all-hands where we showed that Cursor's agents could actually tackle entire software engineering tasks from end-to-end. The bottom-up demand from developers was immediate.
Aaron Li
Staff Engineer, Money Forward

Developers are now individually saving an estimated 15–20 hours a week with Cursor across tasks like:
- Refactoring service layers for Money Forward’s iOS applications
- Optimizing Rails applications to drive 10x performance improvements
- Managing AWS and GCP deployments with Terraform
- Migrating legacy front-end services from Vue to React
But as engineering began shipping software faster, product, design, and QA became the constraint.
Evaluating Cursor for a company-wide rollout
Money Forward’s Engineering Productivity and AI Research (MEPAR) department evaluated several different AI coding tools before selecting Cursor for its company-wide agent rollout.
Cursor’s model-agnostic infrastructure lets us parallelize long-running tasks across asynchronous cloud agents. The agents connect to our internal tools for fast context retrieval without the limitations of local hardware. Cursor's role is expanding quickly across all our teams.
Tran Ba Vinh Son
Group Company CTO and Manager of MEPAR, Money Forward

A few advantages made the difference:
- Minimal setup: Users can start building with agents immediately, without any complex environment configuration. This made adoption practical across functions with varying technical depth.
- Visual capabilities: Cursor’s built-in browser made it easy for designers and QA engineers to visually verify agent changes. These teams preferred Cursor’s rich interface over terminal-based alternatives, where reviewing visual output required extra tooling.
- Unified agent workspace: Cursor offered a single platform for code generation, review, testing, and debugging so users didn’t have to switch between tools to do their work.
- Large codebase performance: Money Forward maintains complex, interconnected production systems. Cursor’s context retrieval performed reliably against these codebases, which was critical for non-engineering teams interacting with production code for the first time.
Cursor has spread beyond engineering to design, product, and QA. These groups had low acceptance rates for other tools that hadn't invested in a robust UI and user experience.
Aaron Li
Staff Engineer, Money Forward

QA automates test generation and moves upstream
Before Cursor, QA engineers were manually reading product specs, developing test cases for each user story, and writing test scripts.
Now, QA engineers feed Cursor relevant Jira tickets and Notion docs using MCPs. One agent then generates structured test cases while a second agent translates them into Playwright scripts.
As a result, time spent on test generation has decreased by 70%. QA teams are now spending more time influencing product quality earlier in the software lifecycle by focusing on risk-based testing and quality gates.
The QA team now uses Cursor to analyze incidents, automate test results, and review specs before development. Cursor is changing how we ensure software excellence.
Xie Lester
Director in the Chief Quality Office, Money Forward

Product uses Cursor to refine requirements
Cursor helps PMs extract system relationships from repositories, generate architecture diagrams, and draft PRDs that are grounded in real implementation details.
This approach has helped product teams identify edge cases and overlooked constraints before engineering work begins, improving the overall efficiency of the software development lifecycle.
Even when specifications don’t exist in our docs, Cursor can identify them directly from the code. This allows us to develop better product requirements for engineering to build from.
Shoichiro Onishi
Product Manager, Money Forward

Design works directly against production code
Historically, designers worked from static mockups and secondhand descriptions of system behavior. Designers were often removed from the actual user journey and business data that determined whether a feature succeeded or failed.
Designers at Money Forward now use Cursor’s browser capabilities and fullstack context to iterate against application frontends directly in code. Designers also use Cursor’s agent and MCPs to directly access product analytics and refine designs accordingly.
With Cursor, I can access product specs and data myself. That allows me to design with a clearer understanding of how the product actually behaves, not just how it's described.
Ryota Sudo
Product Designer, Money Forward

If you are interested in bringing agents to every team that touches your SDLC, reach out to start a Cursor trial.
Original source Report a problem - Mar 16, 2026
- Date parsed from source:Mar 16, 2026
- First seen by Releasebot:Mar 17, 2026
Securing our codebase with autonomous agents
Cursor releases four security automation templates for Cursor Automations, enabling scalable vulnerability detection and repair across thousands of PRs. It details architecture, an MCP, and four agents: Agentic Security Review, Vuln Hunter, Anybump, and Invariant Sentinel, plus future automation plans.
Over the last nine months, our PR velocity has increased 5x. Security tooling based on static analysis or rigid code ownership remains helpful, but is not enough at this scale. We've adapted by using Cursor Automations, which has allowed us to quickly build a fleet of security agents that continuously identify and repair vulnerabilities in our codebase.
Security agents are reviewing 3,000+ internal PRs each week, catching 200+ vulnerabilities
Today, we're releasing four new automation templates with the exact blueprints of the security agents we've found to be most helpful. Other security teams can customize these templates to build agents that automatically resolve a wide range of security issues.
The automations architecture
For agents to be useful for security, they need two features, both of which Cursor Automations provides.
The first is out-of-the-box integrations for receiving webhooks, responding to GitHub pull requests, and monitoring codebase changes. This allows agents that operate in the background to know when to step forward and take action.
The second is a rich agent harness and environment. Automations are powered by cloud agents, which gives them all the tools, skills, and observability that cloud agents have access to.
To make automations more powerful for security-specific use cases, we built a security MCP tool and deployed it as a serverless Lambda function, available just-in-time when needed, and not otherwise running.
The MCP, whose reference code is available here, serves three purposes:
- Persistent data. The agent uses the MCP to store data, so we can track and measure security impact over time. We use that data to continually refine when and how we trigger automations.
- Deduplication. We run multiple review agents on every change, and because their findings are generated by an LLM, different agents can end up using different words to describe the same underlying issue. To avoid duplicate work, the MCP allows the agent to deploy a classifier powered by Gemini Flash 2.5 that determines when two differently worded findings describe the same problem.
- Consistent output. Agents report every vulnerability they find through the MCP, which sends consistently formatted Slack messages and handles further actions like dismissing or snoozing a finding.
With this foundation in place, the four security automations detailed below layer on their own workflows and trigger logic. We use Terraform to ensure that all changes to security tooling go through a standard review and deployment process.
Agentic Security Review
Internally, we were already using Bugbot to review PRs for code quality and general issues, including some security findings. But a general-purpose review tool isn't ideal for security because it can't be prompt-tuned to our specific threat model, and because we needed the ability to block CI on security findings specifically, without blocking on every general code quality issue.
Given that, we built a dedicated automation we call Agentic Security Review. Initially we had it forward its findings to a private Slack channel monitored by our security team.
Agentic Security Review sends findings to a private Slack channel monitored by the security team.
Once we were confident it was identifying genuine issues, we turned on PR commenting, then implemented a blocking gate check. In the last two months, Agentic Security Review has run on thousands of PRs and prevented hundreds of issues from reaching production.
Vuln Hunter
After the success of Agentic Security Review on new code, we pointed agents at the existing codebase. Vuln Hunter is an automation that divides the code into logical segments and searches each one for vulnerabilities. Our team triages findings and typically fixes them, often using @Cursor from Slack to generate PRs.
Anybump
Dependency patching is so time intensive that most security teams eventually give up and push it to engineering, where it sits in backlogs. We created an automation called Anybump that has automated nearly all of it.
Anybump runs reachability analysis to narrow vulnerabilities to those that are actually impactful, then traces through the relevant code paths, runs tests, checks for breakage, and opens a PR once tests pass. After the PR is merged, Cursor's canary deployment pipeline provides a final safety gate before anything reaches production.
Anybump automatically opens PRs to patch vulnerable dependencies after tests pass.
Invariant Sentinel
Invariant Sentinel runs daily to monitor for drift against a set of security and compliance properties. It divides the repo into logical segments and spins up subagents to validate code against a list of invariants.
After analysis, the agent compares current state against previous runs using the automations memory feature. If it detects drift, it revalidates to ensure correctness, then updates its memory and sends a Slack report to the security team with a description of the change and specific code locations as evidence.
Because this automation runs in a full development environment, the agent can write and execute code to validate its own assumptions, complementing traditional functional, unit, and integration tests.
More automations to come
Security is full of opportunities to apply automations, and these four are just the beginning of the work we plan to do. We're already extending them to encompass vulnerability report intake, privacy compliance monitoring, on-call alert triage, and access provisioning.
In each case, agents give us coverage and consistency at a scale we couldn't achieve manually.
Original source Report a problem