Cohere Release Notes
77 release notes curated from 21 sources by the Releasebot Team. Last updated: Jun 10, 2026
- Jun 9, 2026
- Date parsed from source:Jun 9, 2026
- First seen by Releasebot:Jun 10, 2026
Introducing North Mini Code: Cohere’s first model for developers
Cohere launches North Mini Code, its first open-source agentic coding model for developers. The 30B MoE model is built for efficient code generation, software engineering, and terminal tasks, with strong performance, fast throughput, and availability on Hugging Face, Model Vault, Cohere API, and OpenRouter.
Small, efficient, and open-source — our first agentic coding model, built for the sovereign developer ecosystem.
Today we're launching North Mini Code open-source. A mixture-of-experts (MoE) model, North Mini Code is Cohere's first agentic coding model, and the inaugural member of our next generation of powerful models.
At 30B total parameters with just 3B active, North Mini Code delivers strong software development performance without demanding extensive hardware to match. Efficient by design, it's built to run where you need it.
Freely available under an Apache 2.0 license, North Mini Code advances Cohere’s mission to make sovereign AI a practical reality, giving developers direct access to agentic coding capabilities. We're building in the open, because the future of AI should be shaped by the people running, testing, and improving it.
Download the weights on Hugging Face, or deploy in a dedicated, managed inference environment on Model Vault. Alternatively, try it for free in your harness of choice on OpenCode or with a Cohere API key. Share what you build and tag @ Cohere on X or Discord, or engage with us on Reddit.
Snapshot
Model: North-Mini-Code-1.0
License: Apache 2.0
Model size: 30B total; 3B active
Context length: 256K total context; 64K max generation
Optimized for: Code generation, agentic software engineering, and terminal tasks
Availability: Hugging Face (Weights), Cohere API, Cohere Model Vault, OpenRouter
Hardware (minimum): 1× H100 @ FP8
Agentic coding capabilities
North Mini Code achieves competitive scores across benchmarks against models of this size class, demonstrating strong performance in real-world software engineering tasks.
Image 1: North Mini Code’s performance in agentic software engineering and terminal tasks, along with complex code generation benchmarks, compared to leading open-source models of a similar size. ¹ ²
North Mini Code’s benchmark scores translate to a 33.4 on the Artificial Analysis Coding Index, a competitive position among similarly sized models.
The speed advantage for developer tasks
North Mini Code is designed for speed and efficiency, with a strong focus on minimizing total cost of ownership as we continue to refine and scale the model.
In our testing, North Mini Code achieved up to 2.8x higher output throughput than Devstral Small 2 under identical concurrency levels and hardware configurations. In practical terms, that translates to nearly three times the work rate, enabling faster iteration while reducing computational overhead.
North Mini Code also demonstrated a 30% advantage in inter-token latency, a metric that reflects the consistency and pacing of token generation. Time-to-first-token (TTFT) performance was more closely matched between the two models, with Devstral Small 2 maintaining a slight edge across the tested conditions.
Image 2: North Mini Code’s output speed and latency compared to Devstral Small 2, across high and low concurrencies, in internal tests using coding prompts.
Sovereign open models for developers
North Mini Code is our first open-source model for developers. As coding agents transform software engineering, developers need control and flexibility over their agentic coding infrastructure.
North Mini Code represents a step forward in small agentic coding models that can accomplish tasks that matter to developers. Specifically, it is built for agentic workflows, including understanding and orchestrating sub-agents, mapping systems architecture, and running code reviews. Deploy on-prem or locally, on your own terms.
Community feedback will directly shape our roadmap as we expand the ecosystem toward more open and sovereign developer models. Try North Mini Code when you need freedom from vendor constraints, and help us build what's next.
What’s next?
North Mini Code launches as the first—but certainly not the last—of Cohere's new generation of powerful models, designed for a more sovereign open-source ecosystem.
We're committed to increasing our capabilities, with community input informing what comes next.
Getting started
Help us build a complete sovereign AI ecosystem for software development by trying North Mini Code. North Mini Code is available for free on Hugging Face and Model Vault—our fully managed inference platform. We've specifically trained it for compatibility with OpenCode, but it works with most coding agents.
Share what you build and tag @ Cohere on X or Discord, or engage with us on Reddit to help shape the future of sovereign models.
Visit our documentation for detailed model specs, deployment guides, and cookbooks to get started.
Footnotes
1 We used publicly reported scores for competitor models either from original reports or Artificial Analysis Intelligence Index where available. Additionally, Gemma 4’s scores for agentic coding tasks were reported by Qwen team. For the benchmark results that any public report is missing denoted by (*) in Image 1, we run internally with recommended model configuration.
2 We evaluated North Mini Code using “SWE-agent” harness for SWE-Bench Verified and SWE-Bench Pro, and a simple ReAct harness employing a single terminal-use tool for Terminal Bench v2. For Terminal Bench Hard, we used Terminus-2 harness for both North Mini Code and the other models that are evaluated internally.
Original source - Jun 9, 2026
- Date parsed from source:Jun 9, 2026
- First seen by Releasebot:Jun 9, 2026
Announcing Cohere's North-Mini-Code-1.0
Cohere releases North-Mini-Code-1.0, its first agentic coding model, built as a 30B total, 3B active Mixture of Experts model for local-friendly coding workloads. It is available through the Chat V2 API, as open weights on Hugging Face, and with Model Vault deployment support.
We're pleased to announce the release of North-Mini-Code-1.0, Cohere's first agentic coding model. It is a 30 billion total / 3 billion active parameter Mixture of Experts model trained specifically for agentic coding, with a small enough active footprint to run on local hardware.
Technical Details
Model Name: north-mini-code-1-0
Context Length: 256K input, 64K output
License: Apache 2.0
Availability
North-Mini-Code-1.0 is available through the Chat V2 API and as open weights on Hugging Face. For production use, Model Vault deployment is also supported.
For more details, see the model documentation.
Original source All of your release notes in one feed
Join Releasebot and get updates from Cohere and hundreds of other software products.
- May 20, 2026
- Date parsed from source:May 20, 2026
- First seen by Releasebot:Jun 1, 2026
Introducing Command A+: Making sovereign agentic capabilities available to all
Cohere releases Command A+, an open-source enterprise LLM for complex reasoning, multimodal and multilingual agentic tasks. The model runs efficiently on as little as two H100 GPUs, is freely available under Apache 2.0, and brings faster inference with broader language coverage.
Our fastest and most powerful language model yet.
Command A+ is an open-source enterprise workhorse built for complex reasoning, multimodal and multilingual agentic tasks — all while running on as little as two H100 GPUs.
Today, we’re releasing Command A+ open-source. A mixture-of-experts (MoE) model, Command A+ is an efficient, versatile, and privately deployable LLM built for high-performance agentic tasks with minimal compute overhead.
Born from a year of deploying North with our customers, it surpasses every previous generation in the Command series and unifies their capabilities into a single scalable model.
Now freely available under an Apache 2.0 license, Command A+ advances Cohere’s mission to make sovereign AI a technological reality — giving developers direct access to enterprise-grade agentic capabilities across experimentation, deployment, and production workflows.
Visit Hugging Face to download the weights - available in several near lossless quantizations - and read our implementation guides. For a dedicated, managed inference environment, deploy Command A+ in Model Vault today.
Snapshot:
- Model: command-a-plus-05-2026
- License: Apache 2.0
- Architecture: Sparse / MoE
- Model size: 218B total; 25B active
- Context length: 128K input context; 64K max generation
- Input modalities: Text, image, tool use
- Output modalities: Text, reasoning, tool use
- Languages: Supports 48 languages.
- Optimized for: Reasoning, agentic workflows, RAG, multilingual, multimodal document processing
- Supported frameworks: vLLM, Transformers
- Hardware (minimum): 1× B200 @ W4A4, 2× H100s @ W4A4
Northwards:
For the past year, North — Cohere’s integrated enterprise workspace for building and deploying agentic AI — has been the driving force behind much of our innovation. Through that work, we set out to build a unified model for customers that simplifies deployment, can run locally, and synthesizes capabilities from across the Command family.
The work is already paying off. Read how our customers have been using North to transform their operations.
However, sovereign AI is much bigger than Cohere. Empowering engineers with models that they can run, control, and adapt themselves is the most acute challenge facing this generation of AI.
We’ve optimized Command A+ for practical, developer-focused use, including support for low-bit quantization, efficient inference, and integration across open inference frameworks. AI independence for all.
We can’t wait to see what the community builds.
Command, consolidated:
Command A+ outperforms previous Command A models in key dimensions of enterprise workloads, including multimodal understanding, retrieval, long-horizon, and complex reasoning.
Compared with Command A Reasoning, 𝜏²-Bench Telecom scores improved from 37% to 85%, with agentic coding performance on Terminal-Bench Hard reaching 25% from 3%. Gains were also achieved on non-agentic reasoning, instruction following, and other code generation tasks.
Command A+ performs strongly within North applications, reflecting its original design goals. Agentic Question Answering accuracy and spreadsheet analysis quality improved by 20% and 32% over Command A Reasoning, respectively. Memory performance — testing North’s skill in reasoning across conversations and stored data — scored 54% with Command A+ compared to 39% with Command A Reasoning.
For multimodal understanding and reasoning, Command A+ achieved 63% on MMMU Pro and 75.1% on MMMU, (compared with 65.3% for Command A Vision for the latter). MathVista scores increased from 73.5% to 80.6%, and CharXiv reasoning improved from 46.9% to 52.7%, reflecting broad gains across document understanding tasks.
Command A+ significantly expands multilingual capability, broadening language coverage from 23 to 48 languages and recording gains in machine translation and multilingual reasoning.
Command A+ achieved a score of 37 on the Artificial Analysis Intelligence Index, outperforming other leading open models, reflecting its strength as a general-purpose model for enterprise agentic workflows.
Efficiency at scale:
Efficiency is a core constraint in enterprise AI deployment. It determines whether a language model can be deployed practically at scale by shaping the compute, memory, latency, power, and infrastructure required to serve it reliably and cost-effectively.
We engineered Command A+ to be extremely hardware efficient. The model is available today on Hugging Face in 16-bit (BF16), 8-bit (FP8), and 4-bit (W4A4) quantizations, with imperceptible differences in quality. In practice, this enables Command A+ to run on as little as two NVIDIA H100s or a single NVIDIA Blackwell GPU, with virtually no quality degradation.
Command A+ is also our fastest model to date, having 218B total and 25B active parameters compared to Command A Reasoning’s 111B dense architecture. At the same quantization and concurrency levels, it delivers up to 63% higher Output Tokens per Second (TOPS), and reduces Time To First token (TTFT) by up to 17%. The W4A4 quantization contributes an additional 47% increase in speed and a further 13% reduction in latency.
We’re also using speculative decoding to accelerate text generation without impacting output quality. We optimized the approach specifically for the model’s MoE architecture, delivering an additional 1.5-1.6x inference speedup for both text and multimodal inputs.
Command A+ is the first model to use our latest tokenizer, delivering substantial compression improvements over its predecessor. Fewer tokens are now required to generate the same response, reducing a major driver of inference cost. Notably, these gains extend to major non-European languages, which are often underrepresented during tokenizer training. Tokenization efficiency improved by 20% for Arabic, 16% for Korean, and 18% for Japanese.
Fujitsu believes Command A+’s mixture-of-experts architecture and strong agentic performance align well with our commitment to deliver innovative, sovereign AI solutions through Takane and the Kozuchi Enterprise AI Factory. We look forward to leveraging its capabilities to accelerate secure, scalable AI adoption for our customers. — Vivek Mahajan, Corporate Executive Officer, Corporate Vice President, CTO, in charge of System Platform, Fujitsu Limited
What’s next?
Progress in sovereign AI today depends on advancing three fronts simultaneously: performance, security, and cost. At Cohere, we are investing across all three — both in our models and in the domain-specific capabilities that power North.
That means improving reasoning, multimodal understanding, and coding performance, while ensuring models remain fit to run entirely within customer environments. The goal is not just stronger benchmarks, but systems that can support enterprise-wide transformation under real operational constraints.
We have already applied this approach across our other model families — including Embed, Rerank, and Transcribe — where we have achieved state-of-the-art performance alongside efficient, cost-aware inference.
Getting started:
Command A+ is available today on Hugging Face, as well as through Model Vault. You can also try the model for free on our Space or with a Cohere API key.
Visit our documentation for detailed model specs, deployment guides, and cookbooks to get started.
Original source - Mar 26, 2026
- Date parsed from source:Mar 26, 2026
- First seen by Releasebot:Jun 1, 2026
Introducing Cohere Transcribe: a new state-of-the-art in open-source speech recognition
Cohere launches Transcribe, an open-source ASR model for accurate speech-to-text across 14 languages. It offers strong real-world transcription performance, best-in-class throughput, and flexible deployment through Hugging Face, API access, and Model Vault.
Cohere is announcing Transcribe, a state-of-the-art automatic speech recognition (ASR) model that is open source and available today for download.
Speech is rapidly becoming a core modality for AI-enabled workloads and automations — from meeting transcription and speech analytics to real-time customer support agents.
Our objective was straightforward: push the frontier of dedicated ASR model accuracy under practical conditions. The model was trained from scratch with a deliberate focus on minimizing word error rate (WER), while keeping production readiness top-of-mind. In other words, not just a research artifact, but a system designed for everyday use.
Cohere Transcribe reflects that intent. It is available for open-source use with full infrastructure control, maintains a manageable inference footprint suitable for practical GPU and local utilization, delivers best-in-class serving efficiency, and is also available via Model Vault — Cohere’s secure, fully managed model inference platform.
Cohere Transcribe currently ranks #1 for accuracy on HuggingFace’s Open ASR Leaderboard, setting a new benchmark for real-world transcription performance.
This marks our zero-to-one in bringing high-performance speech recognition into enterprise AI workflows. Read on to learn more.
Model overview
Name: cohere-transcribe-03-2026
Architecture: conformer-based encoder-decoder
Input: audio waveform → log-Mel spectrogram
Output: transcribed text
Model size: 2B
Model: a large Conformer encoder extracts acoustic representations, followed by a lightweight Transformer decoder for token generation
Training objective: standard supervised cross-entropy on output tokens; trained from scratch
Languages: trained on 14 languages:- European: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish
- APAC: Chinese (Mandarin), Japanese, Korean, Vietnamese
- MENA: Arabic
License: Apache 2.0
Image 1: Cohere Transcribe is an open-weights Conformer ASR model converting speech audio into text across 14 supported languages.
Model performance
Accuracy
Cohere Transcribe is the latest standard for English speech recognition accuracy. It leads the HuggingFace Open ASR Leaderboard with an average word error rate of just 5.42%, outperforming all open- and closed-source dedicated ASR alternatives, including Whisper Large v3, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B. This captures the model’s versatile capability across real-world speech tasks, such as robustness to multiple-speaker environments, boardroom-style acoustics (e.g. AMI dataset), and diverse accents (e.g. Voxpopuli dataset).
[Table of model WER scores omitted for brevity]
Image 2: the Hugging Face Open ASR Leaderboard as of 03.26.2026. This is a widely used, standardized benchmark evaluating automatic speech recognition systems across curated datasets using word error rate (WER) as the primary metric, computed over normalized reference-hypothesis alignments, where lower WER indicates higher transcription fidelity. See the live leaderboard here.
Critically, these gains aren’t limited to benchmark datasets. We see the same state-of-the-art performance carried over into human evaluations, where trained reviewers assess transcription quality across real-world audio for accuracy, coherence, and usability. Consistency across both evaluation methods reinforces that Cohere Transcribe’s performance translates reliably from controlled tests to practical enterprise settings.
Image 3: human preference evaluation of model transcripts in English. In a pairwise comparison, annotators were asked to express preferences for generations which primarily preserved meaning - but also avoided hallucination, correctly identified named entities, and provided verbatim transcripts with appropriate formatting. A score of 50% or higher indicates that Cohere Transcribe was preferred on average in the head-to-head comparison.
Image 4: human evaluation of ASR accuracy for a selection of supported languages. A score of 50% or higher indicates that Cohere Transcribe was preferred on average in the head-to-head comparison.
Throughput
In production settings, ASR systems must operate under strict latency and throughput constraints; even if accurate, slow or resource-intensive transcription can directly impact user experience, operational efficiency, and cost.
Transcribe extends the Pareto frontier, delivering state-of-the-art accuracy (low WER) while sustaining best-in-class throughput (high RTFx) within the 1B+ parameter model cohort.
Image 5: throughput (RTFx) vs accuracy (WER) plot for leading models larger than 1B in size. RTFx (real-time factor multiple) measures how fast an audio model processes its input relative to real time.
“We’re genuinely impressed with what Cohere has built with Transcribe. The speed is exceptional — turning minutes of audio into usable transcripts in seconds — and it immediately unlocks new possibilities for real-time products and workflows. In our testing, the model handled everyday speech very well and delivered strong, reliable transcription quality. The overall experience has been smooth and easy to work with. We’re excited to be partnering with Cohere and to continue exploring what we can build with this technology.”
— Paige Dickie, Vice-President, Radical Ventures
Zero to one, and beyond.
We are working towards deeper integration of Cohere Transcribe with North, Cohere’s AI agent orchestration platform. With planned updates, Cohere Transcribe will evolve from a high-accuracy transcription model into a broader foundation for enterprise speech intelligence.
Getting started.
Cohere Transcribe is now available for download on Hugging Face. Follow the setup instructions to run the model locally, or even in edge environments.
You can also access Cohere Transcribe via our API for free, low-setup experimentation subject to rate limits. See the documentation for usage details and integration guidance.
For production deployment without rate limits, provision a dedicated Model Vault. This enables low-latency, private cloud inference without having to manage infrastructure. Pricing is calculated per hour-instance, with discounted plans for longer-term commitments. Contact our team to discuss your requirements.
Key contributors: Julian Mack (Member of Technical Staff), Ekagra Ranjan (Member of Technical Staff), Cassie Cao (Product Manager), Bharat Venkitesh (Manager of Technical Staff), Pierre Harvey Richemond (Manager of Technical Staff).
Original source - Jan 28, 2026
- Date parsed from source:Jan 28, 2026
- First seen by Releasebot:Jun 1, 2026
Introducing Model Vault: Your private platform for secure and scalable model inference
Cohere launches Model Vault, a fully isolated SaaS platform for secure, high-performance model serving and scaling. It offloads inference operations, adds real-time monitoring, and supports flexible deployment for North and standalone customers.
Key Contributors
Manoj Govindassamy, Maxime Brunet, Jeremy Pekmez, Inna Shteinbuk, Elliott Choi
Model Vault simplifies serving and scaling Cohere models, so teams can focus on building, not infrastructure.
Cohere is launching Model Vault, a dedicated, fully isolated SaaS platform for customers to run Cohere models securely, at scale, and with guaranteed performance.
Model Vault combines the reduced operational overhead of fully managed SaaS with the security and performance advantages of self-hosting. North users deploy their application within a secure VPC while offloading the maintenance-heavy model inference and scaling to their secure, cloud-based Model Vault. The result: lower cost of ownership and faster enterprise adoption.
Model Vault marks a milestone in Cohere’s journey to deliver transformative AI that solves the real-world challenges faced by enterprise customers today. Read on to learn more about what’s on offer, or get started now.
The convenience-control tradeoff
At the heart of enterprise AI transformation lies a fundamental constraint: how to balance speed and scale in adoption with the infrastructure control and compliance requirements of a mature, modern-day organization.
This is increasingly a problem of inference. As enterprises work to incorporate models into more workflows, teams, and products, inference is gradually replacing training as the dominant AI workload. McKinsey now projects that inference will account for the majority of AI compute by the end of the decade, even as per-unit costs continue to fall. For businesses, this shifts the core concern from funding episodic training runs to accounting for inference — a recurring cost that scales directly with increased adoption.
For most enterprises, available deployment options clearly reflect this tension. Multi-tenant SaaS platforms optimize for speed and operational simplicity by amortizing infrastructure costs across multiple customers. However, the lack of workload isolation inherent to shared environments introduces problems like noisy-neighbour effects, enforced rate limits during peak usage, and unpredictable latency. These platforms also tend to limit model configurability, provide little visibility into workload-level performance, and often fall short of the compliance standards of highly regulated enterprises.
Self-hosted deployments — whether on-premises or within a customer-managed VPC — address many of these limitations by offering greater control over infrastructure, performance characteristics, and security boundaries. The capital and operational burden underpinning that control, however, can be prohibitively costly, particularly for teams looking to scale.
With North, Cohere’s enterprise platform for building agentic AI applications, this problem of infrastructure becomes even more acute. Model serving still requires provisioning and managing hardware, but agentic workloads by their design are bursty, multifaceted, and unpredictable.
A new framework for scalable, secure AI deployment
Model Vault is the latest addition to Cohere’s suite of secure deployment options. It is designed to address this tradeoff between convenience and control by transferring the operational overheads of model inference to a secure, Cohere-managed cloud environment.
The result is a SaaS deployment model that removes the constraints of shared infrastructure, such as resource contention and unpredictable performance, while preserving the speed, elasticity, and ease of use associated with multi-tenant SaaS.
To be clear, there is no single deployment approach that fits every organization. For some enterprises, multi-tenant SaaS or self-hosted deployments will remain the right choice, depending on internal capabilities, data strategy, and regulatory requirements. Industries with highly standardized workflows, for example, may meet compliance needs through application-level controls alone and continue to favor shared SaaS platforms.
But for many enterprises, infrastructure management has become a growing barrier to production-grade agentic AI. They want to scale model inference capacity without scaling operational load. Model Vault is built for these teams.
Decouple inference from development
Model Vault is a Cohere-managed solution. That means we assume the full operational burden of production inference: deploying models, managing upgrades and dependencies, provisioning and scaling capacity, and ensuring performance and availability. We deliver Model Vault users 99.9%+ guaranteed availability, backed by production-grade SLOs and latency performance that matches or beats leading managed AI/ML deployment platforms.
This materially lowers the total cost of ownership by removing the need for customers to procure, provision, and operate GPU-backed inference infrastructure. North deployments can now run entirely on CPU-only environments, while the highest overhead ML infrastructure work is done within your Model Vault.
Model Vault lets ML teams scale workloads elastically without pre-allocating capacity or absorbing the risk of idle GPUs. Inference capacity expands and contracts with demand, eliminating the tension between overprovisioning for peak performance and underprovisioning that degrades latency and availability.
In simple terms, Model Vault lets Cohere absorb the operational complexity of model serving, freeing your most valuable asset — your engineers — to focus on moving agentic AI applications from experimentation to production. Inference is not your differentiator.
Throughout, enterprise control is retained where it matters most. North customers keep full ownership of the control plane, covering agent logic, workflow orchestration, conversation states, data storage, pipelines. They have full architectural flexibility over which data and workloads they choose to host on-prem or on their existing VPCs, and which can be handled by Model Vault.
No sharing. No limits. No surprises.
Model Vault is deployed as a logically isolated virtual private cloud. No infrastructure components are shared across customers, ensuring strong isolation for performance, reliability, and security. This includes dedicated network load balancers, reverse proxies, serving middleware, inference servers, and the underlying GPU accelerators.
Teams can now use Model Vault to run unlimited production workloads without competing for scarce capacity. The resources you provision are reserved for you — no more rate limits to maintain platform stability. At the same time, Model Vault dynamically scales inference capacity according to customizable logic, so performance remains consistent even for bursty agentic inference.
Observe and optimize in real time
Model Vault gives MLOps teams unique visibility into how inference workloads behave in production. Through a dedicated, real-time dashboard, teams can track request patterns, latency, token throughput, and resource utilization, enabling them to quickly pinpoint bottlenecks and continuously optimize for efficiency, predictability, and performance as demand evolves.
Setting up a new Vault is quick and easy. Select your desired model and performance tier (S, M, L, XL), the replica range, and click Create Vault. You can now run Cohere models in your system.
Manage your Vault directory and all deployed models from your Cohere dashboard.
Model Vault also equips teams with real-time monitoring of model usage and performance.
Getting started
Model Vault is designed for quick and intuitive onboarding, with minimal manual setup. It can be integrated into your existing North deployment, or serve as a private inference platform for Cohere models used individually.
Here’s how to create your Model Vault in minutes:
Without North:
- Go to dashboard.cohere.com. Create an account and log in.
- Find Vaults in the side bar, then select “+ New Vault.”
- Name your Vault.
- Select your Cohere model, and the performance tier.
- Specify the desired range of replicas for each selected model. This defines your scaling limits.
- Confirm your Vault.
That’s it! Your infrastructure is provisioned and you’ll instantly receive your unique subdomain, API endpoint, and access to your private monitoring dashboard.
With North:
- Follow the steps above to create a dedicated Model Vault for your organization.
- Make sure you select the models that you need for your North deployment.
- Once you have your API endpoint, update the Helm chart in your North deployment, so that the desired models point to that endpoint.
- After setup, customers can manage the primary and secondary inference endpoints for Command models in North directly from the admin interface.
- Your North deployment is now configured with Model Vault.
Supported models
Model Vault currently supports all of Cohere’s latest state-of-the-art embedding, reranker, and generative models. This includes bundled offerings for our North customers (see table below).
What’s more, Rerank and Embed models, which underpin Cohere’s state-of-the-art AI search and retrieval capabilities, can also be self-served through your Model Vault. That means on-demand access for seamless setup and faster experimentation. Customers interested in self-serve access to our Command models or integrated model bundles can request to join the waitlist.
Model Vault pricing plans are flexible and can be tailored to your team’s workload requirements and budgeting preferences. Speak with one of our team to find what works best for you.
You can also read our documentation for full, step-by-step instructions, as well as our model specs and how-to guides for building agentic AI applications in North.
Build fast. Stay in control.
Start shipping secure, high-performance enterprise AI with Model Vault today. Get started.
Original source - Dec 11, 2025
- Date parsed from source:Dec 11, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing Rerank 4: Cohere’s most powerful reranker yet
Cohere releases Rerank 4, its most advanced reranker for enterprise AI search, with stronger retrieval relevance, lower latency, 32K context, multilingual support across 100+ languages, self-learning customization, and flexible deployment on Cohere Platform, SageMaker AI, and Microsoft Foundry.
Key Contributors
Clifton Poth (Member of Technical Staff), Fabian David Schmidt (Member of Technical Staff), Martin Hentschel (Member of Technical Staff), Daniel Simig (Manager of Technical Staff), Nils Reimers (VP of Embeddings & Search), Elliott Choi (Director of Product)
Rerank 4 is the most advanced set of reranker models available today, purpose-built to meet the realities and challenges of enterprise AI search. It delivers best-in-class retrieval, outperforming the likes of MongoDB’s Voyage models and ElasticSearch’s Jina rerankers in overall search relevance, as well as improved latency, flexible deployment options, deep customizability and robust multilingual performance. Designed for business-critical applications across key industries and domains, Rerank 4 sets a new standard for accuracy and adaptability in enterprise search.
Why reranking matters
Rerankers significantly enhance the accuracy of enterprise AI search by refining initial retrieval results. After a fast but broad candidate-generation step using methods like BM25 or bi-encoder embeddings, the system is left with documents, passages, or product listings that approximate relevance but often miss the nuance of the user’s intent. Rerank 4 addresses this gap using a cross-encoder architecture that processes queries and candidates jointly, capturing subtle semantic relationships and reordering results to surface the most relevant items. This approach balances speed and accuracy, delivering high-quality results without evaluating the entire corpus.
Rerank 4 is a key component of North, Cohere’s agentic AI platform, which combines intelligent search (Embed, Rerank), large language models (our Command series), and customizable AI agents to automate tasks and accelerate decision-making. It integrates seamlessly into existing AI search solutions, including hybrid, vector, and keyword-based systems, with minimal code changes. Beyond improving retrieval-augmented generation (RAG) pipelines, Rerank 4 is critical in agentic AI systems, providing distilled, high-quality information for stronger reasoning and context-aware agent performance. By filtering out irrelevant content before it reaches the generative model, Rerank 4 reduces token usage and minimizes the number of costly retries an agent might otherwise need to “get it right,” sparing the user both pennies and latency. This is especially impactful for agentic AI, where complex, multi-step interactions can quickly drive up model calls and saturate context windows.
Rerank 4 Fast & Pro
Rerank 4 boasts a 32K context window - the largest of our Rerank series to date and a four-fold increase over the previous generation. This enables the model to handle longer documents, evaluate multiple passages simultaneously, and capture relationships across sections that shorter windows would miss. This expanded capacity therefore improves ranking accuracy for realistic document types and increases confidence in the relevance of retrieved results.
To support different search workloads, Rerank 4 is available in two versions. Fast is a smaller model designed for use cases where both speed and accuracy are important. Pro, meanwhile, is optimized for tasks that require deeper reasoning, analysis and ironclad precision.
Rerank 4 Fast - where speed wins:
- E-commerce. Prospective customers benefit from high relevance product recommendations as well as the option to explore online catalogues with semantic search. Here, faster results equal a better shopping experience and higher conversion rates.
- Programming. During a project, engineers need to quickly and conveniently reference documentation - like design specs - without derailing their creativity or collaboration.
- Customer service. Enterprise help desk agents need to efficiently triage incoming tickets to resolve cases, prevent backlogs, and keep their customers happy.
Rerank 4 Pro - where depth matters:
- Finance. When generating risk models, analysts rely on troves of market reports, regulatory filings, and transaction histories. More retrieval time means richer data analysis and more accurate scenarios modeled.
- Healthcare. Clinicians routinely review patient records and clinical trial reports to identify appropriate courses of treatment. Accuracy and completeness of information can have devastating consequences for patient welfare.
- Manufacturing. Quality engineers consult technical manuals and production logs to diagnose deviations in highly complex processes that led to product defects. More informed decision-making helps productivity, safety, and end customer satisfaction.
Superior performance in enterprise-specific and multilingual tasks
Our customers need to know that their semantic search is not just performant, but works for the queries, workflows, languages, and data types that constitute their everyday work. This means the model has to understand the nuances of each user and the problem they're trying to solve.
Benchmarking against industry standards confirms that Rerank 4 outperforms all current competitive alternatives in a broad array of enterprise domains, such as finance, healthcare, manufacturing and others.
Rerank 4 also leads the pack in multilingual performance. Users can deploy Rerank in over 100 world languages, including state-of-the-art retrieval in 10 major business languages (Arabic, Chinese, French, German, Hindi, Japanese, Korean, Portuguese, Russian, and Spanish).
Self Learning
Semantic search is an investment for your company. Like any human-executed task, semantic search will see greater returns once it ‘learns’ the means to do its job more efficiently. On designing Rerank 4, we asked ourselves: how can we bake this concept into our product offerings?
Rerank 4 is our first reranking model with self-learning capability. In partnership with Cohere, users can customize their model for certain use cases - such as those they encounter most frequently - without the need for further annotated data. This might involve stating preferences for particular content types, for use of prescribed language or terminology, or directing the model to specific document corpora.
An example: imagine a Consumer Loan Specialist at a commercial bank. Their job is to evaluate borrowing applications and guide customers through the process of acquiring a loan. Each case requires frequent references to internal documentation relating to, say, approval criteria, product details, regulatory standards, or even company policies. Fast retrieval is key for customer satisfaction - but accuracy can’t be compromised either. At least not within this narrow domain.
Rerank 4 moves towards overcoming this tradeoff in speed and quality through Self Learning. With automated adaptation to domain-specific use cases, the accuracy of Rerank 4 Fast iteratively improves to become competitive with much larger market alternatives. In many of our tests, Rerank 4 Fast began to converge and even surpass out-of-the-box Pro performance. Self-Learning Rerank will soon bring continuous RAG optimization to North, advancing precision and robustness in domain‑specific workflows.
Looking further, we also explored how Rerank 4’s self-learning capability performs on entirely new search domains. Using healthcare-focused datasets that mimic a clinician’s need to retrieve patient-specific information - not just expertise from a given medical discipline - we found that enabling Self Learning produced consistent, substantial gains. The result: a clear and significant boost in retrieval quality for Rerank 4 Fast, across the board.
Getting started
Rerank 4 is now available on Cohere’s Platform, Amazon SageMaker AI (Fast & Pro) and Microsoft Foundry, with additional platform support coming soon.
Click here for pricing options, and consult our docs for learn more about Rerank 4 setup.
Rerank 4 can also be deployed into any Virtual Private Cloud (VPC) or on-premise environment. To learn more about options for large enterprise deployments, please contact our sales team.
Original source - Aug 28, 2025
- Date parsed from source:Aug 28, 2025
- First seen by Releasebot:Jun 1, 2026
Command A Translate: Secure translation for global enterprises
Cohere introduces Command A Translate, a secure enterprise translation model with private deployment options, fine-tuning support, and coverage across 23 business languages. It also adds Deep Translation, an agentic multi-step approach for even higher-quality results.
The new industry standard for secure, enterprise-ready machine translation.
Today, we’re introducing Command A Translate, a state-of-the-art model specifically designed for high-quality translation tasks. It consistently outperforms all other models, including GPT-5, DeepSeek-V3, DeepL Pro’s LLM, and Google Translate. We can also push its quality even higher through Deep Translation, our new agentic approach that uses a multi-step process to refine translations.
Command A Translate delivers industry-leading performance while offering enterprises full control of their data through private deployment options. Translation frequently involves handling sensitive business documents, such as contracts, financial reports, and materials with confidential customer information that shouldn’t be exposed to consumer services. For global companies operating across multiple languages, it ensures translations remain precise, reliable, and highly secure.
Deep Translation Elevates Leading Translation Quality
Command A Translate is built for enterprises that prioritize high-quality translations to reduce rewrites, lower costs, and enable teams across geographies to work together seamlessly. The model excels at key translation benchmarks, including speech, news, social media, and literary text. It achieves best-in-class results across a wide range of business languages.
On the WMT24++ dataset, Command A Translate achieved a best-in-class xCometXL score of 83.8, and a score of 84.4 with Deep Translation. This shows the average xCometXL score for the 23 languages covered by Command A Translate on the English to L2 datasets of WMT24++. *DeepL Pro does not cover Hindi and Persian and the numbers for those languages were estimated through nearest neighbor imputation.
Performance can be further enhanced through our innovative agentic approach: Deep Translation. Deep Translation achieves unmatched quality through iterative reasoning for more complex translation use cases. It allows the model to gradually refine translations through multiple steps improving fluency and naturalness, ensuring the output reads as if originally written in the target language. This is especially valuable for use cases with the highest quality bar, like translating legal documents. Deep Translation is available today by request, reach out to learn more.
To meet the needs of global enterprises, the model supports translation across 23 widely used business languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Chinese, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
Privately deployable and fine-tunable
For enterprises needing to translate sensitive information without sending it to an external service, Command A Translate can be privately hosted behind your secured firewalls. This allows businesses to break down internal language silos while maintaining complete control over their data.
Due to its efficient design, Command A Translate can be deployed with minimal hardware requirements. For low footprint deployments, enterprises can bring it to production on one GPU (H100/A100) using 4-bit quantization without a noticeable decrease in translation quality (less than 0.5 xCometXL points).
For those organizations with unique needs, from industry specific translations to supporting new languages, we offer fine-tuning and customization support for production use cases.
Why secure translation matters
Enterprises rely on translation for some of their most sensitive and business-critical documents. They cannot risk data leakage, compliance violations, or misunderstandings. Mistranslated documents can reduce trust and have strategic implications. Entire industries depend on translation as a core service, from localization providers to companies producing subtitles, advertising, and news at a global scale.
At Cohere, we use Command A Translate internally to ensure we have the highest quality multilingual data possible. Early testers like leading language and content solutions company RWS have confirmed Command A Translate’s strong performance across complex translation tasks.
"RWS and the Language Weaver team have been working closely with Cohere to evaluate their Command A Translate model, optimized for automated translation tasks. We rigorously tested it across 23 languages and multiple domains, comparing it against other leading providers using both automated metrics and human reviews from RWS’s professional linguists. Command A Translate consistently demonstrated world-class quality in every comparison, seamlessly handling complex challenges like numbers, names, and polysemous words with precision. We look forward to continuing this collaboration to deliver innovative AI-powered translation applications for customers." – Dragos Munteanu, Vice President of Research and Development, RWS
Availability
Command A Translate is available today on the Cohere platform and for research use on Hugging Face. If you are interested in private or on-prem deployments, please contact our sales team for bespoke pricing.
Original source - Aug 21, 2025
- Date parsed from source:Aug 21, 2025
- First seen by Releasebot:Jun 1, 2026
Command A Reasoning: Enterprise-grade control for AI agents
Cohere introduces Command A Reasoning, its most advanced enterprise reasoning model, now available on the Cohere platform and Hugging Face. It brings secure, scalable deployment, long-context support, configurable token budgets, and stronger performance for agentic workflows and deep research.
Today, we’re introducing Command A Reasoning, our most advanced model for enterprise reasoning tasks. From agentic workflows to end-to-end systems, it outperforms leading privately deployable models in its class, including gpt-oss-120b, DeepSeek-R1 0528, and Mistral Magistral Medium.
Command A Reasoning is purpose-built for enterprise needs, offering highly secure, efficient, and scalable deployment options. For low footprint deployments, it can run on a single H100 or A100 with a context length of 128k. For latency optimized deployments on two or more GPUs the context length scales to 256k. This configurability ensures organizations can make the most of the hardware available to them. The model's long context length makes it ideal for document-heavy workflows and complex multi-step agentic use cases.
Customers can set a token budget to directly manage compute usage and control costs. This eliminates the need to maintain separate reasoning and non-reasoning models, since Command A Reasoning can be used for tasks which require maximum accuracy or configured for greater speed and throughput when efficiency is the main priority.
Command A Reasoning is the core generative model powering North, our secure agentic AI platform. This empowers organizations to deploy custom AI agents and automations on-premises, backed by our most capable reasoning model.
Strong performance across enterprise reasoning tasks
Agentic and multilingual reasoning benchmarks
Command A Reasoning delivers leading results across key agentic benchmarks. It also performs strongly across a range of important business languages, enabling global enterprises to leverage agents with consistent quality.
Performance evaluated on 3 Agentic benchmarks (higher is better). BFCL-v3 measures a wide range of general single and multihop tool-use agent capabilities. Tau-bench measures performance on two enterprise-relevant customer service agent use-cases, in retail and airline domains, and we measure multilingual capabilities with a carefully constructed in-house translation of tau-bench into 5 languages. Command-a-reasoning outperforms R1-0528, gpt-oss-120b and magistral-medium in all cases.[1]
Exceptional deep research agent
Command A Reasoning excels at powering end-to-end systems involving chained and hierarchical agents and leveraging the most relevant tools to accomplish tasks. A great example is our Deep Research system, which outperforms similar capabilities from all other leading AI labs.
Score on the DeepResearch Bench RACE evaluation for our deep research agent on the English question set (higher is better). This measures the overall quality of a long research report generated by a deep research system in terms of instruction following, readability, insight and comprehensiveness. Our system performs the best overall, highlighting its ability to generate strong research reports.[2]
Deep Research is designed to tackle the complex, in-depth questions that demand more than a quick search. It delivers detailed, well-sourced reports in minutes, which would typically take an employee hours. Our system uses a multi-agent architecture, powered by Command A Reasoning that breaks down the user request into smaller research topics. Then, multiple AI agents work in parallel, searching and analyzing information from a wide array of sources. Finally, the system consolidates the verified findings into a single, well-structured report that directly addresses the original user request. Deep Research is coming soon to North.
Powering North, Cohere’s flagship enterprise AI platform
North enables enterprises that prioritize data security to deploy AI agents and automations at scale within their own infrastructure. On human evaluation scores for a representative set of daily tasks at work, Command A Reasoning consistently outperforms Command A. For customers, these gains translate into more reliable agents, reduced manual intervention, and delivery of highly accurate, actionable results.
The plot shows the average answer satisfaction scores for North when powered by command-a and our new command-a-reasoning model on a wide range of common, but complex, enterprise productivity tasks (higher is better). For these tasks, North needs to successfully understand and navigate multiple sources of Cohere’s real enterprise data (Slack, Google Drive, company emails and web search). The satisfaction score reflects both the accuracy and response quality of the agent. Command-a-reasoning’s enhanced reasoning allows it to better understand a user’s intentions and carry out complex multistep tasks in the enterprise environment, outperforming command-a in all these categories, usually by a wide margin.[3]
Control and efficiency required for enterprise scale
Command A Reasoning unlocks business applications across industries, combining strong performance, high accuracy, and scalable efficiency. Its low hardware requirements make it practical for private deployments on a single H100 or A100.
The model optimizes compute usage and costs through a user-controlled token budget, enabling seamless adjustment between mission-critical precision and high-throughput tasks. This eliminates the need for separate reasoning and non-reasoning models, allowing enterprises to maximize GPU efficiency and dynamically allocate resources.
In North’s internal evaluations, adjusting the reasoning budget shows a smooth progression in performance from efficient responses at zero reasoning to more in-depth responses at higher reasoning levels. Even with zero reasoning enabled, Command A Reasoning outperforms Command A.
North agent performance on a set of difficult questions requiring multi-hop search (higher is better). Our previous model, Command-a, is shown in pink. Command-a-reasoning has been trained to excel in North, and even with reasoning turned off (the ‘instant’ bar), comfortably outperforms command-a. However, if we allow command-a-reason to think, we see that it improves even more. With more and more thinking time, command-a-reasoning improves further and further. This allows you to fine-grained control over the right amount of latency vs performance for your organization while still only deploying one model.[4]
Safe in high-stakes environments
Safety is foundational to how we train and evaluate all our models, including Command A Reasoning. This means striking the best balance between ensuring the model doesn’t over-refuse valid requests, and preventing the purposeful propagation of harmful and malicious content online. We focus on five key areas: Child sexual exploitation and abuse (CSEA), self-harm, violence and hate, sexual content, and conspiracy theories.
Cohere internal evaluations show Command A Reasoning strikes the strongest balance among competitors between safety and usefulness. The higher the better for Absolute Safety. The lower the better for Over-Refusal.
Availability
Command A Reasoning is available today on the Cohere platform and for research use on Hugging Face. If you are interested in private or on-prem deployments, please contact our sales team for bespoke pricing.
“We’re excited to partner with Cohere to integrate their Command A Reasoning model into our generative AI hub in SAP Business Technology Platform. This powerful model will enhance SAP’s generative AI capabilities, empowering customers, partners, and developers to build innovative, secure agentic applications tailored to their unique environments. Together, we’re unlocking new possibilities for enterprise reasoning tasks.” — Dr. Walter Sun, SVP and Global Head of AI, SAP SE
We run all models with their highest available reasoning setting.
Original source
On BFCL, We evaluate command-a-reasoning using the Function Calling (FC) setting. For competitors, we report their score on the official BFCL leaderboard if available, or otherwise benchmark them using the official BFCL codebase, reporting the highest of their prompted and FC evaluation settings.
On Taubench, we report the average of the pass^1 score over 10 runs for command-a-reasoning. For competitors, where available, we take the officially reported numbers (R1-0528, GPT-oss) otherwise we run the tool-use api against the official tau-bench implementation, reporting the average pass^1 over 10 runs. In all cases, we report the average of airline and retail.
On M-taubench, we run all models against and report the average score across Ja, Ko, Ar, Es, Fr, and En for both retail and airline, across 10 runs.
Our deep research agent is a hierarchical web-search agent which breaks a problem down into multiple subproblems, and recursively researches each with a sub-agent. When all subagents have finished, we take their subreports and generate a final report with a number of steps of iterative refinement, making sure our final report is strong. We use an internally developed agent and web search, leveraging our expertise in retrieval, reranking, retrieval-augmented generation and agents. We report the RACE scores for competitor models on the english questions by rerunning the official RACE evaluation on their english deep research reports on the DeepResearchBench leaderboard website.
Annotators were asked to rate their satisfaction score based on a comprehensive internally-developed rubric, summarizing aspects of response quality, appropriateness, correctness and latency. The total evaluation set is composed of 112 questions, and each answer is annotated by 6 annotators before we aggregate their scores.
This evaluation uses an automatic correctness score utilizing LlamaIndex, with respect to known correct reference answers. - Aug 6, 2025
- Date parsed from source:Aug 6, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing North: The next era of enterprise AI
Cohere launches North, a secure enterprise AI platform for private deployment of agents and automations at scale. It brings chat, search, custom tools, workflow automation, and deep data integrations to on-prem, VPC, hybrid, or airgapped environments with strong governance and compliance.
North enables enterprises that prioritize data security to deploy AI agents and automations at scale within their own infrastructure.
We believe AI has the potential to eliminate most mundane tasks from daily work. To achieve this, we built North, an agentic AI platform that securely accesses all of the data you use in your work, because AI is only as good as the data you give it. By enabling private deployment, we give companies the confidence to put their data into AI, with the efficiency to offer a cost of ownership that makes sense at scale.
We are expanding from serving a select group of customers to making North widely available, building on our work over the past few months. Leading organizations such as RBC, Dell, LG CNS, stc, Ensemble Health Partners, Second Front, and most recently Bell are seeing the transformative impact of secure AI agents developed in North. These agents are being deployed across critical industries that underpin the global economy, including finance, healthcare, manufacturing, telecommunications, energy, and the public sector.
North combines Cohere's state-of-the-art generative and search models, customizable agents, and built-in workflow automations to enable employees to accomplish tasks more quickly and with higher quality. It seamlessly integrates with data across fragmented systems and internal/external services, providing a comprehensive understanding of your unique business context. By securely consolidating all critical information in one place, North unlocks the most high-value production applications that will drive meaningful productivity and efficiency gains for employees and customers. It's built on a robust, scalable architecture with privacy and security at its core–using as few as two GPUs.
Equipping the modern global workplace
North equips modern businesses with a full suite of enterprise-ready capabilities, allowing employees to achieve more right out-of-the-box. Employees across functions can drastically reduce manual effort by creating, deploying, and orchestrating custom AI agents and automations to collaboratively complete tasks with human oversight and transparency. Any user, regardless of technical skills, can build agents that can handle repetitive tasks, analyze data, and take action across systems.
Core features include:
- Chat & search: quickly get reliable answers to customer support inquiries, summarize meeting transcripts, punch up marketing copy, and access information from both internal sources (like HR or finance policies) and the web. All responses include citations and reasoning chains of thought for auditability and verification.
- Custom tools and integrations: seamlessly connect to your existing workplace data sources, including Gmail, Slack, Salesforce, Outlook, Sharepoint, Linear, NFS/NAS, and integrate with any Model Context Protocol (MCP) server for secure access to industry-specific or in-house applications.
- Asset creation: draft and refine documents, including PRDs, financial reports, market research, and sales pipeline reports, adhering to your style guide and formatting requirements. Analyze hundreds of documents in parallel like due diligence files with AI tables.
- Automated workflows: design and deploy multi-agent workflows tailored to your team’s tools, processes, and objectives. Automate everything from repetitive tasks like CRM updates and weekly roundups to full-scale business processes, freeing your team to focus on high-impact work.
North unlocks faster, smarter decisions at scale. By connecting your siloed enterprise data and tools through secure AI agents, we empower every employee to find answers, reduce time-to-insight, and create sharable assets with ease.
Security-first design and deployment
North provides a secure platform where organizations can confidently build and deploy AI agents behind their own firewalls, ensuring data privacy and total control. Its lightweight architecture is optimized for private deployments with minimal hardware—as few as two GPUs—ensuring resource-efficient performance across your on-premise infrastructure, hybrid clouds, VPCs, or airgapped environments.
This level of security is non-negotiable for industries and government agencies where data privacy and regulatory compliance are paramount. The increasing global demand for sovereign AI solutions highlights the critical importance of robust security and privacy measures.
Enterprise-grade protocols are embedded into every layer:
- Granular access controls and permissions: integrates with your existing identity and access management systems, providing enhanced authentication and granular admin controls. This ensures that only authorized users can access specific agents, tools, and sensitive data sources.
- Autonomy policies: agents can only take actions they are authorized to take, and will seek human oversight for critical decisions or actions.
- Rigorous security testing: continual security testing, including red-teaming exercises and third-party vulnerability scanning, to proactively identify and mitigate potential threats.
- Complete system observability: gain full visibility into North's operations with easy-to-use tools for monitoring service performance and detailed logs of all changes to settings, permissions, and resources.
- Flexible deployment options: select the deployment environment that best suits your security needs (self-hosted, VPC, hybrid, or on-premises).
- Compliance with global standards: adheres to the highest international security standards, including GDPR, SOC 2, ISO 27001, and ISO 42001, ensuring compliance with regulations across industries.
Fully customizable for your business
North is purpose-built to meet enterprises where they are, across industry, geography, and language. Its vertical integration allows for complete customization in every layer of the tech stack, from domain-specific models to white-labeled user experiences.
Enterprises can further enhance North by connecting to internal knowledge bases or proprietary data, ensuring employees can easily and quickly find value in their daily work. Within North, users can create personalized agents and workflow automations tailored to their specific roles, leveraging the full context of their company to execute complex tasks.
Organizations can control who gets access to which features, which agents and automations, and sharing is configurable down to the relevant user or teams. This level of customization and adaptability gives enterprises fine-grained control over their AI solutions, optimizing North for their specific needs.
Putting North to work
Global customers and partners across security-conscious industries are already realizing the benefits of agentic AI applications powered by North.
In finance, Cohere and RBC developed North for Banking, a customized North configuration for the financial services industry. It is designed to support RBC’s diverse employee base and enhance customer service. North provides departments across the bank with AI agents that seamlessly integrate with RBC systems to increase productivity and operational efficiency while keeping data on-premise and secure. The platform features pre-built workflow automations designed to assist financial professionals and customer support teams with a range of tasks. Employees can efficiently summarize company reports, draft emails, retrieve internal information, and format it into visually appealing graphs and charts.
“RBC is excited to deepen our collaboration with Cohere,” said Dr. Foteini Agrafioti, SVP, Data & AI & Chief Science Officer, RBC. “Six months ago, we jointly announced a customized platform, North for Banking, to enable RBC to accelerate the development of our genAI solutions securely and efficiently and we’re pleased with our results to date. As our collaboration continues, we welcome the opportunity to use North for Banking to enable AI to unlock value across the enterprise.”
In South Korea, LG CNS leverages North with customized Cohere models internally for employees and to better serve customers in industries like finance and the public sector. LG CNS can securely deploy multilingual AI agents with North, which features advanced Korean language proficiency and the ability to handle specialized financial terminology. Our joint efforts have already secured an important government contract with South Korea’s Ministry of Foreign Affairs.
“As a strategic partner, LG CNS is proud to endorse Cohere's North agentic platform. North represents a pivotal leap forward, enabling the development of highly secure, specialized AI agents tailored to the complex demands of the enterprise market,” said Yohan Jin, Head of AI Center, LG CNS. “Internally, we are leveraging North to enhance our own Agentic AI platform, which has accelerated our ability to deliver sophisticated, fine-tuned AI solutions to our clients. In the Korean financial market, the response from major players has been exceptionally positive. The launch of North is more than just a product release; it's an enabler of true business innovation. We are confident that our continued partnership with Cohere will empower us to lead the market by co-developing the next generation of AI agents that solve critical enterprise challenges.”
We’re also partnering with Dell to bring North’s agentic capabilities to their workforce and enterprise customers on-premises through Dell AI Factory infrastructure. By bringing North directly to your data, we help overcome common barriers to adoption like security, privacy, and overly complex implementations. With Dell's highly secure and scalable infrastructure, employees and customers can seamlessly integrate custom AI agents into their operations at scale, unlocking new levels of efficiency and innovation.
“Together Dell Technologies and Cohere are offering on-premises solutions that secure our customers' data and offer an AI advantage. By reducing barriers to adoption and streamlining workflows – including building and deploying agentic AI, we are providing the foundation needed to thrive in a data-driven world,” said Caitlin Gordon, Vice President, Dell Technologies. “By integrating Cohere’s North platform into the Dell AI Factory, enterprises can automate workflows across core business functions like finance, sales and customer support.”
Driving the next era of enterprise AI
Enterprises have moved beyond proof-of-concept projects and experimentation with isolated, off-the-shelf consumer AI tools. Now organizations require fully end-to-end, customized, and privately deployable solutions that protect their data.
North meets this demand with a production-ready, full-stack AI platform. The platform removes the primary barriers to deployment, strengthens governance, and ensures every component works together seamlessly. Even organizations traditionally slow-to-adopt new technologies can confidently embrace AI with North, knowing their solutions are aligned with their unique operational and security requirements.
If you're interested in using North to accelerate your business goals, reach out for a demo.
Original source - Jul 31, 2025
- Date parsed from source:Jul 31, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing Command A Vision: Multimodal AI built for business
Cohere introduces Command A Vision, a state-of-the-art enterprise vision model that delivers strong multimodal understanding, OCR, and scene analysis while keeping a low compute footprint. It supports secure private deployments, RAG with citations, multilingual use, and JSON output for document automation.
Command A Vision excels across enterprise image understanding tasks while keeping a low compute footprint.
Today, we're introducing Command A Vision, a new state-of-the-art generative model that brings enterprises leading performance across multimodal vision tasks while maintaining strong text capabilities. Command A Vision lets agents see inside the enterprise, unlocking the automation of tedious tasks that use visual data like slides, diagrams, PDFs, and photos. Whether it's interpreting product manuals or analyzing real-world scenes for risk detection, the model excels at tackling the most demanding enterprise vision challenges.
It surpasses other models in its class including GPT 4.1, Llama 4 Maverick, Mistral Medium 3 (and Pixtral Large) on key multimodal benchmarks. Command A Vision prioritizes enterprise needs with highly secure, efficient, and flexible deployment options. Its low serving footprint enables seamless on-premise or private deployments with two or fewer GPUs, ensuring enterprise-ready scalability.
Strong performance across enterprise vision tasks
Chart, graphs, diagrams analysis
Command A Vision excels at understanding and analyzing a wide range of visual and multilingual data, including charts, graphs, tables, and diagrams. The model accurately extracts data from diverse visual formats, applies domain-specific knowledge across industries such as finance, healthcare, manufacturing, construction, and energy, and performs complex analysis based on the extracted information.
Document OCR and visual processing
Command A Vision stands out in document OCR and visual processing, accurately extracting text and information from various document types, including scanned documents, invoices, and forms. The model goes beyond simple text recognition, understanding document layout and structure to extract meaningful data. This capability, combined with our structured data output support for JSON mode, allows Command A Vision to automate repetitive document processing tasks, improve data accuracy, streamline workflows, and seamlessly integrate with existing systems, making it an invaluable tool for enterprises processing large volumes of documents. It achieves top-tier performance across the DocVQA, TextVQA, and OCRBench benchmarks.
Real-world scene understanding
Command A Vision's capabilities extend to real-world scene understanding, enabling it to analyze and interpret complex visual environments. This goes beyond simple object detection, as the model can understand spatial relationships, context, and even subtle nuances within images and photographs, making it ideal for a wide range of real-world applications, including risk detection in industrial settings and retail analytics.
Capabilities and efficiency suited for enterprise scale
Command A Vision was built to serve enterprises across the capabilities that matter most to them. It combines the other important text features of Command A, like advanced retrieval-augmented generation (RAG) with citations and multilingual performance across several key business languages.
With Command A Vision, enterprises can quickly and securely access context-aware insights and analysis on their own data, whether in text or various types of enterprise image formats – on the hardware they have and in the languages they need. Our Command model series is optimized for enterprise needs at the forefront, excelling on complex business applications while balancing performance, accuracy, and efficiency.
With low hardware requirements, enterprises in regulated industries that need private deployments can efficiently use Command A Vision in production. Command A Vision can be deployed privately with just two or fewer GPUs. It only requires two A100s, or one H100 for 4-bit quantization.
What customers are saying
“We’re incredibly excited about the release of Command A Vision. These models dramatically expand the boundaries of what’s possible with generative AI, enabling us to move beyond text and into the realm of visual understanding. Already, we’ve seen Command A Vision solve some of our most complex and time-consuming challenges; it not only streamlines workflows but unlocks entirely new opportunities for generative AI. By integrating visual context into our AI systems, we can start building solutions that are grounded in what we can see, not just what we can read. I’m excited to see how far we can push this technology and what we can accomplish with it in our toolkit.” – Jeffrey English, Director, Professional Services, Fujitsu Intelligence
“We’re incredibly excited about the release of Command A Vision. These models dramatically expand the boundaries of what’s possible with generative AI, enabling us to move beyond text and into the realm of visual understanding. Already, we’ve seen Command A Vision solve some of our most complex and time-consuming challenges; it not only streamlines workflows but unlocks entirely new opportunities for generative AI. By integrating visual context into our AI systems, we can start building solutions that are grounded in what we can see, not just what we can read. I’m excited to see how far we can push this technology and what we can accomplish with it in our toolkit.” – Jeffrey English, Director, Professional Services, Fujitsu Intelligence
“During early testing, the Command A Vision model has demonstrated exceptional capabilities in understanding and extracting data from intricate construction industry documents, such as lien waivers, invoices, and drawings.The ability to automate this kind of AI-driven data capture has the power to transform document processing, data accuracy, and project management that could reduce risk, time, and cost for the construction industry.” – Mark Webster, Senior Vice president and General Manager, Oracle Infrastructure Industries
Availability
Command A Vision is available today on the Cohere platform and for research use on Hugging Face. If you are interested in private or on-prem deployments, please contact our sales team for bespoke pricing.
We compare ourselves to the best non-reasoning models (non-deprecated) available from other providers. Where available, benchmark scores are taken either from other providers’ own reports or publicly available leaderboards; or otherwise (greyed numbers), use best-effort internal evaluation either through VLMEvalKit or the official codebase.
Original source - Apr 15, 2025
- Date parsed from source:Apr 15, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing Embed 4: Multimodal search for business
Cohere releases Embed 4, a new multimodal embedding model for enterprise search and retrieval that supports long documents, 100+ languages, and noisy real-world data while offering secure VPC or on-prem deployment and compressed embeddings for lower storage costs.
Embed 4 delivers state-of-the-art accuracy and efficiency, helping enterprises securely retrieve their multimodal data to build agentic AI applications.
Key Contributors: Carlos Lassance, David Rau, Elliott Choi, Nils Reimers, Luke Ross, Clifton Poth, Martin Hentschel, Javi Morales, Nabila Abraham, Minghan Li, Daniel Simig, Violet Dang
Today we’re releasing Embed 4: our latest state-of-the-art multimodal embedding model that enables enterprises to add frontier search and retrieval capabilities to AI applications — a necessity for businesses building assistants or agents that need to understand their business context.
Embed 4 offers customers:
- State-of-the-art multimodality: Embed 4 is uniquely capable at accurately and quickly searching multifaceted documents such as intricate PDF reports and dynamic presentation slides — whether the document is text-based or includes images, tables, graphs, code, and diagrams.
- Breakthrough context length: Embed 4 can generate embeddings for documents up to 128K tokens (around 200 pages) in length such as annual financial reports, product manuals, or detailed legal contracts.
- Leading multilingual capabilities: Embed 4 is multilingual across 100+ languages including key business languages such as Arabic, Japanese, Korean, and French to support global enterprises.
- Enhancements for security-minded industries: Embed 4 is optimized with domain-specific understanding of data from regulated industries such as finance, healthcare, and manufacturing. It can be deployed in virtual private cloud (VPC) and on-premise environments to keep data secure.
Existing embedding models fail to natively understand complex multimodal business materials, leading companies to develop cumbersome data pre-processing pipelines that only slightly improve accuracy. Embed 4 solves this problem, allowing enterprises and their employees to efficiently surface insights that are hidden within mountains of unsearchable information.
"Hunt Club's Atlas product lets customers navigate their sprawling professional networks and find talent within them. AI is essential in searching across complex candidate profiles and making sense of messy data to find ideal matches. Cohere's Embed 4 enables us to search these profiles more precisely, showing a +47% relative improvement over the already-strong performance of Embed 3. We are extremely impressed!"
- James Kirk, VP of AI, Hunt Club
Unlocking multimodal and multilingual search for global organizations
Embed 4 enables organizations to search their unstructured documents, where a large majority of their important data resides. It is uniquely capable of generating high-quality representations of complex mixed-modality documents – all within a unified vector. This capability additionally empowers businesses to build applications that can understand reference images alongside text questions, enabling users to use new search patterns to accelerate their productivity.
In particular, Embed 4 excels in regulated industries such as finance, healthcare, and manufacturing. In addition to strong general business knowledge, the model is optimized with domain-specific understanding of these industries so that it can identify relevant insights within common documents such as:
- Finance: investor presentations, annual financial reports, M&A due diligence files
- Healthcare: medical records, procedural charts, clinical trial reports
- Manufacturing: product specification documents, repair guides, supply chain plans
Each industry category represents a blend of public and proprietary benchmarks (see more details here). Languages range from English only, monolingual multilingual, and cross-lingual multilingual. Task types ranged from text-only and text-to-PDF datasets. All dataset performance metrics are measured by NDCG@10. ColQwen is a multi-vector model. For embedding models that do not offer native image understanding, all mixed-modality datasets (i.e. PDFs / PPTs) were parsed with a multimodal generative model before being embedded.
Language should never be a barrier to accessing information. Embed 4 delivers leading multilingual understanding across 100+ languages such as Arabic, French, Japanese, and Korean. It also is capable of searching across languages, ensuring employees can find critical data regardless of the language it's stored in or the languages they speak.
Each language category represents a blend of public and proprietary benchmarks (see more details here). Tasks ranged from monolingual to cross-lingual (i.e. english as the query language and the respective monolingual non-english language as the corpus). All dataset performance metrics are measured by NDCG@10.
Business data tends to be imperfect. Certain documents have spelling mistakes, formatting issues, or have pages with landscape orientation that are meant to be in portrait. To ensure these issues don’t harm the accuracy of search results, Embed 4 was trained to be robust against noisy real-world data. It is also performant at searching over scanned documents and handwriting. These formats are common in legal paperwork, insurance invoices, and expense receipts. This capability eliminates the need for complex data preparations or pre-processing pipelines, saving businesses time and operational costs.
“Agora is an AI search engine that makes it easy to shop across 35,000 online stores in one place. We are blown away by Embed 4’s ability to accurately surface relevant products to search queries. E-commerce data is complex, containing images and multifaceted text descriptions. Being able to represent our products in a unified embedding makes our search faster and our internal tooling more efficient."
- Param Jaggi, Founder, Agora
Crucial foundation for agentic enterprise AI applications
AI systems must understand the context in which they operate to be useful. AI assistants deployed within businesses do this through a process called Retrieval-Augmented Generation (RAG). In essence, the generative AI model (i.e. Command A) that powers the conversational experience will rely on a search engine – that is connected to proprietary company information – to source relevant information to user questions before responding. This improves the usefulness of answers and mitigates against hallucinations.
Embed 4 is the optimal search engine for enterprise AI assistants and agents. In addition to strong accuracy across data types, the model delivers enterprise-grade efficiency. This allows it to scale to meet the demands of large organizations. Further, because high data storage costs lead to reduced ROI on technology investments, we designed Embed 4 to output compressed embeddings. This helps organizations to save up to 83% on storage costs while maintaining search accuracy.
Compression can occur on the format precision of the vectors (binary, int8, and fp32) and the dimension of the vectors. All dataset performance metrics are measured by NDCG@10.
We are excited for businesses to use Embed 4 as the foundation of their search and retrieval pipelines, powering the next generation of AI applications across industries. Embed 4 also seamlessly integrates with North, our secure AI agents platform, by powering the semantic search capabilities of the end-to-end search system found in Compass. Our vertically integrated technology stack enables businesses to seamlessly integrate their data across workplace tools, build custom AI agents to suit their unique needs, and maintain control over their data behind the secure firewalls of their private environment.
Embed 4 is available today
Embed 4 is available today on Cohere’s platform and Microsoft Azure AI Foundry.
"We’re expanding our partnership with Cohere by bringing two of their latest enterprise models— Embed 4 and Command A—to Azure AI Foundry. These state-of-the-art models enable powerful, efficient, and secure AI solutions. Critically, we are excited to see how these models enhance agentic capabilities: grounding responses in richly contextualized data—core to building reliable, observable AI agents that can act autonomously and deliver enterprise-grade performance."
- Asha Sharma, CVP, Product , AI Platform
Embed 4 is also available on Amazon SageMaker and for private deployment into any VPC or on-premise environment. To learn more, contact our sales team and find more technical details in our developer documentation.
Original source - Mar 13, 2025
- Date parsed from source:Mar 13, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing Command A: Max performance, minimal compute
Cohere introduces Command A, a new enterprise generative model built for fast, secure AI with strong agentic, multilingual, coding and RAG performance. It offers 256k context, runs efficiently on just two GPUs, and is available today on the Cohere platform with research access on Hugging Face.
Scalable efficiency
Command A is on par or better than GPT-4o and DeepSeek-V3 across agentic enterprise tasks, with significantly greater efficiency.
Today, we’re introducing Command A, a new state-of-the-art generative model optimized for demanding enterprises that require fast, secure, and high-quality AI. Command A delivers maximum performance with minimal hardware costs when compared to leading proprietary and open-weights models, such as GPT-4o and DeepSeek-V3. For private deployments, Command A excels on business-critical agentic and multilingual tasks, while being deployable on just two GPUs, compared to other models that typically require as many as 32.
In head-to-head human evaluation across business, STEM, and coding tasks, Command A matches or outperforms its larger and slower competitors – while offering superior throughput and increased efficiency. Human evaluations matter because they test on real-world enterprise data and situations.
Head-to-head human evaluation win rates on enterprise tasks. All examples are blind-annotated by specially trained human annotators, assessing enterprise-focused accuracy, instruction following, and style. Throughput comparisons are between Command A on the Cohere platform, GPT-4o and Deepseek-V3 (TogetherAI) as reported by Artificial Analysis.
Across a range of standard benchmarks Command A provides strong performance on instruction following, SQL, agentic, and tool tasks.
Performance evaluated across academic benchmarks (MMLU, MATH, IFEval), agents benchmarks (BFCL, and Taubench), and coding benchmarks (MBPPPlus, SQL, and RepoQA). Methodology and further details are provided at the bottom in a footnote [1].
We focused on building Command A as efficiently as possible, while also making it as efficient to serve in production as possible. With a serving footprint of just two A100s or H100s, it requires far less compute than other comparable models on the market. This is especially important for private deployments.
Impractically large models lead to poor latency. When you just want correct answers quickly, Command A is the best choice. In fact, Command A can deliver tokens at a rate of up to 156 tokens/sec which is 1.75x higher than GPT-4o and 2.4x higher than DeepSeek-V3. Private deployments of Command A can be up to 50% cheaper than API-based access.
Command A tokens per second and time to first token is superior to GPT-4o and DeepSeek-V3 for both long and short context requests.
Enterprise-ready capabilities
We designed Command A with business needs in mind. Its 256k context length (2x most leading models) can handle much longer enterprise documents. Other key features include Cohere’s advanced retrieval-augmented generation (RAG) with verifiable citations, agentic tool use, enterprise-grade security, and strong multilingual performance.
Head-to-head human evaluation win rates comparing Command A and GPT-4o on enterprise RAG use-cases. All examples are at least 3-way blind-annotated by specially trained human annotators, assessing fluency, faithfulness, and response utility.
We understand that global companies need capabilities across regions. Command A offers expanded enterprise-level support for the 23 languages spoken by the majority of the world's population. We performed an extensive human evaluation and found users strongly preferred Command A over DeepSeek-V3 across most languages on a range of business use cases.
Head-to-head human evaluation win rates on enterprise tasks across 8 languages. All examples are blind-annotated by specially trained human annotators, assessing enterprise-focused accuracy, instruction following, and style.
In particular, Command A is much better than GPT-4o or DeepSeek-V3 at consistently answering with content in the requested language, for example answering in the relevant Arabic dialect of the user.
Arabic cross-lingual line-level pass-rate (LPR) on the prompts from Marchisio et al., 2024 and average ADI2 score over monolingual prompts in 4 Arabic dialects (Egyptian, Saudi, Syrian, Moroccan) from Robinson et al., 2024.
Powering AI agents at scale
AI is only as good as the data you give it. With that in mind, Command A securely delivers accurate responses to questions based on your internal company information. In practice, customers use this for tasks such as sourcing relevant HR policies by office location, reviewing legal regulations, and analyzing long financial reports.
The next generation of Cohere models will help power a range of AI applications for customers across industries like finance, healthcare, manufacturing, energy, and the public sector. In particular, they will seamlessly integrate with North, our secure AI agents platform to unlock the full potential of your company data and people with AI agents. Our fully integrated technology stack enables full customization of the product for customers to suit their unique business needs.
North securely leverages enterprise tools like CRM and ERP software, as well as connects to internal company databases and external web search services. This enables you to build agents that take action for you behind the secured firewalls of your enterprise systems.
Availability
Command A is available today on the Cohere platform, for research use on Hugging Face, and coming soon to major cloud providers. If you are interested in private or on-prem deployments please contact our sales team.
Cohere API Pricing
Input Tokens
Output Tokens
Command A
$2.50 / 1M
$10.00 / 1M
[1] BFCL: Performance on the BFCL-v3 benchmark on March 12, 2025. Where available, scores are taken from the public leaderboard, and otherwise using a best-effort internal evaluation using the official codebase. For competitors, we report the higher of their BFCL ‘prompted’ or ‘function-calling’ score. We report the Overall score which tests tool-use in diverse, real-world environments.
Original source
Taubench: Performance on the Taubench benchmark. Where available, scores are taken from the public repository leaderboard, and otherwise use a best-effort internal evaluation using the official codebase. We report the pass@1 scores on the Retail and Airline tasks which evaluate tool-use agents in multi-turn customer support use cases.
Academic: Performance across academic benchmarks that span general knowledge (MMLU), math performance (MATH), and instruction following (IFEval). We find that Command-A performs approximately at the level of, or exceeds the performance of, GPT-4o and DeepSeek-V3.
Coding: We note that Command-A demonstrates particularly strong performance on SQL benchmarks (average of BirdBench, Spider Dev, and Spider Test), and at the level of GPT-4o across use cases for MBPPlus (Python programing). Finally, we note its superior performance on repository-level question-answering in longer contexts (RepoQA). - Feb 27, 2025
- Date parsed from source:Feb 27, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing Command R7B Arabic
Cohere releases Command R7B Arabic, a lightweight multilingual AI model tuned for advanced Arabic language support in the MENA region. It brings strong enterprise performance, long context, efficient deployment, and open weights, with availability on Cohere, HuggingFace, and Ollama.
Our state-of-the-art lightweight multilingual AI model has been optimized for advanced Arabic language capabilities to support enterprises in the MENA region.
Today, Cohere is releasing a new state-of-the-art version of our lightweight Command R7B model that excels in advanced Arabic language capabilities for enterprises in the Middle East and Northern Africa. It is mission-critical for businesses to have secure AI technology that supports their global teams across a range of languages, dialects, and cultures. This release marks another step in Cohere’s commitment to provide measurable impact to customers through top-tier security, customization, and multilingual support.
Command R7B Arabic is a fast and highly efficient model that can be served on low-end GPUs, a MacBook, or even CPUs. Similar to other models in the R series, it offers a context length of 128k and industry-leading performance in its class across capabilities that matter most to businesses like regional language understanding and strong accuracy with citations using retrieval-augmented generation (RAG). Its compact size enables businesses to more easily scale Arabic language AI applications to production.
We have always prioritized ensuring our AI technology serves as many people, organizations, and markets as possible. This open-weights release creates new possibilities for Arabic-speaking developers and businesses. We will continue to deliver secure AI solutions designed to meet the specific needs of our global enterprise customers.
Leading enterprise performance and efficiency
Multilingual
Command R7B Arabic outperforms other leading models in its class across key enterprise tasks that rely on advanced Arabic language and culture understanding. While Command R7B is already a strong multilingual model, the R7B Arabic model offers improvements in all Arabic language dimensions. We achieved enhanced Arabic performance with no regression in the core languages Command R7B already supports.
Its long-context length allows it to process and generate text with high accuracy and coherence. The model is particularly well-suited for advanced business applications like RAG and building agents that require complex reasoning, multiple actions, and accessing internal information sources. It excels in instruction-following and length-control functionality so users can perform real-world tasks with AI in their native language. This includes document summarization, question answering on company materials, and leveraging external tools (search engines, APIs, and vector databases) to automate repetitive work.
Evaluations on capabilities that are relevant to enterprise tasks. Arabic language and cultural understanding (AlGhafa-Native + Arabic MMLU), Instruction Following (IFEval Arabic), and RAG (TyDI QA Arabic + FaithEval Arabic - an independently translated version of the well-known RAG benchmark FaithEval).
Evaluations on enterprise usability factors. Auto win-rates on Arabic version of LMSYS Arena "Hard" human preference tasks (details can be found in our Aya Expanse release).
Efficiency
Similar to Command R7B, R7B Arabic is designed for businesses that need to optimize for speed, cost-performance, and compute resources while maintaining high accuracy on enterprise tasks. It's one of the most efficient models in the market for scalable and practical AI applications. The model can be run on a single GPU, on-prem, and further fine-tuned to enhance performance.
Customization
We developed Command R7B Arabic to tackle a core limitation in the market with general purpose models. Businesses are looking for AI solutions that serve their specific needs. We focused on addressing the unique challenges of Arabic language processing such as managing complex morphology and ensuring accurate dialectal variations. With this release we are providing a customized solution for the Arabic language that will securely enable organizations in the region to accelerate adoption of AI.
We will continue to partner closely with enterprises across industries and geographies to provide seamless integration, expanded capabilities, and tailored AI solutions to boost productivity and efficiency.
Availability
Command R7B Arabic is available today on the Cohere platform as well as accessible on HuggingFace and Ollama. As we’ve done with the rest of the R series, we’re releasing the model weights to provide access to state-of-the-art AI technology for the research community.
If you are interested in on-prem deployment please reach out to our sales team.
—
اليوم، Cohere تصدر نسخة جديدة متطورة من نموذجنا Command R7B الأحدث الصغير الحجم وفائق السرعة و يتميز بقدرات متقدمة للغة العربية للمؤسسات في الشرق الأوسط وشمال أفريقيا. إنه من الأهمية القصوى لبلوغ أهداف الشركات أن تحظى بتقنيات ذكاء اصطناعي آمنة لدعم فرقها العالمية عبرعدة لغات ولهجات وثقافات. هذا الإصدار يمثل خطوة أخرى في تعهد Cohere لتقديم تأثير ملموس للعملاء من خلال أعلى درجات الأمان والتخصيص و دعم اللغات المتعددة.
يعد Command R7B Arabic نموذجاً سريعاً و ذا كفاءة عالية و يمكنه تقديم وحدات معالجة الرسوميات (GPU) منخفضة الأداء أو أجهزة ماك بوك أو حتى وحدات المعالجة المركزية (CPU). على شاكلة النماذج الأخرى في سلسلة R، فإنه يقدم طول سياق يبلغ 128K و أداءً رائدًا في الصناعة في فئته عبر القدرات الأكثر أهمية بالنسبة للشركات مثل فهم اللغة الإقليمية والدقة القوية مع الرجوع إلى المصادر باستخدام التوليد المعزز بالاسترجاع (RAG). يمكن لحجمه الصغير أن يمكّن الشركات من توسيع تطبيقات الذكاء الاصطناعي باللغة العربية بسهولة أكبر للإنتاج.
لقد وضعنا دائمًا على رأس أولوياتنا ضمان خدمة تقنية الذكاء الاصطناعي لأكبر عدد ممكن من الأشخاص والمؤسسات والأسواق. و يخلق هذا الإصدار المفتوح المصدر إمكانيات جديدة للمطورين والشركات التي تتحدث اللغة العربية. و سنواصل تقديم حلول الذكاء الاصطناعي الآمنة المصممة لتلبية الاحتياجات المحددة لعملائنا من المؤسسات العالمية.
أداء وكفاءة مؤسسية رائدة
متعدد اللغات
يتفوق Command R7B Arabic على النماذج الرائدة الأخرى في فئته عبر المهام المؤسسية الرئيسية التي تعتمد على الفهم المتقدم للغة والثقافة العربية. على الرغم من أن Command R7B يعد نموذجًا قويًا متعدد اللغات، فإن نموذج R7B Arabic يقدم تحسينات في كل أبعاد اللغة العربية. لقد حققنا أداءً محسنًا للغة العربية دون أي تراجع في اللغات الأساسية التي يدعمها Command R7B مسبقًا.
يتيح طوله الكبير في سياق النص معالجة وتوليد النص بدقة عالية واتساق. يعد النموذج مناسبًا بشكل خاص للتطبيقات التجارية المتقدمة مثل RAG و بناء agents الذين يحتاجون إلى استدلال معقد، وتنفيذ إجراءات متعددة، والوصول إلى مصادر المعلومات الداخلية. يتميز النموذج بقدرات فائقة في اتباع التعليمات والتحكم في طول النص، مما يسمح للمستخدمين أداء المهام الواقعية باستخدام الذكاء الاصطناعي بلغتهم الأم. يشمل ذلك تلخيص المستندات، والإجابة على الأسئلة المتعلقة بمواد الشركة، والاستفادة من الأدوات الخارجية (مثل محركات البحث و APIs وقواعد البيانات المتجهات) لأتمتة الأعمال المتكررة
تقييمات القدرات المتعلقة بالمهام المؤسسية: اللغة العربية والفهم الثقافي (AlGhafa-Native + Arabic MMLU). اتباع التعليمات (IFEval Arabic). التوليد المعزز بالاسترجاع (FaithEval Arabic + TyDI QA Arabic - نسخة مترجمة من FaithEval معيار RAG الشهير).
تقييمات عوامل قابلية الاستخدام المؤسسي. معدلات الفوز التلقائي في النسخة العربية لساحة LMSYS مهام التفضيل البشري "الصعبة" (يمكن إيجاد التفاصيل في إصدار AYA Expanse الخاص بنا)
الكفاءة
على غرار Command R7B تم تصميم R7B Arabic للأعمال والشركات التي تحتاج إلى تحسين من أجل السرعة، وتكلفة الأداء وموارد الحوسبة مع الحفاظ على الدقة الفائقة في أداء المهام المؤسسية. إنه أحد أكثر النماذج كفاءة في السوق لتطبيقات الذكاء الاصطناعي العملية والقابلة للتطوير. يمكن تشغيل النموذج على وحدة معالجة رسومات (GPU)، وعلى الأجهزة المحلية في مكان العمل، ويمكن ضبطه بشكل أكبر لتحسين الأداء.
التخصيص
قمنا بتطوير Command R7B Arabic لمعالجة قيود جوهرية في السوق من خلال نماذج الأغراض العامة. تبحث الشركات عن حلول الذكاء الاصطناعي التي تخدم احتياجاتها الخاصة. لقد قمنا بالتركيز على معالجة التحديات الفريدة لمعالجة اللغة العربية مثل إدارة الصرف المعقد وضمان التنوعات الدقيقة في اللهجات. من خلال هذا الإصدار، نقدم حلًا مخصصًا للغة العربية من شأنه أن يمكن المؤسسات في المنطقة من تسريع عملية تبني الذكاء الاصطناعي بشكل آمن.
سنواصل الشراكة عن كثب مع المؤسسات عبر الصناعات والمناطق الجغرافية المختلفة لتوفير تكامل سلس، وقدرات موسعة، وحلول الذكاء الاصطناعي المصممة خصيصًا لتعزيز الإنتاجية والكفاءة.
الإتاحة
يتواجد Command R7B Arabic اليوم على منصة Cohere platform وكذلك يمكن الوصول إليه على HuggingFace و Ollama. وكما فعلنا مع بقية سلسلة R، فإننا نصدر أوزان النموذج لتسهيل الوصول إلى أحدث تقنيات الذكاء الاصطناعي لمجتمع البحث.
إذا كنت مهتمًا بالإطلاق عبر الأجهزة المحلية للمؤسسة، فيرجى التواصل مع فريق المبيعات لدينا.
Original source - Jan 9, 2025
- Date parsed from source:Jan 9, 2025
- First seen by Releasebot:Jun 1, 2026
Introducing North: A secure AI workspace to get more done
Cohere launches early access for North, a secure AI workspace that combines LLMs, search, agents, and automation to help teams work faster. Built for private and air-gapped environments, it focuses on productivity, privacy, and easy integration into everyday workflows.
North combines LLMs, search, and automation into one secure AI workspace. It outperforms Microsoft Copilot and Google Vertex AI Agent Builder, seamlessly boosting workforce productivity and operational efficiency.
Today, we’re launching the early access program for North, our all-in-one secure AI workspace platform that empowers employees to significantly improve the quality and speed of their work. North helps employees across teams and industries offload routine tasks and focus their time on where they can add the most value to their company. North moves beyond isolated foundation models – combining LLMs, search, and agents into an intuitive platform that effortlessly integrates AI into your daily work.
North is designed to meet the strictest security and privacy standards because we understand that this is mission-critical to companies. The platform is optimized to run in private–including air-gapped–environments so that organizations can safely integrate all their sensitive data in one place. This ensures enterprises can unlock the productivity and efficiency benefits of AI without having to build the technology themselves or risk your data in a complex web of third party service providers.
We've designed North to put the user in control, stay out of the way, and enable employees to do their best work. By developing a vertically integrated technology stack that includes both the foundation models and a user-friendly platform, we’re able to create an unrivaled product experience that reduces the barriers to adoption. Ultimately, North is a partner to get more done.
How North guides your workforce toward peak productivity
AI that people will want to use
North equips organizations with the tools to realize the full potential of their data and their people. The platform enables users to instantly customize and deploy AI agents that can help them find relevant information across global organizations in multiple languages, conduct research & analysis, and perform complex tasks spanning various lines of business and previously disconnected tools.
Any employee, regardless of their technical background, can effortlessly create, customize, and share an AI agent with just a few clicks. This includes agents for core business functions like HR, finance, customer support, and IT that allow teams to execute faster and achieve more.
Accuracy across Finance, HR, Customer Support and IT benchmarks comparing Microsoft Copilot, Google Vertex AI Agent Builder, and Cohere North. We measure the percentage of completions from each model that received a Llama Index (LI) score of at least ‘relevant and correct’. LI is a common industry metric that assigns a score to a completion based on its comparison with a ground truth answer. This graph shows the LI performance of competitors relative to North, indicating North outperforms competitors on all benchmarks.
Seamless integration
North provides a trusted platform that seamlessly integrates into existing workflows right out-of-the-box. AI agents created with North can quickly and easily connect to the workplace tools and applications that employees regularly use. This ability to integrate with any tool a business cares about (including in-house applications) enables North to automate large swaths of tedious work leveraging internal data behind your secured firewalls.
The current build-it-yourself approach to AI deployment places a huge burden on organizations to invest the time, expertise, and resources into developing bespoke solutions and then maintaining them. North helps avoid these pain points and reduces the time to value for customers. This means faster workforce adoption, allowing your teams to experience the productivity benefits of AI sooner.
Accuracy of Microsoft Copilot, Google Vertex AI Agent Builder and Cohere North on a proprietary evaluation benchmark based on Cohere internal documents and real life employee prompts. Left: Percentage of completions from each model that received a Llama Index score of at least ‘relevant and correct’. Right: Percentage of completions from each model that received a Human Evaluation score of ‘Perfect’. Auto evals may over-estimate competitor performance compared to blind human user evals.
Cutting-edge search system
Among its capabilities, North leverages our most advanced multimodal AI search and discovery system, Compass. This system addresses a core challenge with today’s business data being scattered across various applications and teams. As well as, data that exists in different modalities, formats, and languages. Fragmented data can hinder the ability of organizations to make informed decisions and stay competitive.
Seamlessly built into the backbone of North, Compass accurately indexes, stores, and enables quickly searching complex enterprise data. This includes extracting information from multimodal data sources like images, slides, spreadsheets, and documents, enabling employees to quickly surface actionable insights from across their business.
A fine-grained security system ensures the right access controls, enabling employees to use AI on their internal data, while ensuring that no sensitive data is leaked.
Two graphs showing how Cohere North reduces the average time it takes to complete a task requiring company knowledge by 5.45x while maintaining the same level of response quality. North response time was measured by having human annotators complete tasks using the North interface connected to a Google Drive, and Manual Search was measured by having human annotators complete their tasks using a search bar, cmd F and advanced search over documents in a Google Drive. Response quality was measured by the percentage of annotator-written answers that received a LI score of at least ‘relevant and correct'.
Industry-specific customization
Since we developed each part of the technology stack underpinning North, the platform can be tailored to suit the unique needs of any business. This granular level of control is essential for customizing AI solutions to match each organization's needs such as industry-specific terminology and internal knowledge. Additionally, with our industry-leading focus on privacy and security, North is well suited for regulated industries where companies simply cannot risk their proprietary data.
We’ve already started engaging with a limited number of businesses across sectors like finance, healthcare, manufacturing, and critical infrastructure to deploy North on an early access basis. Initially, we’re excited to be partnering with Royal Bank of Canada (RBC), a leading global financial institution to explore the transformative potential of AI in banking. Together we are co-developing North for Banking, a tailored AI solution designed to enhance workforce productivity while meeting the particular security and privacy requirements of the finance industry.
Enterprise-grade security and privacy
North continues our long-standing commitment to building products with interoperability, security, and data privacy at their core.
All companies deal with sensitive data like emails, financial reports, and personal information where safeguards are paramount. North can be deployed securely in a company’s private cloud environment or on-premises, offering customers maximum control, security, and flexibility. Businesses can leverage North with confidence, knowing your data is never accessible outside your company.
Early access program
If you’re interested in learning about the early access program for North please reach out to our sales team for more information.
Note: Vertex AI Agent Builder has a limitation for PDF documents. We converted PDF documents to text files to adjust for this issue. Without this correction Vertex would have had a relative performance of 7.1% for auto evaluations and 12.83% for human evaluations.
Original source - Dec 13, 2024
- Date parsed from source:Dec 13, 2024
- First seen by Releasebot:Jun 1, 2026
Introducing Command R7B: Fast and efficient generative AI
Cohere releases Command R7B, a compact enterprise LLM that brings fast, efficient AI performance to commodity GPUs, Macs, and CPUs. It adds 128k context, multilingual support, citation-backed RAG, tool use, and agentic capabilities, and is available on the Cohere Platform and HuggingFace.
The smallest model in our R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.
Today, we’re excited to release Command R7B, the smallest, fastest, and final model in our R series of enterprise-focused large language models (LLMs). Command R7B provides state-of-the-art performance in its class of open-weights models across real-world tasks that matter for users. The model is designed for developers and businesses that need to optimize for the speed, cost-performance, and compute resources of their use cases.
Like our other models in the R series, Command R7B offers a context length of 128k and excels in capabilities important for a wide range of business applications. It delivers a powerful combination of multilingual support, citation verified retrieval-augmented generation (RAG), reasoning, tool use, and agentic behavior. Thanks to its compact size and efficiency it can be served on low-end GPUs, a MacBook, or even CPUs – drastically lowering the cost of deploying AI applications into production.
High performance in a small package
A well-rounded model
Command R7B excels on standardized and externally verifiable benchmarks such as the HuggingFace Open LLM Leaderboard. Compared to other similarly sized open-weights models, Command R7B ranks first on average with strong performance across all tasks.
Enhanced efficiency in math, code, and reasoning tasks
A major area of focus for Command R7B has been improving performance on math and reasoning, code, and multilingual tasks. In particular, the model matches or exceeds leading open-weights models in its class across common math and code benchmarks while using fewer parameters.
Best-in-class RAG, tool use, and agents
Command R7B outperforms the other similarly sized open-weights models when it comes to core business use cases such as RAG, tool use, and AI agents. It is an ideal choice for enterprises looking for a cost-efficient model grounded in their internal documents and data. Like our other R series models, our RAG offering delivers native in-line citations that significantly reduce hallucinations and make fact-checking easier.
For tool use, we see stronger overall performance than models of similar size on the industry-standard Berkeley Function-Calling Leaderboard. This shows Command R7B is particularly effective at tool use in real-world, diverse, and dynamic environments and avoids calling tools unnecessarily which is an important aspect of tool use in practical applications . Command R7B’s multi-step tool use capabilities allow it to power fast and capable AI agents.
Optimized for enterprise use cases
Our models are optimized for the capabilities enterprises need for real-world deployment of AI systems. The R series delivers an unmatched balance of efficiency and strong performance. This means ensuring they excel on human evaluation, the gold standard for quality assessment. Command R7B outperforms similarly sized open-weights models in blind head-to-head evaluations by human raters on RAG use cases our customers care about when building AI assistants for functions like customer service, HR, compliance, and IT support.
Efficient and fast
Command R7B’s compact size offers a reduced serving footprint that is ideal for rapid prototyping and iteration. It excels at high throughput, real-time use cases like chatbots and code assistants. It also unlocks dramatically cheaper deployment infrastructure such as consumer GPUs and CPUs to unlock on-device inference.
We achieve this without compromising on our enterprise-grade security and privacy standards to protect customers' data.
Get started
Command R7B is available today on the Cohere Platform as well as accessible on HuggingFace. We’re excited to be releasing the weights of this model to provide greater access to cutting-edge technology for the AI research community.
Cohere API Pricing
Input Tokens
Output Tokens
Command R7B
$0.0375 / 1M
$0.15 / 1M[1] Conversational RAG: Average performance over the 10-dataset ChatRAGBench benchmark which tests the ability to generate responses in a wide range of settings including conversational tasks, attending over long inputs, analyzing tables and extracting and manipulating numerical information in financial settings. We improve evaluation methodology using a PoLL judge ensemble (Verga et al. 2024) using Haiku, GPT3.5 and Command R, providing higher agreement to human annotators (Fleiss’ kappa=0.74 vs 0.57 for original, calculated over 20k human judgements).
Original source
Tool use: Performance on the BFCL-v3 benchmark on 12 Dec 2024. Where available, scores are taken from the public leaderboard, and otherwise use a best-effort internal evaluation using the official codebase. For competitors, we report the higher of their BFCL ‘prompted’ or ‘function-calling’ score. We report the Overall score, the Live subset score which tests tool-use in real-world, diverse, and dynamic environments, and the Irrelevance subset score, which tests how well models avoid calling tools unnecessarily.
REACT Agent/Multi-step: We assess the abilities of LangChain REACT agents connected to the internet to break down complex questions and formulate and successfully carry out a research plan to answer them using Bamboogle and StrategyQA. Bamboogle is evaluated using a PoLL ensemble, and StrategyQA is judged by assessing whether the model follows a formatting instruction to end its answer with either ‘Yes’ or ‘No’. We use the test sets from Chen et al.( 2023) and Press et al. (2023).
ToolTalk challenges a model to perform complex reasoning and actively seek information from users in order to execute complex user tasks, in settings such as account management, sending emails, and updating calendars.
Tool-talk-hard is evaluated using soft-success-rate using the official ToolTalk repository. ToolTalk requires models to expose a function-calling API, which is not available for Gemma 2 9B.
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.
Similar to Cohere with recent updates:
- xAI release notes116 release notes · Latest Jun 22, 2026
- Perplexity release notes26 release notes · Latest Jun 19, 2026
- Cursor release notes107 release notes · Latest Jun 22, 2026
- Anthropic release notes663 release notes · Latest Jun 26, 2026
- OpenAI release notes788 release notes · Latest Jun 25, 2026
- Deepseek release notes18 release notes · Latest Apr 24, 2026