Datadog Release Notes

Name: Datadog
Brand: Datadog


11 release notes curated from 14 sources by the Releasebot Team. Last updated: Jun 26, 2026


Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

DASH 2026: Guide to Datadog’s newest announcements

Datadog showcases a major DASH keynote packed with AI-powered observability and remediation updates, including new Bits AI agents for detection, investigation, fixes, release validation, testing, database optimization, and security. It also expands network, logs, journey monitoring, and agentic analytics capabilities.
Close the ops loop from detection to remediation

Autonomously monitor for impactful degradations with Bits Detection

New services ship faster than monitoring configuration can keep pace, leaving endpoints without alerts, thresholds calibrated against old traffic patterns, and routing that goes stale as teams reorganize. Bits Detection, now available in Preview, uses the context Datadog already has about your services, endpoints, dependencies, and deployment history to create and maintain detection coverage automatically. It focuses coverage on the endpoints most likely to affect users, establishes baselines from historical behavior rather than static thresholds, adapts as services change, and connects detection to autonomous investigation and remediation when issues arise. To get started, sign up for the Preview or read the blog post.
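The difference between a static threshold and a baseline learned from history can be sketched in a few lines. This is a minimal illustration, not Bits Detection’s actual model: it assumes a simple rule (flag values outside the mean plus or minus k sample standard deviations of recent history), and the function name `is_anomalous` and the default `k=3.0` are hypothetical choices.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], value: float, k: float = 3.0) -> bool:
    """Return True if `value` deviates from the historical baseline.

    Hypothetical rule: flag values outside mean +/- k * sample stdev
    of `history`. This is an illustration, not Datadog's algorithm.
    """
    if len(history) < 2:
        # stdev() needs at least two points; no baseline exists yet.
        return False
    mu = mean(history)
    sigma = stdev(history)
    return abs(value - mu) > k * sigma


# p95 latency (ms) over recent windows for one endpoint (made-up values)
history = [100.0, 102.0, 98.0, 101.0, 99.0]
print(is_anomalous(history, 140.0))  # a static 200 ms threshold would miss this
print(is_anomalous(history, 103.0))
```

Because the baseline is recomputed from history, it follows traffic shifts that would leave a fixed threshold calibrated against old patterns.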

Retain your team’s operational knowledge with Bits Memories

Solving the hardest incidents often depends on context that live telemetry alone cannot provide, such as the failure patterns your team has seen before, the fixes that worked or failed, and the service-specific details that make your environment unique. Bits Memories helps retain useful operational lessons from the work your team is already doing across investigations, runbooks, postmortems, Slack conversations, prior remediations, and more. Bits automatically identifies important details and saves them to memory, so when related issues come up later, Bits can use that context without responders having to rediscover or re-enter it under pressure. To get started, sign up for the Preview.

Automatically resolve issues with Bits Remediation

After Bits completes an investigation and identifies a root cause, Bits Remediation helps resolve the issue. Bits can execute remediation actions across your services and infrastructure by calling APIs, running fully configured remediation scripts (such as kubectl commands to restart Kubernetes deployments), and writing code fixes that teams can open as pull requests with one click. Bits Remediation follows the guardrails that your team defines, so actions are aligned with your environment and risk preferences. This helps teams spend less time translating investigation findings into next steps while keeping responders in control of what gets executed. To learn more, review our documentation or sign up for the Preview.

Detect and remediate issues before they escalate with Bits Infrastructure Operations

As environments grow and new workloads are deployed every day, infrastructure teams cannot manually triage and fix issues across the full breadth of their infrastructure: hosts, Kubernetes, serverless, and networks. These issues include disk saturation on hosts, CrashLoopBackOff and OOMKilled errors in Kubernetes, concurrency limits on AWS Lambda, expiring TLS certificates across networks, memory pressure on Amazon ECS, and much more.

Bits Infrastructure Operations, now available in Preview, autonomously detects, investigates, and remediates common and repetitive infrastructure issues before they escalate into incidents. It also flags risky infrastructure changes in pull requests before they reach production. When Bits Infrastructure Operations can safely act within guardrails you define, it remediates issues automatically. When fixes require human approval, it surfaces the highest-priority issues with the full context that your team needs to review and approve the next step. Teams can start with approval-based guardrails and expand them over time as Bits learns from repeated approvals. To learn more, read our blog post or sign up for the Preview.
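One way to model guardrails that start approval-based and widen as approvals accumulate is sketched below. The class, the action names, and the promote-after-N-approvals rule are assumptions for illustration; they do not describe how Bits Infrastructure Operations is actually configured.

```python
from dataclasses import dataclass, field


@dataclass
class GuardrailPolicy:
    """Hypothetical graduated guardrail.

    Every action type starts as approval-only. After `promote_after`
    human approvals it may run automatically; a rejection demotes it.
    """

    promote_after: int = 3
    auto_actions: set[str] = field(default_factory=set)
    approvals: dict[str, int] = field(default_factory=dict)

    def decide(self, action: str) -> str:
        return "execute" if action in self.auto_actions else "request_approval"

    def record_approval(self, action: str) -> None:
        self.approvals[action] = self.approvals.get(action, 0) + 1
        if self.approvals[action] >= self.promote_after:
            self.auto_actions.add(action)

    def record_rejection(self, action: str) -> None:
        # Reset trust for this action type and require approval again.
        self.approvals[action] = 0
        self.auto_actions.discard(action)


policy = GuardrailPolicy(promote_after=2)
print(policy.decide("restart_deployment"))
policy.record_approval("restart_deployment")
policy.record_approval("restart_deployment")
print(policy.decide("restart_deployment"))
```

Resetting on rejection is a deliberate design choice in this sketch: trust is earned per action type and lost as soon as a human disagrees.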

Ensure reliability

Move from passive observability to proactive network device health and remediation

Network teams are drowning in metrics, events, traffic, and device configuration data, but volume alone doesn’t tell you what’s critical or what to do next. Datadog Network Device Health automatically correlates signals across your network devices and surfaces issues ranked by business impact before they cascade. When an issue is detected, an investigation side panel explains what happened, the blast radius, and the exact config change to roll back. From there, one click deploys the rollback while real-time metrics let you monitor the recovery with confidence. For your most critical incidents, Bits Investigation accelerates troubleshooting with step-by-step reasoning that helps your team pinpoint root cause faster. To get started with Network Device Health, check out our blog post.

Trace config changes causing complex network issues with Network Configuration Management

Datadog Network Configuration Management automatically correlates device performance degradation with the exact configuration change that caused it. When a performance issue strikes, teams no longer need to manually compare configuration snapshots or switch between tools to find the root cause. Datadog tracks configuration changes over time and surfaces AI-generated summaries that translate even the most complex changes into plain language that any engineer can understand and act on. When a problematic change is identified, your team gets a one-click rollback to the last trusted configuration for immediate resolution. Explore the Network Configuration Management documentation to get started.
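At its core, comparing configuration snapshots is a text diff plus a record of which snapshot was last known good. A minimal sketch with Python’s standard `difflib` module follows; the snapshot format and the `trusted` flag are assumptions, and the device configuration is a made-up example.

```python
import difflib
from typing import Optional


def diff_configs(trusted: str, current: str) -> list[str]:
    """Unified diff between the last trusted config and the current one."""
    return list(
        difflib.unified_diff(
            trusted.splitlines(),
            current.splitlines(),
            fromfile="trusted",
            tofile="current",
            lineterm="",
        )
    )


def last_trusted(snapshots: list[tuple[str, bool]]) -> Optional[str]:
    """Return the newest snapshot marked trusted (snapshots are oldest first)."""
    for config, trusted in reversed(snapshots):
        if trusted:
            return config
    return None


snapshots = [
    ("interface eth0\n mtu 1500\n", True),
    ("interface eth0\n mtu 9000\n", False),  # change that preceded the degradation
]
rollback_target = last_trusted(snapshots)
for line in diff_configs(rollback_target, snapshots[-1][0]):
    print(line)
```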

Trace network issues from application to device with L7 to L1 visibility

A trace showing latency or errors carries the full network story behind it: the services exchanging traffic, the flows connecting them, and the exact hop where a physical device is introducing the problem. Datadog’s L7 to L1 visibility follows that story end to end, from the application layer through the network flows between services, down to the physical hops and devices where performance breaks down. Whether the culprit is a misconfigured firewall, an unexpected cross-region route, or a device with heavy packet loss, SREs and network engineers can pinpoint the problem directly inside the Network tab in APM. To get started with L7 to L1 visibility in APM traces, read our blog post or sign up for the Preview.

Diagnose internet underlay issues with BGP Centric View

When the internet path degrades and every signal points to the Border Gateway Protocol (BGP) layer, confirming whether a transit provider or a peering issue is the root cause means leaving the platform entirely for manual Autonomous System Number (ASN) lookups and fragmented relationship tracing. Datadog’s BGP Centric View brings that context directly into the Network Path UI in a dedicated BGP tab to surface every ASN in the flow. With a single click, engineers can uncover each ASN’s service provider, upstream neighbors, and downstream neighbors, giving teams the full routing picture without leaving the platform. Explore the BGP Centric View documentation to get started.
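The upstream and downstream relationships described above can be derived from observed AS paths. The sketch below rests on stated assumptions: each path has already been ordered from source to destination (the raw BGP AS_PATH attribute lists the nearest AS first), so "upstream" means the preceding ASN; consecutive repeats from AS-path prepending are collapsed; and the ASNs come from the range reserved for documentation. The function names are hypothetical.

```python
from collections import defaultdict


def collapse_prepends(path: list[int]) -> list[int]:
    """Drop consecutive repeats introduced by AS-path prepending."""
    hops: list[int] = []
    for asn in path:
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return hops


def asn_neighbors(paths: list[list[int]]) -> dict[int, dict[str, list[int]]]:
    """Map each ASN to its upstream and downstream neighbors.

    Assumes each path is ordered from source to destination.
    """
    upstream: dict[int, set[int]] = defaultdict(set)
    downstream: dict[int, set[int]] = defaultdict(set)
    for path in paths:
        hops = collapse_prepends(path)
        for prev, nxt in zip(hops, hops[1:]):
            downstream[prev].add(nxt)
            upstream[nxt].add(prev)
    all_asns = set(upstream) | set(downstream)
    return {
        asn: {"upstream": sorted(upstream[asn]), "downstream": sorted(downstream[asn])}
        for asn in sorted(all_asns)
    }


# Two observed flows using ASNs from the documentation range (64496-64511).
flows = [[64496, 64497, 64497, 64499], [64496, 64498, 64499]]
for asn, rel in asn_neighbors(flows).items():
    print(asn, rel)
```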

Automatically optimize database queries with Datadog Database Monitoring

Datadog Database Monitoring’s Bits Database Optimization gives every engineering team a complete, automated path from slow query detection to production fix without requiring deep database expertise. By validating candidate rewrites against a simulated copy of your schema, Datadog helps ensure each optimization is proven faster on your specific data before it ever reaches your codebase.

When a fix is validated, Bits Database Optimization locates the exact line of code that issued the query and opens a ready-to-merge pull request with benchmark evidence inline, so teams can review and ship improvements within their existing workflow. After the change deploys, teams can confirm that the gain holds under real production load directly in DBM Query Metrics. To learn more, read our blog post.
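The validate-before-merge idea can be illustrated with a local benchmark. This sketch is not Datadog’s implementation: it swaps the simulated schema copy for an in-memory SQLite database with made-up data, and treats a rewrite as valid only when it returns the same rows and has a lower median runtime. The table, queries, and run count are assumptions.

```python
import sqlite3
import statistics
import time


def median_runtime(conn: sqlite3.Connection, sql: str, runs: int = 5) -> float:
    """Median wall-clock seconds to execute `sql` and fetch all rows."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def validate_rewrite(conn: sqlite3.Connection, original: str, candidate: str, runs: int = 5) -> dict:
    """Accept a rewrite only if it returns the same rows and runs faster."""
    # Compare sorted rows so row order does not matter
    # (assumes row values are mutually comparable, e.g. no NULLs mixed in).
    same_rows = sorted(conn.execute(original).fetchall()) == sorted(
        conn.execute(candidate).fetchall()
    )
    if not same_rows:
        return {"equivalent": False, "faster": False}
    original_s = median_runtime(conn, original, runs)
    candidate_s = median_runtime(conn, candidate, runs)
    return {
        "equivalent": True,
        "faster": candidate_s < original_s,
        "original_s": original_s,
        "candidate_s": candidate_s,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("late" if i % 7 == 0 else "ok",) for i in range(10_000)],
)
original = "SELECT id FROM orders WHERE id IN (SELECT id FROM orders WHERE status = 'late')"
candidate = "SELECT id FROM orders WHERE status = 'late'"
print(validate_rewrite(conn, original, candidate))
```

A single local timing comparison is noisy; more runs and realistic data volumes make the "proven faster" check more trustworthy.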

Query logs across storage destinations with Federated Logs

Modern systems generate massive amounts of telemetry data, and not all of it lands in one place: application and infrastructure logs flow into observability platforms, ML training jobs emit logs into lakehouses, high-volume event streams land in columnar stores, and audit archives go to object storage like Amazon S3. The resulting fragmentation can pose hurdles during investigations, forcing teams to switch contexts and rewrite queries for different syntaxes. Federated Logs lets you query external data stores—including Databricks and ClickHouse—from the Log Explorer, using the same query syntax and facets, no matter where your logs live. Paired with Observability Pipelines, which routes, transforms, and normalizes logs before they reach their destinations, Federated Logs provides a consistent investigation experience across the storage systems you already use. To get started, sign up for the Preview or learn more in our blog post.

Store and search logs at petabyte scale in your own infrastructure with Datadog BYOC Logs

Self-hosted log management gives teams data sovereignty and control, but these solutions are difficult to maintain, and they lack key SaaS platform capabilities like telemetry correlation and AI-powered analysis. Datadog BYOC Logs gives teams the best of both worlds. It runs in your own infrastructure and stays fully integrated with the Datadog platform. Datadog BYOC Logs lets teams keep full control over where their data lives without giving up petabyte-scale search, cross-telemetry correlation, AI-assisted investigation, or centralized governance. Learn more in our blog post.

Ensure intent

Monitor critical user journeys with Datadog Journey Monitoring

Without a unified view, engineering, product, and DevOps teams chase the same problems with different tools and arrive at different conclusions. This makes it nearly impossible to pinpoint whether a drop-off in a critical user journey is due to technical or behavioral factors. Datadog Journey Monitoring brings traffic, conversion rates, uptime, and errors from Real User Monitoring, Synthetic Monitoring, and Product Analytics into a single view of every critical user flow, so engineering, product, and DevOps teams share the same understanding of a journey’s performance. Journey Monitoring is currently in Preview. If your organization already uses all three DEM products (Real User Monitoring, Synthetic Monitoring, and Product Analytics), you’re eligible to sign up today. Learn more in the Journey Monitoring documentation and read our blog post.

Close the dev loop from finding to fix

Turn Datadog findings into automated code fixes with Bits Code

Engineering teams can get stuck in a reactive remediation loop. Every error spike, performance regression, flaky test, or new vulnerability kicks off the same manual cycle: triage, locate the code, write a fix, run tests, and open a pull request. Bits Code, Datadog’s platform-wide coding agent, closes that loop. It’s embedded wherever Datadog surfaces a problem, from Error Tracking and APM Recommendations to Continuous Profiler, Test Optimization, Code Security, Database Monitoring, Kubernetes Remediation, and Bits AI SRE, so the same agent fixes a recurring error one minute and remediates a vulnerability the next.

Because Bits Code investigates with the same telemetry data that engineers already trust, including logs, traces, metrics, profiles, runtime variables, and security findings, every proposed fix is grounded in real production behavior rather than the guesses generic coding assistants make from source code alone. Teams can also prompt Bits Code directly for refactors and one-off coding work, schedule recurring remediation runs, or trigger runs automatically off telemetry data. Bits Code is now generally available. To learn more, check out our blog post and Bits Code documentation.

Ship code safely at AI speed with Bits Release

Bits Release is an AI release validation agent that verifies every code change from pull request (PR) to production. When a PR is opened, Bits Release analyzes the intended impact of the change, generates a validation plan, runs end-to-end checks in staging, and monitors the production rollout.

Unlike traditional monitoring, Bits Release validates releases in context: It verifies that the expected improvements actually happen while detecting regressions and unintended side effects. When issues occur, it investigates likely root causes and helps generate fixes. Successful validations can be promoted into persistent production monitors, creating a continuous safety loop for high-velocity and AI-generated code. Learn more in our blog post, or sign up for the Preview.

Automate synthetic test coverage with Bits Testing

Keeping synthetic tests current is one of the most time-consuming parts of shipping fast. New flows go untested, interface changes break existing scripts, and coverage gaps quietly follow. Bits Testing Agent automates synthetic test generation and maintenance by exploring your application autonomously, identifying critical user journeys, and generating runnable test suites from a URL or natural language goal. For dynamic applications where interfaces and outputs vary, goal-based tests let you define an intended outcome rather than a fixed sequence of steps, so tests adapt instead of breaking. Scheduled explorations keep coverage current over time without manual intervention. Learn more in our blog post. To start automating your test coverage with Bits Testing Agent, join the Preview.

The agentic stack data foundation

Get quality answers to business questions with Bits Data Analysis

Bits Data Analysis answers natural-language questions about aspects of your business, such as revenue, sales pipeline, churn, and product adoption. It’s powered by Datadog Data Context, a knowledge base that pulls table descriptions, metric definitions, freshness and quality signals, and lineage from sources like Tableau, Looker, Power BI, Fivetran, your warehouse, and Data Observability. It then enriches that with business context from Product Analytics, upstream applications, and source code, replacing months of manual semantic-layer work. Bits Data Analysis can go further than typical BI tools and explain why a metric changed, such as by tracing a revenue dip to a checkout-service deployment that had a latency spike.

The Context Workbench gives data teams a dedicated place to observe how the agent is used across Slack, the Datadog web app, coding agents like Claude Code or Codex, and the Datadog API. From there, admins can define evaluations from real user questions and improve answer quality.

With Bits Data Analysis, data teams get end-to-end governance and observability: pipeline health, data quality, data context, agent answers, confidence indicators, and eval suites that gate any change to the context layer. To learn more about Bits Data Analysis, read our blog post and sign up for the Preview.

Use custom metrics for the modern age with Infinite Cardinality Metrics

Modern systems generate more telemetry data than ever. SRE teams track latency per tenant, region, and feature flag. Engineers building with AI follow signals at every step of an agent’s execution. The dimensions that teams need keep multiplying: tenant, user, device, model, region, execution path. But as telemetry data becomes more granular, cardinality becomes the wall.

Today we’re introducing Infinite Cardinality Metrics, a new pricing option for custom metrics built for the way modern systems operate. Infinite Cardinality Metrics is built for agentic querying and exploration, so you and your agents can ask anything of your metrics. It gives you the freedom to capture every attribute and dimension that matters, no matter how high the cardinality. Infinite Cardinality Metrics is priced per metric name and scales with your data volume, not cardinality, so cost stays predictable as you add context.
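Cardinality is the number of distinct tag combinations a metric can take, and it grows multiplicatively with each dimension. The sketch below, with hypothetical tag names and counts, shows how a few tags multiply the number of timeseries while the count of metric names, the unit that per-metric-name pricing is based on, stays at one. The function names are illustrative.

```python
def worst_case_cardinality(tag_values: dict[str, list[str]]) -> int:
    """Upper bound on distinct timeseries for ONE metric name:
    the product of the number of distinct values of each tag."""
    total = 1
    for values in tag_values.values():
        total *= len(set(values))
    return total


def observed_cardinality(points: list[dict[str, str]]) -> int:
    """Distinct tag combinations actually seen (tag order is ignored)."""
    return len({tuple(sorted(p.items())) for p in points})


tags = {
    "tenant": [f"tenant-{i}" for i in range(1000)],
    "region": ["us-east", "us-west", "eu-west", "ap-south", "sa-east"],
    "feature_flag": ["on", "off"],
}
# 1000 * 5 * 2 = 10000 potential timeseries, still a single metric name
print(worst_case_cardinality(tags))
```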

Infinite Cardinality Metrics is now generally available. To learn more, visit the documentation and read our dedicated blog post.

Build and monitor the agentic stack

Monitor agent adoption with Datadog Agent Console

As coding agent usage spreads across teams, engineering leaders need more than anecdotal wins to justify the spend: They need to see who’s using agents, whether it’s improving delivery, and where costs are going to waste. Datadog Agent Console gives you a unified view of activity across coding agents like Claude Code, Cursor, and GitHub Copilot, as well as Datadog’s own Bits AI agents, with adoption analytics, engineering impact metrics, spend attribution, and automated waste detection built in. Agent Console helps you answer three practical questions:

Who in my organization is using coding agents the most?

What are users doing well with agents and where are they struggling?

How does AI spend correlate with engineering output?

You can get started with Agent Console today by visiting our documentation. To learn more about its features, read our blog post.

Understand production LLM behavior with Patterns in Agent Observability

When you deploy an LLM-powered application, production traffic rarely behaves the way you expect: Users ask questions outside the intended scope, goals shift mid-conversation, and workflows emerge that you never anticipated. Patterns in Datadog Agent Observability helps you understand what’s actually happening in production by automatically clustering interactions into behavioral groups, without requiring predefined categories or manual labeling. Each cluster surfaces operational and quality signals, including traffic volume, latency, cost per interaction, error rate, and evaluation scores. This enables you to immediately identify which categories of user behavior are driving regressions or rising costs. For more information, read our dedicated blog post. To request early access, sign up for the Preview.
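Once interactions are grouped, the per-cluster signals are straightforward aggregations. The sketch below assumes the clustering step has already assigned each interaction a cluster label (the clustering itself, which Patterns performs automatically, is out of scope here); the field names, cluster names, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_clusters(interactions: list[dict]) -> dict[str, dict[str, float]]:
    """Aggregate operational signals per behavioral cluster.

    Assumes each interaction already carries a `cluster` label.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in interactions:
        groups[item["cluster"]].append(item)
    summary = {}
    for cluster, items in groups.items():
        summary[cluster] = {
            "volume": len(items),
            "avg_latency_ms": mean(i["latency_ms"] for i in items),
            "avg_cost_usd": mean(i["cost_usd"] for i in items),
            "error_rate": sum(i["error"] for i in items) / len(items),
        }
    return summary


interactions = [
    {"cluster": "billing_questions", "latency_ms": 100, "cost_usd": 0.01, "error": False},
    {"cluster": "billing_questions", "latency_ms": 300, "cost_usd": 0.03, "error": True},
    {"cluster": "out_of_scope", "latency_ms": 900, "cost_usd": 0.08, "error": False},
]
for name, stats in summarize_clusters(interactions).items():
    print(name, stats)
```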

Improve AI agent quality with Bits Evals

The process of debugging and improving an AI agent follows a consistent pattern: Teams collect user signals, investigate failures in traces, make changes to prompts or workflows, validate those changes with evaluations and experiments, and then monitor the results after deployment. Engineers need to do much of this work manually, with the necessary context—traces, dataset records, and prompt versions—spread across toolsets. Bits Evals is a set of agentic features that handles the repetitive parts of the AI agent development loop, while keeping engineers in control of the decisions that matter. With visibility into the complete context of your agent’s performance, Bits can form a hypothesis and immediately verify it by cross-referencing traces, dataset records, and evaluator outputs as evidence. It can also help you address the issue by suggesting a prompt change, flagging a dataset gap, proposing new evaluator coverage, or surfacing a regression you didn’t know to look for. This removes hours of manual trace-reading, so that engineers can spend their time on decisions rather than gathering the inputs needed to make them. Learn more in our dedicated blog post, or sign up for the Preview.

Secure the agentic stack

Protect agentic AI applications with Datadog AI Guard

AI Guard helps protect custom AI agents against prompt injection, tool misuse, data exfiltration, and other OWASP Top 10 threats. It discovers unprotected agents in your environment, analyzes behavior and historical context, and helps detect and block attacks at runtime. It also provides defense-in-depth for coding agents against malicious skills, scripts, configurations, and packages. AI Guard sits directly inline with your agents to provide real-time security guardrails, so you can deploy AI agents fast without compromising security.

AI Guard is currently in Limited Availability. Sign up to get early access.

Cut vulnerability noise by over 95% with the Datadog Runtime Prioritization Engine

Security teams are drowning in findings, with no reliable way to know which ones pose real risk. The Datadog Runtime Prioritization Engine combines runtime behavior, reachability, service ownership, and business impact into a single prioritization model that surfaces the vulnerabilities tied to your most critical services and routes them directly to the engineering teams that can fix them. One-click remediation and Bits Code can take findings from detection to done without manual triage or chaotic handoffs. To get started, sign up for the Preview.
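A prioritization model of this kind can be pictured as a filter on runtime signals followed by a weighted score and routing by owner. Everything in the sketch below is an assumption for illustration: the `Finding` fields, the tier weights, the threshold, and the finding IDs are invented and do not describe how the Runtime Prioritization Engine scores findings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: tier 1 services are the most business-critical.
TIER_WEIGHT = {1: 1.0, 2: 0.6, 3: 0.3}


@dataclass
class Finding:
    vuln_id: str
    cvss: float              # base severity, 0.0 to 10.0
    loaded_at_runtime: bool  # vulnerable package actually loaded
    reachable: bool          # vulnerable code path reachable
    service_tier: int
    owner: Optional[str]     # owning team, if known


def priority(f: Finding) -> float:
    # Findings whose vulnerable code never runs are deprioritized entirely.
    if not (f.loaded_at_runtime and f.reachable):
        return 0.0
    return f.cvss * TIER_WEIGHT.get(f.service_tier, 0.1)


def route(findings: list[Finding], threshold: float = 5.0) -> dict[str, list[str]]:
    """Group above-threshold findings by owning team, highest priority first."""
    queues: dict[str, list[str]] = {}
    for f in sorted(findings, key=priority, reverse=True):
        if priority(f) >= threshold:
            queues.setdefault(f.owner or "unassigned", []).append(f.vuln_id)
    return queues


findings = [
    Finding("vuln-1", 9.8, True, True, 1, "payments"),
    Finding("vuln-2", 9.8, False, True, 1, "payments"),  # never loaded
    Finding("vuln-3", 7.0, True, True, 2, "search"),     # scores 4.2
    Finding("vuln-4", 6.0, True, True, 1, None),
]
print(route(findings))
```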
Jun 9, 2026

DASH 2026 Harnessing AI: Guide to Datadog’s newest announcements

Datadog introduces a major wave of AI-powered observability and automation, adding Bits Chat, Bits Investigation, MCP integrations, AI Impact, Agent Builder, and agentic onboarding to help teams investigate issues, optimize systems, and measure AI usage faster.

AI is reimagining how engineering teams write code, investigate issues, and operate their systems.

From conversational interfaces and agentic investigations to deep MCP integrations and AI-powered optimization, this year’s DASH releases make it easier to put AI to work across your entire workflow, grounded in the telemetry your team already trusts.

Now, you can query your environment in natural language with Bits Chat, launch autonomous investigations from a failing test, connect your favorite coding agents to live Datadog context, and measure the real impact of AI tools on your delivery. These features and others help teams get more done while keeping engineers in control of the decisions that matter. Read on for everything new in harnessing AI, and check out our other roundup posts for the latest in observability, scale, and security.

Search, analyze, and take action across Datadog faster with Bits Chat

Bits Chat is Datadog’s conversational AI interface that helps teams search, analyze, and take action across their observability data. Available in the Datadog web app, Slack, and the Datadog mobile app, Bits Chat helps users get answers faster without switching tools or rebuilding queries. Use Bits Chat to search across your Datadog environment, generate resources like dashboards and notebooks, investigate and troubleshoot incidents, and more, all through a natural-language interface.

Talk to Bits AI by voice in the Datadog mobile app

Bits AI in the Datadog mobile app now supports voice input. Ask Bits about your system health or an active incident by voice or text, and get answers with context from Datadog public documentation, telemetry data, and service ownership. To get started, open the Datadog mobile app and tap Bits Chat.

Build dashboards via natural-language prompts with Bits Chat

Bits Chat can now generate full Datadog dashboards and individual widgets from a single natural-language prompt, turning a monitoring goal into a ready-to-use visualization in seconds. For example, you can ask Bits Chat to “build a dashboard to monitor checkout latency and error rates for the web-store service,” and it will pick the right metrics, traces, and logs and assemble them into a complete dashboard with appropriate widget types and groupings. You can also iterate conversationally: Highlight a widget to change its query, add a new visualization, or restructure a section, removing the manual work of dashboard authoring so teams can go from question to answer faster.

Create and update investigation notebooks via prompt with Bits Chat

Bits Chat can now generate entire Datadog Notebooks from a single natural-language prompt, turning a question or investigation goal into a structured document with text, visualizations, and live queries in seconds. Ask Bits Chat to “create an investigation for the recent spike of errors in the web-store service,” and it searches relevant telemetry and builds out the notebook with a hypothesis, supporting visualizations, and key findings. You can also modify existing notebooks conversationally: Simply highlight a section and ask Bits to rewrite it, add another SQL query, or generate a playbook from an incident.

Use natural language to write sophisticated queries with Bits Chat in DDSQL Editor

Bits Chat brings natural-language querying to DDSQL Editor: describe what you want in plain English and get SQL back without writing queries from scratch. Use it to write complex queries without memorizing syntax, such as joining containers with CPU metrics to spot overprovisioned workloads, aggregating error logs across services to identify ingestion latency patterns, or querying RUM and Product Analytics to track user engagement trends. You can also ask Bits Chat to explain how an existing query works or optimize a slow one with a single prompt. Because Bits Chat is aware of your available schemas and data sources, it generates queries scoped to the tables you care about.

Analyze slow or failed traces with Bits Chat

When a request is slow or fails, developers often need to inspect a trace span-by-span to understand which service, operation, or dependency contributed to the issue. APM Trace Analysis in Bits Chat helps automate this review for individual traces. From a trace in APM, users can click Fix with Bits to start an analysis. Bits Chat reviews the trace, correlates relevant spans with related logs when available, and surfaces stack traces and error context to explain what went wrong, where it occurred in the request path, and what to investigate next. When Source Code Integration is configured, Bits Chat can also suggest a code-level fix as a follow-up.
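Working out which span contributed most to a slow request usually starts with self time: a span’s duration minus the time spent in its children. A minimal sketch follows, assuming a flat list of spans with hypothetical `id`, `parent_id`, `service`, and `duration_ms` fields and children that run sequentially. Parallel children would be over-subtracted, so the result is clamped at zero.

```python
from collections import defaultdict


def self_times(spans: list[dict]) -> dict[str, float]:
    """Duration of each span minus the summed durations of its children."""
    child_total: dict[str, float] = defaultdict(float)
    for s in spans:
        if s["parent_id"] is not None:
            child_total[s["parent_id"]] += s["duration_ms"]
    return {s["id"]: max(s["duration_ms"] - child_total[s["id"]], 0.0) for s in spans}


def top_contributor(spans: list[dict]) -> tuple[str, float]:
    """Service and self time of the span that spent the most time itself."""
    st = self_times(spans)
    by_id = {s["id"]: s for s in spans}
    worst = max(st, key=st.get)
    return by_id[worst]["service"], st[worst]


spans = [
    {"id": "a", "parent_id": None, "service": "web-store", "duration_ms": 500.0},
    {"id": "b", "parent_id": "a", "service": "auth", "duration_ms": 50.0},
    {"id": "c", "parent_id": "a", "service": "postgres", "duration_ms": 400.0},
]
print(top_contributor(spans))
```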

Investigate service latency with Bits Chat

Latency investigations often require comparing normal and degraded request behavior, identifying where time is being spent, and understanding which endpoints, dependencies, and tags are most correlated with a slowdown. APM latency investigations in Bits Chat bring this workflow into a guided, natural-language experience directly from APM service and resource views. When users ask Bits Chat to investigate a latency issue, Bits analyzes relevant span data, compares slow traces against healthier request patterns, identifies bottlenecks in the request path, and surfaces the dimensions most associated with the slowdown. This helps engineers move from “something is slow” to a concrete next step without manually pivoting across dashboards.
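Comparing slow requests against healthier ones often comes down to asking which tag values are over-represented among the slow set. The sketch below computes a simple lift ratio (share among slow spans divided by share among baseline spans). The span dictionaries and tag names are hypothetical, and this is one common technique, not necessarily the one Bits uses.

```python
from collections import Counter


def tag_lift(slow: list[dict], baseline: list[dict], tag: str) -> list[tuple[object, float]]:
    """Rank values of `tag` by how over-represented they are in slow spans.

    lift = share among slow spans / share among baseline spans;
    a value never seen in the baseline gets infinite lift.
    """
    if not slow:
        return []
    slow_counts = Counter(s.get(tag) for s in slow)
    base_counts = Counter(s.get(tag) for s in baseline)
    results = []
    for value, n in slow_counts.items():
        slow_share = n / len(slow)
        base_share = base_counts[value] / len(baseline) if baseline else 0.0
        lift = slow_share / base_share if base_share else float("inf")
        results.append((value, lift))
    return sorted(results, key=lambda r: r[1], reverse=True)


slow = [{"region": "us-east"}] * 3 + [{"region": "eu-west"}]
baseline = [{"region": "us-east"}] * 2 + [{"region": "eu-west"}] * 6
print(tag_lift(slow, baseline, "region"))
```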

Investigate cost spikes and budget overages in minutes with the Cloud Cost skill in Bits Chat

Tracking down what’s driving a cost change usually means jumping between dashboards, filtering by team and service, and stitching together context from observability data, resulting in hours of work for a single investigation. The Cloud Cost skill in Bits Chat turns that workflow into a conversation. Ask Bits to investigate a cost anomaly, monitor alert, or budget overage, and it returns a summary with the dollar impact, projected annual cost, owning teams, and rate-versus-usage context. From there, you can drill into top cost drivers, correlate spend with metrics like CPU or request volume, compare actuals against budgets, and capture the full investigation in a Datadog Notebook to hand off to the owning team. The skill works across cloud, SaaS, AI, and Datadog costs, giving FinOps practitioners and engineers a single place to answer ad hoc cost questions.
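Rate-versus-usage context answers whether spend rose because unit prices changed or because consumption grew. One standard way to split a cost change is sketched below. The names and figures are hypothetical, the split convention (usage change valued at the old rate, rate change applied to the new usage) is one of several, and the annual figure is a simple linear run-rate rather than a forecast model.

```python
def decompose_cost_change(
    rate_before: float, usage_before: float, rate_after: float, usage_after: float
) -> dict[str, float]:
    """Split a cost change into usage and rate effects.

    With cost = rate * usage, total change = usage_effect + rate_effect exactly.
    """
    usage_effect = rate_before * (usage_after - usage_before)
    rate_effect = (rate_after - rate_before) * usage_after
    total = rate_after * usage_after - rate_before * usage_before
    return {"total": total, "usage_effect": usage_effect, "rate_effect": rate_effect}


def annual_run_rate(daily_cost: float) -> float:
    """Naive projection: assume a daily cost persists for a full year."""
    return daily_cost * 365


# Hypothetical: $2.00/unit for 100 units/day becomes $3.00/unit for 150 units/day.
change = decompose_cost_change(2.0, 100.0, 3.0, 150.0)
print(change)  # {'total': 250.0, 'usage_effect': 100.0, 'rate_effect': 150.0}
print(annual_run_rate(change["total"]))  # annualized impact of the daily increase
```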

Investigate and resolve issues with Bits AI

Diagnose frontend issues faster with RUM Agentic Investigations

For frontend engineers, investigating issues usually means pivoting between multiple tools to correlate data from across their stack. Datadog RUM Agentic Investigations help teams identify root causes faster by automatically analyzing data such as RUM events, APM traces, and network logs to produce ranked, evidence-backed findings. Engineers can launch investigations directly from a single session, slow page, or critical journey, then review structured results that stream into the UI in real time. Teams can continue the investigation through a built-in chat interface, save the results to a Notebook, or open the context in Bits Code to generate a code fix.

Get actionable performance insights via profiling and Bits AI

Continuous Profiler delivers code-level visibility into how applications consume CPU, memory, and other resources, but profiling data is often dense and hard for non-experts to navigate, so most developers underuse it. Datadog now exposes profiling data to AI agents through new MCP tools, Bits Chat, and Bits Investigation, so any engineer can simply ask “What are the main bottlenecks in this service over the last 15 minutes?” Bits automatically finds the right profiling data for the given service and time window, surfaces notable spikes and top CPU consumers across CPU, memory, and wall time, and translates the results into plain-language summaries with recommended next steps. By weaving profiling into the agentic flows that developers already use, this broadens access to one of Datadog’s most advanced datasets and shortens time to remediation during incidents.

Schedule recurring prompts and fixes with Bits Code Automations

Even when teams know exactly which tech debt to fix, the work often stalls behind feature priorities. Bits Code Automations turns that backlog into a continuous workflow by letting Bits Code run on a schedule or off telemetry triggers, instead of waiting for an engineer to start every session manually. Schedule recurring prompts to clear a class of issues at your team’s cadence, like fixing five flaky tests every week or triaging the top new errors every morning. Or configure Bits to start a fix the moment a qualifying telemetry signal appears, using rules you define around services, signals, and severity. Every automation still produces a review-ready pull request, so humans control what merges, and every scheduled or triggered run is tracked from a single view alongside outcomes and PR status. Automations are available today across Error Tracking, Test Optimization, APM Recommendations, Code Security, and custom prompts for general coding tasks, with more Datadog surfaces being added soon.
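A rule that gates telemetry-triggered runs can be thought of as a predicate over incoming signals. The sketch below is hypothetical: the `TriggerRule` fields, signal keys, and severity scale are invented for illustration and are not the Bits Code Automations configuration format. Scheduled runs and the pull-request review step are outside its scope.

```python
from dataclasses import dataclass

# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class TriggerRule:
    """Hypothetical rule: start a fix when a signal matches all criteria."""

    services: frozenset[str]
    signal_types: frozenset[str]
    min_severity: str

    def matches(self, signal: dict[str, str]) -> bool:
        return (
            signal["service"] in self.services
            and signal["type"] in self.signal_types
            and SEVERITY_ORDER[signal["severity"]] >= SEVERITY_ORDER[self.min_severity]
        )


rule = TriggerRule(
    services=frozenset({"checkout"}),
    signal_types=frozenset({"error_tracking_issue"}),
    min_severity="high",
)
print(rule.matches({"service": "checkout", "type": "error_tracking_issue", "severity": "critical"}))
```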

Triage synthetic test failures faster with Bits Investigation

When a Synthetic browser or API test fails, two questions immediately follow: Is this a real issue, and if so, why? Answering both often means manually sifting through traces, logs, infrastructure metrics, and test history before you can confirm scope or point to a cause. Bits Investigation brings AI-assisted triage into Synthetic Monitoring, automatically classifying failures as likely regressions or test misconfigurations and generating root-cause hypotheses backed by linked evidence from APM traces, infrastructure metrics, and deployment activity. Investigations can be launched on demand or configured to trigger automatically based on monitor criticality.

Visualize alerts and start Bits AI investigations on a live infrastructure diagram

When infrastructure or services break, you need to quickly see what’s impacted and fix it. The new Monitors diagram with Bits Investigation visualizes all of your monitors, so when you’re paged for an alert, you can assess the blast radius by seeing which other alerts are firing on related infrastructure. Then, hover over any resource or service on the diagram to have Bits AI start an investigation and track down the root cause. To try it, open any alerting resource monitor and click a specific event to see the diagram, or start an investigation from any resource in the Cloudcraft Monitors diagram.

Bring Bits Investigation into your incident response workflow

When engineers declare an incident, they often have to manually gather context from multiple tools before they can even begin investigating. You can now trigger Bits Investigation directly from an incident Slack channel or Datadog Incident Management, automatically pulling the incident timeline, linked Datadog telemetry data, and any shared context into an active investigation. Bits AI posts real-time findings and a root cause hypothesis to the Slack channel thread, and appears as a named responder on the incident record. Engineers get an AI co-investigator working in parallel from the moment an incident is declared, with no manual setup required.

Investigate governance findings in minutes with Bits Investigations in Governance Console

Governance Console surfaces costly telemetry patterns and stale configurations across a Datadog org, but acting on a finding still means manually piecing together what changed, who owns it, and which control to apply. The Governance Agent with Bits Investigations closes that gap. From a product Insight or a Control, admins can launch a Bits Investigation seeded with the governance context. Bits reports when the growth started, the top contributing services and teams, and the root-cause configuration change behind it, then routes the admin to the right control for mitigation. The same Bits Investigations engine that powers production incident investigations is now embedded directly in the governance workflow.

Find AI-generated meeting summaries in the unified incident timeline

Bridge calls are where most incident decisions happen, but capturing what was said has always required someone to take notes. Incident Meeting Summaries automatically post AI-generated summaries of Zoom, Microsoft Teams, and Google Meet bridge calls directly to the incident timeline and Slack channel. Summaries generate at the end of each call and every 10 minutes during an active call, so late joiners catch up without interrupting. Control which incidents get summarized by service, severity, visibility, or tag.
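The summary cadence is easy to make concrete. This sketch assumes minute-level granularity and that the end-of-call summary takes the place of a periodic one when the two coincide; `summary_times` is a hypothetical helper, not part of any Datadog API.

```python
def summary_times(call_minutes: int, interval: int = 10) -> list[int]:
    """Minutes after call start at which a summary would be posted (hypothetical model):
    every `interval` minutes while the call is live, plus once when it ends."""
    if call_minutes <= 0:
        return []
    # Periodic summaries strictly before the end of the call
    times = list(range(interval, call_minutes, interval))
    # Final summary when the call ends (covers the case where it lands on a multiple of 10)
    times.append(call_minutes)
    return times
```

For a 35-minute bridge call, this yields summaries at minutes 10, 20, 30, and 35.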

Bring Datadog context into your AI workflows

Bring live Datadog telemetry into your AI agents with native integrations

With Datadog’s connectors and plugins across every major AI agent platform, such as Claude Code, Claude Desktop, Claude Cowork, ChatGPT apps, Codex CLI, and Cursor, developers can access the full power of Datadog’s observability stack directly from within the tools they already work in. By connecting to Datadog, your AI agent can pull recent error logs, visualize a metric spike, summarize an open incident, or inspect a distributed trace, all without leaving your editor, terminal, or chat interface. All web-based agents also support MCP Apps, which provide the same rich visualizations that developers are accustomed to in Datadog.

Give your AI agents live Datadog access from the command line

AI agents are a standard part of how engineers write, deploy, and troubleshoot software, but most still lack direct access to live production telemetry and rely on long-lived API keys spread across CI pipelines and shell environments. Pup CLI gives shell-style agents OAuth-scoped access to 33+ Datadog product domains through a single binary with 200+ commands, covering Logs, APM, RUM, Cloud SIEM, Incident Management, and more. Agents can retrieve the command schema dynamically via pup agent schema, parse structured JSON or YAML output, and chain results with tools like jq and grep. Bundled skills for incident triage and log-trace correlation install directly into Claude Code and Cursor workflows. Pup CLI pairs with the Datadog MCP Server, which covers chat-style agents in IDEs and assistants.

Bring Datadog telemetry into your AI workflows with MCP Apps

Datadog MCP Server now supports MCP Apps that enable you to visualize Datadog telemetry directly within AI tools such as Claude, Cursor, Codex, and ChatGPT. This expands AI workflows beyond text and tables by adding interactive experiences—including timeseries, pie charts, treemaps, top lists, and more—within supported AI tools. Using natural-language queries such as “Why did checkout latency spike following a recent deployment?” or “How is checkout conversion performing this month?”, your AI tool can retrieve live latency graphs or Product Analytics funnels, enabling you to conduct end-to-end investigations without opening a separate window.

Measure the impact of AI coding tools on your software delivery

Engineering leaders are investing heavily in AI coding assistants but struggle to tie those investments to concrete delivery outcomes. Datadog AI Impact helps close that gap by connecting usage telemetry from AI coding tools like Claude Code, Cursor, and Copilot to your delivery metrics, tagging every commit with the tool and model that assisted it as code flows from pull request to production. See exactly what percentage of your code is AI-assisted, compare AI-assisted and human-written work side by side on velocity and stability, and benchmark tools and models based on your own team’s data (not someone else’s leaderboard) so every adoption and renewal decision is grounded in your delivery data.

Unify multi-cluster Kubernetes visibility with Datadog MCP tools

Investigating Kubernetes issues across multiple clusters requires running the same kubectl commands against each one and manually stitching in the ownership, service, and environment context that kubectl can’t provide. The Datadog MCP Server now includes a Kubernetes toolset that lets MCP-compatible AI agents query resources across your entire cluster fleet in a single call, with results enriched by Datadog metadata. Agents can chain the toolset’s search, describe, and manifest retrieval tools into workflows for incident triage, blast radius mapping, drift detection, governance checks, and PR risk analysis.

Expand APM context for AI agents with APM MCP toolset

The Datadog MCP Server already gives AI agents access to core APM telemetry data through tools like trace lookup and span search. The expanded APM MCP toolset, now in Preview, brings more APM data into the MCP layer, including span tag discovery, APM Recommendations, and deployments from Change Tracking. With this added context, agents can investigate service issues, understand relevant span dimensions, find optimization opportunities, and surface recent deployments that may have contributed to a problem.

Flexibly query your Datadog telemetry data with the DDSQL API and MCP tools

The DDSQL API and MCP tools let you programmatically run DDSQL queries against your Datadog telemetry data using the same Postgres-compatible SQL available in DDSQL Editor. The MCP toolset also gives agents like Claude and ChatGPT the context they need to write DDSQL queries on your behalf, with schema discovery tools that browse available tables and columns, field search across data sources, and DDSQL syntax reference. This unlocks use cases like automated tag governance across your AWS, GCP, and Azure accounts, joining log error rates with span latency to surface degraded services, or analyzing LLM Observability traces to track token usage and model performance across your AI pipelines.
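To show the shape of the log-and-span join mentioned above, the sketch below runs a Postgres-compatible query through Python's built-in `sqlite3` module so it can execute locally. The `logs` and `spans` tables, their columns, and the thresholds are hypothetical stand-ins: real DDSQL table names and schemas will differ, and the schema discovery tools are the place to look them up.

```python
import sqlite3

# Hypothetical stand-in tables; real DDSQL tables and columns will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (service TEXT, status TEXT);
CREATE TABLE spans (service TEXT, duration_ms REAL);
""")
conn.executemany("INSERT INTO logs VALUES (?, ?)", [
    ("checkout", "error"), ("checkout", "error"), ("checkout", "info"), ("checkout", "info"),
    ("search", "info"), ("search", "info"), ("search", "info"), ("search", "error"),
])
conn.executemany("INSERT INTO spans VALUES (?, ?)", [
    ("checkout", 900.0), ("checkout", 1100.0),
    ("search", 80.0), ("search", 120.0),
])

# Join per-service log error rate with average span latency, keeping services
# that are both error-prone and slow (thresholds are illustrative).
QUERY = """
SELECT l.service, l.error_rate, s.avg_latency_ms
FROM (
    SELECT service,
           AVG(CASE WHEN status = 'error' THEN 1.0 ELSE 0.0 END) AS error_rate
    FROM logs
    GROUP BY service
) AS l
JOIN (
    SELECT service, AVG(duration_ms) AS avg_latency_ms
    FROM spans
    GROUP BY service
) AS s ON s.service = l.service
WHERE l.error_rate > 0.2 AND s.avg_latency_ms > 500
ORDER BY l.error_rate DESC
"""
degraded = conn.execute(QUERY).fetchall()
```

With this sample data, only `checkout` (50% error rate, 1000 ms average latency) is returned; `search` has errors but stays under the latency threshold.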

Build agentic workflows for alert response and remediation with Bits Agent Builder

As systems scale, the automated workflows teams build to handle alerts and remediation require increasingly complex, hardcoded logic branches. Bits Agent Builder, now generally available, adds AI-driven orchestration to Datadog Workflow Automation, letting engineers create purpose-built agents that reason through complexity instead of following a fixed script. Engineers describe an agent’s goals in natural language, control which data sources and tools it can access, and deploy agents that interpret Datadog observability data and third-party signals to take action automatically or on demand through chat.

Instrument your app for Datadog without leaving your development environment with Agentic Onboarding

With Agentic Onboarding, Datadog brings instrumentation and setup directly into developers’ existing workflows via either the AI Setup CLI or the Datadog MCP Server. Developers can set up observability without leaving their environments, digging through documentation, or manually applying complex configurations. The Setup CLI runs in your terminal, detects your stack, and sets up Datadog by instrumenting IaC configurations or application code. The MCP Server brings those same onboarding tools into AI coding assistants, so configuration happens inside the IDE. Teams go from zero to fully instrumented in minutes, without needing a Datadog expert.

Give AI agents and developer tools secure, auditable access to infrastructure hosts with Datadog Agent MCP

The Datadog Agent MCP is a new remote-actions toolset that extends the Datadog MCP Server. It gives AI systems and developer CLIs direct, live, secure, and auditable on-demand shell access to your infrastructure hosts, through a backend-proxied channel powered by the Private Action Runner. Using natural language, you can read log files, inspect process state, describe Kubernetes pods and events, and diagnose network issues without SSH access or sending any data from the host. AI agents like Claude Code, OpenAI Codex, and Bits AI can run shell commands and invoke on-demand scripts directly on your hosts. To qualify for this Preview, you should already be running the Datadog Agent (v7.80+) and be able to install the Private Action Runner in your environment.

Reduce costs and improve performance with AI

Centralize your Kubernetes autoscaling deployment and management

Right-sizing Kubernetes workloads fleet-wide is one of the highest-leverage cost optimizations available, but it has historically required per-service expertise that doesn’t scale. Datadog Kubernetes Autoscaling now makes it faster and safer to expand workload autoscaling across your entire cluster, with three rollout paths: bulk activation from the in-app setup page, policy-as-code management with GitOps cluster profiles, and AI-assisted manifest generation. In-place vertical resizing applies right-sizing changes to container resource requests with less disruption than pod recreation.

Surface a broader range of service optimizations with AI Recommendations

APM’s AI Recommendations expands the existing APM Recommendations experience by using AI to surface a broader range of service optimization opportunities, including missing caches, tail latency, resource contention, connection pool exhaustion, excessive serialization, unbounded payloads, and more. Teams can review, triage, and track AI Recommendations through resolution in APM. When Source Code Integration is configured, Datadog can use code context to improve recommendation accuracy and help teams identify where to make a fix.

Eliminate cloud storage waste faster with Datadog Storage Management and Bits Chat

As AI and other data-intensive workloads drive exponential growth in object storage, the most impactful cost patterns are increasingly hidden below the bucket level. Datadog Storage Management’s new recommendations and Bits integration help engineering and FinOps teams find and reduce the biggest cost drivers in their cloud storage. Storage Management automatically surfaces areas of waste or inefficiency, such as small files inflating per-object overhead, duplicate objects, and cold data sitting in expensive tiers. With the Bits Chat integration, you can analyze storage buckets for cost drivers using natural language and generate findings tailored to your data layout, access patterns, and existing configurations. Storage Management for Amazon S3 is generally available today, with Google Cloud Storage and Azure Blob Storage in Preview.

Optimize Spark and Databricks jobs with AI and Datadog Jobs Monitoring

Spark and Databricks jobs can run for hours and cost thousands of dollars a month, but finding the right bottleneck across configuration, query design, code, and infrastructure still takes hours of manual investigation. Datadog Jobs Monitoring surfaces prioritized recommendations across your pipelines with savings estimates tied to real production execution data, and the Datadog MCP Server brings Spark execution context directly into your coding agent so you can investigate and fix jobs without leaving your editor.

Automatically parse and normalize all your logs

Log pipelines transform raw log messages into structured attributes that power search, filtering, dashboards, and monitors across Datadog. While Datadog provides out-of-the-box pipelines for many log sources, custom application logs still require engineers to manually build Grok parsing rules, processors, and remappers. This process requires expertise and ongoing maintenance as log formats change. Auto-Processing reduces that work by automatically detecting unparsed logs at ingest, generating parsing rules, and remapping key attributes like timestamp, status, service, trace ID, and span ID, with no configuration required. Auto-Processing is also fully managed: Datadog continuously monitors parsing accuracy and adapts as your log formats evolve, so your team no longer has to maintain Grok rules by hand.
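To make "parsing plus remapping" concrete, the sketch below handles one line in a hypothetical custom log format. The regular expression and the `REMAP` table are hand-written assumptions standing in for the rules Auto-Processing generates; only the target attribute names mirror the attributes listed above.

```python
import re

# Hand-written rule for one hypothetical log format; Auto-Processing would
# infer an equivalent rule automatically.
LINE_RE = re.compile(
    r"(?P<ts>\S+) \[(?P<lvl>[A-Z]+)\] (?P<app>[\w-]+) "
    r"trace=(?P<tid>\w+) span=(?P<sid>\w+) (?P<msg>.*)"
)

# Remap format-specific keys onto standard attribute names.
REMAP = {"ts": "timestamp", "lvl": "status", "app": "service",
         "tid": "trace_id", "sid": "span_id", "msg": "message"}


def process(line: str) -> dict:
    match = LINE_RE.match(line)
    if match is None:
        return {"message": line}  # leave unrecognized logs as raw text
    attrs = {REMAP[key]: value for key, value in match.groupdict().items()}
    attrs["status"] = attrs["status"].lower()  # normalize level casing
    return attrs


record = process(
    "2026-06-09T12:00:00Z [ERROR] checkout-api trace=abc123 span=def456 payment declined"
)
```

The result is a flat set of attributes (`timestamp`, `status`, `service`, `trace_id`, `span_id`, `message`) that search, facets, and trace correlation can use directly.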

Generate AI-based Grok parsing rules with one click

DevOps teams often manage high volumes of custom logs that arrive unstructured, improperly formatted, or unparsed. However, writing custom Grok rules to parse that data is hard, prone to syntax errors, and time-consuming. Now, Datadog Observability Pipelines supports AI-assisted Grok parsing, so teams can generate parsing rules with one click in the UI. Paste in your log samples to automatically produce parsing rules that normalize your data into your preferred taxonomy.
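Grok rules are built from named patterns of the form `%{PATTERN:field}` that expand into regular expressions. The toy compiler below shows that expansion with a handful of simplified patterns. The pattern definitions and `compile_grok` are illustrative assumptions, not the rules Observability Pipelines generates or its Grok implementation.

```python
import re

# Simplified versions of a few common Grok patterns (real definitions are stricter).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
}

TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(rule: str) -> re.Pattern:
    """Expand %{PATTERN:field} tokens into named regex groups."""
    parts, last = [], 0
    for m in TOKEN.finditer(rule):
        # Escape literal text between tokens so characters like '[' match as-is
        parts.append(re.escape(rule[last:m.start()]))
        body = PATTERNS[m.group(1)]
        field = m.group(2)
        parts.append(f"(?P<{field}>{body})" if field else f"(?:{body})")
        last = m.end()
    parts.append(re.escape(rule[last:]))
    return re.compile("^" + "".join(parts) + "$")


rule = compile_grok(
    "%{IP:client} %{WORD:method} %{NOTSPACE:path} %{NUMBER:status} %{NUMBER:duration_ms}ms"
)
parsed = rule.match("10.0.0.7 GET /api/cart 200 35ms").groupdict()
```

Each named token becomes a capture group, so `parsed` holds `client`, `method`, `path`, `status`, and `duration_ms` as strings ready for further remapping.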

Build agent-assisted internal apps with Datadog Apps

AI coding agents make it faster to create internal applications, but those apps still need a reliable way to run, connect to external systems, and fit into the workflows teams use every day. Datadog Apps gives teams a code-first way to build applications from the agents, IDEs, and CI pipelines they already use. Instead of deploying standalone tools that create additional context switching, teams can embed these apps directly into Datadog dashboards, notebooks, service pages, and the Developer Homepage. Apps use Datadog’s identity and permission model and can connect to external systems through configured connections. Datadog also instruments apps to help you monitor health and performance, including errors, user activity, and usage trends.

Gain visibility into AI usage, performance, and spend with Datadog AI integrations

New Datadog integrations with major AI tools and providers across the stack give teams a single place to track AI adoption, measure productivity impact, and control costs. Surface token consumption, model usage patterns, and cost trends across your Anthropic API workloads. Bring OpenAI billing and usage data into Datadog to break down spend by model, project, and time period. Gain visibility into GitHub Copilot seat utilization, suggestion acceptance rates, and active usage across your organization. Pull Microsoft Copilot activity and adoption metrics into Datadog to understand which teams are actively using AI assistance and whether Copilot is delivering measurable productivity gains. Track how your development teams are using Cursor’s AI-powered coding capabilities, including model interactions, usage frequency, and adoption trends. Monitor your Supabase Cloud infrastructure, from database performance and connection pooling to API request volume and auth activity. And connect your existing AI gateway to Datadog LLM Observability to run evaluations with your own API keys and model access.
Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

DASH 2026 End-to-End Observability: Guide to Datadog’s newest announcements

Datadog expands full-stack observability with major DASH updates across OpenTelemetry, APM, RUM, Synthetics, network, serverless, and cloud integrations. It adds faster instrumentation, deeper service and device visibility, and broader support for modern platforms and workflows.

Comprehensive observability starts with quick instrumentation and full visibility into every layer of your stack

This year’s DASH announcements expand Datadog’s coverage from the frontend down to the physical network, with faster instrumentation, deeper digital experience monitoring, broader cloud and infrastructure integrations, and a continued commitment to OpenTelemetry across the platform.

Whether you’re instrumenting Windows hosts in a single step, tracing a request from an end user’s device to a SaaS application, monitoring new clouds like Alibaba, Nebius, and OVHcloud, or running fully vendor-neutral OTel pipelines, these features help you achieve full-stack visibility with less setup and fewer blind spots. Explore everything new in end-to-end observability below, and see our other roundup posts for the latest in AI, scale, and security.

Datadog is the OpenTelemetry-native observability platform

Get OpenTelemetry-native in-app experiences powered by semantic conventions for infrastructure and APM

Datadog now natively resolves OpenTelemetry semantic conventions across its platform. Teams running fully open source, vendor-neutral OTel pipelines get the same curated product experiences—Infrastructure Host List, Kubernetes Explorer, APM service pages, dashboards, and monitors—that previously required Datadog-native instrumentation or Datadog-specific pipeline components. Infrastructure and APM views automatically populate from OTel-native metrics and traces, giving developers and SREs consistent workflows, correlation, and analytics regardless of instrumentation source. To request early access, sign up for the Preview.

OpenTelemetry-native Infrastructure Monitoring

The Infrastructure Host List view now supports OTel natively, enabling customers who use the Host Metrics Receiver in their OTel Collectors to monitor the health of their OTel-based hosts and triage issues. Teams get the same live host inventory, tag-based filtering and grouping, and correlated side panel views across metrics, logs, and traces.

Root-cause Kubernetes issues efficiently with OpenTelemetry data

For teams that have standardized on OpenTelemetry, translating telemetry data into vendor-specific formats can lead to fragmented product experiences and misaligned metrics. Native OTel support in the Datadog Kubernetes Explorer automatically translates OTel metrics into Datadog-standard representations, resolving variations in metric units and semantics while preserving their original context. By ingesting your Kubernetes resource manifests and associating them with incoming OTel telemetry data, the Kubernetes Explorer provides a unified, relationship-aware view so you can correlate metrics, logs, and traces with your Kubernetes resources to pinpoint root causes without manual command-line queries. Read our blog post to learn more.

Application Performance Monitoring now natively supports OpenTelemetry RED metrics and semantics

Datadog now natively supports application spans sent using the OpenTelemetry Protocol (OTLP) directly to our platform without the need for a Datadog Exporter in your collection pipeline. OTel-instrumented hosts will now be detected automatically and displayed appropriately in the internal developer portal. Traces from those services now show native OTel semantics without the need for specific OTel namespacing.

Additionally, RED metrics for OTel hosts can now be powered natively using the OTel-standard Span Metrics connector or custom metrics such as HTTP or RPC metrics. OTel data is now considered first-class data, in parity with Datadog semantics and RED metric sources. To request early access to these and other OTel-native features, sign up for the Preview.
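RED stands for rate, errors, and duration. As a minimal sketch of what those metrics summarize, the function below derives all three per service from a list of span records. The record fields (`service`, `status_code`, `duration_ms`) and the `"ERROR"` status string are assumptions for this example; the Span Metrics connector itself emits counters and histograms rather than computing summaries like this.

```python
from collections import defaultdict
from statistics import median


def red_metrics(spans: list[dict], window_seconds: float) -> dict[str, dict]:
    """Rate, error ratio, and median duration per service over one time window."""
    by_service: dict[str, list[dict]] = defaultdict(list)
    for span in spans:
        by_service[span["service"]].append(span)

    result = {}
    for service, group in by_service.items():
        errors = sum(1 for s in group if s["status_code"] == "ERROR")
        result[service] = {
            "rate_per_s": len(group) / window_seconds,             # Rate
            "error_ratio": errors / len(group),                     # Errors
            "median_ms": median(s["duration_ms"] for s in group),  # Duration
        }
    return result


# Hypothetical spans collected over a 2-second window
spans = [
    {"service": "checkout", "status_code": "OK", "duration_ms": 100},
    {"service": "checkout", "status_code": "ERROR", "duration_ms": 400},
    {"service": "checkout", "status_code": "OK", "duration_ms": 200},
    {"service": "checkout", "status_code": "UNSET", "duration_ms": 300},
    {"service": "search", "status_code": "OK", "duration_ms": 50},
]
metrics = red_metrics(spans, window_seconds=2)
```

For `checkout`, this gives 2 requests per second, a 25% error ratio, and a 250 ms median duration.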

Manage DDOT pipeline configurations at scale with Fleet Automation

Platform teams can now remotely configure the Datadog Distribution of OpenTelemetry Collector (DDOT) from Fleet Automation, making it easier to manage telemetry pipelines across large DDOT fleets. Instead of relying on Helm, GitOps, or custom scripts for every collector update, teams can edit YAML, apply configuration changes to selected collectors, and review deployment history directly in Datadog. This capability helps teams standardize OpenTelemetry pipeline operations, reduce configuration drift, and roll out changes for filtering, routing, and sampling with greater control.
Sign up for the Preview to manage DDOT configurations from Fleet Automation.

Accelerate OTel gateway resolutions with Topology View in Fleet Automation

OTel gateway deployments are powerful tools for centralized telemetry management, but their complex configurations and often multi-layered architecture make troubleshooting unexpected telemetry behavior a time-consuming, fragmented process. Topology View in Fleet Automation gives platform teams end-to-end visibility into their gateway architectures, insights on abnormal telemetry data traffic patterns like drops, spikes, and uneven load across the pipeline, as well as the ability to pinpoint root causes with monitor context and component-level pipeline views. To get started, follow our documentation or read our blog post.

Get deeper service visibility with less setup

Comprehensively connect your service data with Service Remapping

Service Remapping is now generally available, giving you direct control over how services are named and grouped throughout your Datadog environment without code or configuration changes. Consistent service names are the glue that holds your Datadog telemetry together, enabling you to correlate traces, logs, and metrics from throughout your distributed architecture. Meanwhile, in complex environments, the same workload often carries different names across different telemetry sources. With Service Remapping, you can easily ensure an accurate picture of your system and unify your telemetry by merging redundant service entries, splitting monolithic entries by tag values, and defining new services based on infrastructure tags to resolve naming inconsistencies across products. Impact previews show which monitors and dashboards are affected before any rule takes effect, so you can make changes with confidence. Read our documentation to get started, or learn more in our blog post.
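The three kinds of rules described above (merge, split, and define) can be pictured as a function from a telemetry record to a service name. Everything in this sketch is hypothetical, including the rule tables and the choice of `kube_deployment` as the infrastructure tag. Service Remapping is configured in Datadog without code, and its actual rule evaluation may differ.

```python
# Hypothetical rule tables; Service Remapping itself is configured in Datadog.
MERGE = {"checkout-svc": "checkout", "checkout_api": "checkout"}  # redundant names -> one service
SPLIT = {"monolith": "team"}          # split "monolith" by the value of its "team" tag
DEFINE_FROM_TAG = "kube_deployment"   # name untagged workloads after this infrastructure tag


def remap_service(record: dict) -> str:
    tags = record.get("tags", {})
    service = record.get("service")
    if not service:
        # Define a service from infrastructure tags when none was set
        return tags.get(DEFINE_FROM_TAG, "unknown")
    service = MERGE.get(service, service)
    split_tag = SPLIT.get(service)
    if split_tag and split_tag in tags:
        return f"{service}-{tags[split_tag]}"
    return service
```

Because every record flows through the same rules, traces, logs, and metrics that used different names end up grouped under one consistent service.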

Instrument your Windows hosts in a single step with Datadog APM

Datadog APM offers host-wide Single Step Instrumentation (SSI) for Windows, available in Preview. With SSI, you can instrument Java and .NET applications across an entire Windows host with a single Agent installation command, covering all Java applications running on the host and all .NET applications running in IIS. Instrumentation rules give you granular control: you can use them to instrument .NET applications running outside of IIS, or to choose which Java applications on the host or .NET applications in IIS are instrumented. For new hosts, you can set up APM instrumentation with an MSI command, or enable SSI on existing Agents directly from Fleet Automation. To learn more, read our documentation.

Enable end-to-end visibility into your Java and NGINX applications in one command

Achieving full-stack observability has always required two separate instrumentation efforts: one for backend services and one for the frontend. For DevOps and SRE teams that don’t own frontend code, this involves coordinating with another team just to get RUM set up, delaying the unified view of user experience and service performance that your team actually needs. With single-step instrumentation (SSI) for APM and RUM, you can now enable both frontend and backend monitoring in a single command, with no code changes required. Datadog automatically instruments your application, correlates RUM sessions with backend APM traces, and starts surfacing a complete picture of how your services affect real users from the moment the Agent is installed. SSI now supports Java servlet-based app servers (including Tomcat, Jetty, WildFly, and WebLogic) as well as apps served by NGINX, giving both the same single-command path to end-to-end visibility. To learn more, read our blog post or documentation.

Trace Azure-managed services end to end in your .NET applications

Distributed .NET applications on Azure rely on managed services like Service Bus, Event Hubs, Cosmos DB, and API Management to route requests between systems. When something goes wrong in production, engineers often lose visibility at the boundary between their application code and Azure-managed infrastructure. Datadog now extends distributed tracing to these services for .NET applications, with no code changes required. Teams can follow requests across the full application flow in a single view, with traces staying connected as messages move through queues and event streams, Cosmos DB operations appearing inline with the rest of the request, and API Management spans linking frontend and backend traces together. Learn more in our blog post.

Bring your Azure Application Insights distributed traces into Datadog APM

Teams using Azure Application Insights for Azure serverless workloads can now get full Datadog APM visibility without Datadog Agent instrumentation. Datadog automatically converts App Insights logs into APM spans and enriches them with Azure resource metadata, so Azure Functions, API Management, Cosmos DB, Azure Blob Storage, and Azure SQL DB spans appear in the same Trace Explorer, flame graphs, and Software Catalog as the rest of your stack. No additional setup is required as long as Azure logs are already flowing into Datadog via the Azure integration. In mixed environments where some services use App Insights and others use Datadog APM, traces from both can be correlated in a single view. The Azure Application Insights integration is currently in Preview.

Visualize all of your services without instrumentation using Datadog Service Discovery

Not sure what your environment looks like at the application layer, or whether you’re getting the most out of your observability setup? Datadog Service Discovery gives you an instant preview of every service running across your hosts and how they connect. Service Discovery generates a visual map of your application stack that requires no instrumentation. This helps you understand the full scope of what could be monitored with APM, including information like which services exist, how they depend on each other, and which services are critical and should be monitored. Service Discovery is a starting point for building a more complete observability strategy. To get started with Service Discovery, sign up for the Preview.

Investigate trace behavior at scale with Trace Patterns

Investigating one trace at a time does not scale. Trace Patterns groups traces with similar structure and attributes into recurring patterns, ranked by request volume, error rate, and latency, so you can analyze behaviors across requests at a glance. Open any pattern to inspect representative traces, outliers on errors or latency, and how it changes over time. Learn more in the Trace Patterns documentation, or sign up for the Preview to get started.
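One simple way to picture pattern grouping is to key each trace by its ordered sequence of (service, operation) pairs and then aggregate per key. This is an illustrative assumption: Trace Patterns also considers attributes, and Datadog's actual grouping may differ. The trace record fields are hypothetical.

```python
from collections import defaultdict


def trace_patterns(traces: list[dict]) -> list[dict]:
    """Group traces that share the same (service, operation) structure, ranked by volume."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for trace in traces:
        signature = tuple((s["service"], s["operation"]) for s in trace["spans"])
        groups[signature].append(trace)

    patterns = []
    for signature, members in groups.items():
        patterns.append({
            "signature": signature,
            "count": len(members),
            "error_rate": sum(t["error"] for t in members) / len(members),
            "avg_duration_ms": sum(t["duration_ms"] for t in members) / len(members),
        })
    # Rank by request volume, highest first
    return sorted(patterns, key=lambda p: p["count"], reverse=True)


cart = [{"service": "web", "operation": "GET /cart"},
        {"service": "cart", "operation": "db.query"}]
traces = [
    {"spans": cart, "error": False, "duration_ms": 120},
    {"spans": cart, "error": True, "duration_ms": 200},
    {"spans": [{"service": "web", "operation": "GET /search"}], "error": False, "duration_ms": 40},
]
patterns = trace_patterns(traces)
```

The two cart requests collapse into one pattern with a 50% error rate and a 160 ms average, so a reviewer reads one row instead of every trace.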

Track performance across every user journey

Monitor the technical performance of critical steps in your user journeys with RUM Operations

Every user journey in your applications, such as checkout, login, or search, includes critical steps that make the experience work. RUM Operations lets you monitor these steps to help ensure your journeys are always available. For example, the checkout journey may include operations such as entering payment details, saving a payment method, and completing a purchase.

Once a RUM Operation is defined, Datadog calculates metrics over your application’s full traffic to measure the operation’s volume, conversion rate, and latency. These metrics can be plugged into monitors, SLOs, and dashboards, and Operations also appear as RUM events within RUM sessions for deeper investigation. Learn more in our Operations Monitoring documentation, or sign up for the Preview to get started.
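The three operation metrics can be sketched from a list of attempts. The attempt record shape is hypothetical, and this example uses the median for latency; Datadog may report other aggregations or percentiles.

```python
from statistics import median


def operation_metrics(attempts: list[dict]) -> dict:
    """Volume, conversion rate, and median latency for one operation (hypothetical record shape).

    Each attempt has "completed" (bool) and, if completed, "duration_ms".
    """
    volume = len(attempts)
    completed = [a for a in attempts if a["completed"]]
    conversion = len(completed) / volume if volume else 0.0
    latency = median(a["duration_ms"] for a in completed) if completed else None
    return {"volume": volume, "conversion_rate": conversion, "median_latency_ms": latency}


# e.g. four "complete purchase" attempts, one abandoned
attempts = [
    {"completed": True, "duration_ms": 800},
    {"completed": True, "duration_ms": 1200},
    {"completed": False},
    {"completed": True, "duration_ms": 1000},
]
metrics = operation_metrics(attempts)
```

Here the operation ran 4 times, converted 75% of the time, and completed in a median of 1000 ms; values like these are what monitors and SLOs would be built on.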

Bring observability into every release with Feature Flags and Experiments

Datadog Feature Flags and Experiments are both now generally available, and plug directly into the telemetry data you already collect with Datadog: APM traces, RUM sessions, logs, and infrastructure metrics. With Feature Flags, engineering teams can ship features through observability-driven canary rollouts, trace any incident back to the exact flag change that caused it, and let Bits AI clear stale flags before they pile up as tech debt.
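Canary rollouts of this kind usually rely on deterministic bucketing, so each user keeps the same flag value as the rollout percentage grows. The sketch below shows that generic technique using a hash of the flag key and user ID. `bucket` and `is_enabled` are hypothetical helpers, not the Datadog Feature Flags SDK, which handles assignment for you.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> float:
    """Map a user to a stable point in [0, 100) for a given flag (hypothetical helper)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    # First 8 bytes as an unsigned integer, scaled into [0, 100)
    return int.from_bytes(digest[:8], "big") / 2**64 * 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """A user sees the feature when their stable bucket falls below the rollout percentage."""
    return bucket(flag_key, user_id) < rollout_percent
```

Because a user's bucket never changes, raising the rollout from 10% to 50% only adds users; nobody who already had the feature loses it.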

Experiments uses that same telemetry data to make every release measurable, so teams can run rigorous A/B tests and see how each variant affects user behavior, application performance, and business metrics in one view, with no batch pipelines or stitched-together dashboards. And as AI agents take on more development work, Experiments lets teams safely test every change they ship, keeping reliability and key metrics in check even as cycles speed up. Together, they connect product insight, controlled testing, and safe production rollout in one workflow. Learn more in our blog post.
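For a sense of the statistics behind an A/B comparison, here is a classic two-proportion z-test on conversion counts. It is a generic textbook method offered as an assumption; Datadog Experiments may use different statistical approaches, and the sample numbers are made up.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Control: 200 of 5,000 users converted; variant: 260 of 5,000
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
```

With these counts, z is about 2.86 and the two-sided p-value is below 0.01, so the lift would usually be treated as statistically significant.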

See system health at a glance with the new Synthetics experience

Stop being the last to know when your core journeys break. The Datadog Synthetic Monitoring landing page provides a unified view of application health, replacing crowded test lists with actionable insights. Use the Availability Overview map to visualize your highest-traffic routes and identify coverage gaps. Additionally, you can use the System Signals view to catch issues across your stack before they impact users. From tracking SLIs to automating maintenance for “noisy” tests, this is your new daily routine for production peace of mind. Sign up for the Preview to get early access to the new Synthetics landing page. To learn more, check out our blog post on the landing page and read the Synthetic Monitoring documentation.

Troubleshoot frontend performance with Datadog’s Browser Profiler

Frontend performance issues are easy to detect but difficult to diagnose. A degraded INP score or recurring long tasks tell you users are experiencing slowdowns, but not which JavaScript function is responsible. Datadog’s Browser Profiler, now in public preview, connects method stack frames from real user sessions directly to the RUM workflows engineers already use. Teams can investigate slow interactions in individual sessions, identify recurring bottlenecks across thousands of sessions, and compare profiling snapshots before and after a deployment to confirm a fix worked in production. Learn more in our blog post.

Optimize the speed of your mobile application launches with Mobile Profiling

Capture detailed data about your mobile application’s performance during launch. Using Mobile Profiling, you can identify slow methods and optimize startup time for your application’s time to initial display (TTID). The mobile profiler collects method call stacks from the application’s process, which can be queried and analyzed in the RUM Sessions Explorer. Mobile Profiling is available in Preview for iOS and Android.

Monitor the reliability of every critical test suite in one place

Without a clear reliability signal at the suite level, teams are forced to investigate individual test failures one by one, making it difficult to understand whether issues are isolated noise or symptoms of broader system degradation. Datadog’s automatically generated SLOs for Test Suites bridge that gap by turning grouped Synthetic tests into a unified reliability view. This helps teams quickly assess system health, track error budget consumption, and prioritize investigations where they matter most.

With no setup required, Datadog automatically creates SLOs for every test suite, providing a default 7-day rolling reliability KPI with a 99.9% target. Teams can immediately understand whether reliability is trending in the wrong direction, alert on meaningful degradation instead of transient failures, and identify which tests are contributing most to downtime through a built-in contributors view. By surfacing the tests driving error budget consumption, Test Suite SLOs help SREs, platform engineers, and QA teams move from fragmented troubleshooting to faster, more focused root cause analysis. For more information, see Service Level Objectives for test suites.
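As a rough illustration of what the default target means, the following Python sketch computes the error budget for a 99.9% target over a 7-day rolling window. It assumes a time-based budget (allowed bad time as a fraction of the window), which may differ from how Datadog actually evaluates Test Suite SLOs; `error_budget` and `budget_consumed` are hypothetical helpers, not Datadog APIs.

```python
from datetime import timedelta


def error_budget(target: float, window: timedelta) -> timedelta:
    """Total 'bad' time an SLO target allows within the window (hypothetical helper)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    # Assumption: the budget is the fraction of the window that is allowed to fail.
    return window * (1.0 - target)


def budget_consumed(bad_minutes: float, target: float, window: timedelta) -> float:
    """Fraction of the error budget used by the observed bad minutes."""
    budget_minutes = error_budget(target, window).total_seconds() / 60
    return bad_minutes / budget_minutes


# Default described for Test Suite SLOs: 99.9% target, 7-day rolling window.
WINDOW = timedelta(days=7)
TARGET = 0.999

budget = error_budget(TARGET, WINDOW)
print(f"Allowed downtime: {budget.total_seconds() / 60:.2f} minutes")
print(f"Budget used by 5 bad minutes: {budget_consumed(5, TARGET, WINDOW):.1%}")
```

Under that assumption, seven days is 10,080 minutes, so a 99.9% target leaves about 10.08 minutes of allowed downtime, and 5 bad minutes would consume roughly half of the budget.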

Detect and resolve network issues at every step

Diagnose fleet-wide endpoint issues automatically with Datadog

The new Command Center feature in Datadog End User Device Monitoring automatically detects and investigates fleet-wide endpoint problems using Bits AI SRE. Each issue card surfaces the root cause, affected device count, and full investigation trail. Command Center launches with coverage for nine high-frequency scenarios across network and SaaS performance, device and application health, and AI tool usage and visibility. Because Command Center is built on top of Case Management, admins can update status, assignee, and linked Jira tickets without ever leaving the page. To get started, join the End User Device Monitoring Preview or read the documentation.

Trace network paths from an end user device to a SaaS application

When users report slow applications or degraded call quality, you often can’t tell where the problem originates. Switching between separate tools to correlate device, network, and application signals makes root cause analysis slow and imprecise. By combining Datadog End User Device Monitoring with Network Path, you can now trace the full network path from a user’s device to a SaaS application, visualizing per-hop latency and packet loss across every layer. You can compare paths across devices and time periods to identify trends that may be affecting the wider fleet. To get started, join the End User Device Monitoring Preview or read the blog post.

Network Device Monitoring adds integrations for Meraki, Fortinet, VeloCloud, Aruba, and Juniper Mist

Datadog Network Device Monitoring now covers Cisco Meraki, Fortinet FortiManager, VMware VeloCloud SD-WAN, Aruba Central, and Juniper Mist, five of the leading platforms running modern enterprise networks. This enables you to collect link quality, device health, and traffic breakdowns across cloud-managed wireless, SD-WAN, and AI-driven networking in one place. Meraki security event logs also flow into Datadog Cloud SIEM, so you can investigate threat activity and performance issues side by side. The result is vendor-agnostic visibility from edge to core, whether your fleet is single-vendor today or mid-migration to multi-vendor next quarter. Learn more about Network Device Monitoring.

Monitor cloud-managed wireless infrastructure with Aruba Central and Juniper Mist integrations

Datadog’s Aruba Central and Juniper Mist integrations are now generally available, bringing cloud-managed wireless and wired network infrastructure into Network Device Monitoring. Teams can monitor device health, client experience, Wi-Fi quality metrics, and network throughput across Aruba- and Mist-managed access points, switches, and gateways, all through API-based collection. Because these integrations feed into the broader Datadog platform, network engineers can correlate wireless performance degradation with application latency, infrastructure metrics, and logs to determine whether connectivity issues originate in the network layer or elsewhere in the stack. Learn more about the Juniper Mist integration and the Aruba Central integration, or explore Network Device Monitoring to get started.

Monitor AI infrastructure and modern cloud platforms

Monitor Databricks SQL warehouses with Data Observability

You can now use Datadog Data Observability to get visibility into your Databricks SQL warehouses. With Data Observability’s Databricks SQL warehouse monitoring, now in public preview, you can detect failed and long-running Databricks queries across workspaces in near real time, reducing the time it takes to identify and fix broken analytics workloads or catch overly expensive queries. You can also monitor usage and queued queries across your SQL warehouses to determine whether cluster configuration changes are needed to ensure critical queries run on time. To get started, follow the Data Observability for Databricks setup documentation.

Monitor Nebius AI Cloud workloads with Datadog

ML and platform teams use Nebius AI Cloud to train and deploy AI models, but telemetry from GPU compute, training jobs, inference services, and LLM applications is often spread across disconnected tools. The Datadog integration for Nebius AI Cloud brings VM serial output, Managed Kubernetes, MLflow, PostgreSQL, and AI endpoint logs into Datadog Log Management; deploys the Datadog Agent on Nebius compute for infrastructure metrics and APM; monitors GPU utilization and thermals with Datadog GPU Monitoring; and traces agent workflows and token usage with Datadog Agent Observability. An out-of-the-box dashboard and prebuilt monitors cover common AI workload failure modes, from MLflow experiment errors to PostgreSQL connection failures. Read our documentation to get started or check out our blog post.

Monitor Google Cloud Run Jobs end-to-end with Datadog Serverless Monitoring

Cloud Run Jobs handle workloads like batch data pipelines, ML preprocessing, and nightly reports, but without deep observability, a failed or slow job means manually scraping Cloud Logging to understand what went wrong and where. Datadog Serverless Monitoring for Cloud Run Jobs brings full APM tracing, metrics, and log collection to your job executions, with support across Python, Node.js, Go, Java, .NET, Ruby, and PHP. Every job execution is traced end-to-end and correlated with the infrastructure and services your job depends on, so you can see exactly which step ran long, where errors occurred, and how performance compares across executions. Serverless Monitoring for Cloud Run Jobs is currently in Preview and will reach general availability soon. To get started, request access or read our documentation.

Monitor Vercel functions with Datadog Serverless Monitoring

Teams using Vercel can now get complete visibility into their functions by sending OpenTelemetry logs and traces directly to Datadog via Vercel Drains configured in the Vercel integration, with no custom pipeline or additional tooling required. Once connected, every Vercel project gets a dedicated page in the Serverless view organized by route, with tabs for Overview, Logs, Traces, and RUM so engineers can go from a spike in function errors to the exact trace and correlated log in seconds. An out-of-the-box dashboard surfaces traffic, latency, serverless function health, firewall events, and cache hit ratios across your entire Vercel deployment. Read our documentation to get started.

Monitor Azure AI Foundry with the Datadog integration

Azure AI Foundry has quickly become a default platform for enterprise teams deploying models, prompt flows, and agent workloads on Azure. The new Datadog integration brings Foundry metrics and logs into Datadog with out-of-the-box dashboards and recommended monitors covering model performance, activity, and cost. Foundry telemetry sits alongside the rest of your Azure stack, so your platform team manages it from the same view they already use for everything else. To get started, enable the Azure AI Foundry integration tile in Datadog.

Track n8n agentic workflows end to end with Datadog

n8n is a workflow automation and orchestration platform that teams use to integrate systems and automate data pipelines. Datadog’s n8n integration brings visibility into workflow health alongside the rest of your infrastructure in one place. With Datadog, you can monitor workflow execution counts and status, latency percentiles, queue health, worker capacity, webhook throughput, and step-level timing. That means you can quickly understand when workflows are delayed, which node is causing the slowdown, and whether the root cause is in the workflow itself or the underlying infrastructure, all without ingesting every execution log just to reconstruct what happened. Datadog’s out-of-the-box dashboards and monitors enable you to visualize and alert on failures, investigate slowdowns, and correlate workflow behavior with worker health, queue pressure, and Kubernetes context. Read our documentation to learn more about the n8n integration.

Get end-to-end Nutanix visibility with Datadog

Nutanix is a hyperconverged infrastructure platform that combines compute, storage, and virtualization in a single software-defined stack. Datadog’s Nutanix integration gives teams visibility into clusters, hosts, and VMs while also bringing Prism Central operational activity, including alerts, events, tasks, and audits, into Datadog as events. This helps teams monitor Nutanix infrastructure alongside the applications running on it; quickly determine whether issues start in the app layer or the underlying platform; and investigate cluster health, capacity, storage and I/O performance, host and VM hotspots, and inefficient workloads. It also includes an out-of-the-box Nutanix Overview dashboard that provides a baseline view of health status, resource usage, and capacity insights so operators can move from symptoms to causes faster and keep environments running smoothly as workloads change. To learn more, read our blog post and documentation.

Get visibility into your entire enterprise ecosystem

Integrate with the platforms your business applications run on

New and enhanced integrations with Temporal Cloud, Adyen, ServiceNow, Cloudflare, SAP HANA Cloud, Tableau, Shopify, Intercom, and Genesys Cloud extend Datadog into the SaaS platforms that are running modern businesses. Coverage now spans workflow orchestration, payment processing, ITSM, edge networking, business intelligence, ecommerce, customer support, and contact centers, including first-to-market support for Temporal’s new OpenMetrics API. You can track the full transaction life cycle in Adyen, workflow execution in Temporal, and storefront health in Shopify alongside the applications and infrastructure you already monitor. Observability follows your stack instead of the other way around. Learn more in the Datadog integrations documentation.

Deploy Azure automated log forwarding with Terraform

Datadog’s automated log forwarding for Azure already eliminates the need to manually set up, configure, and manage the services and diagnostic settings needed to forward logs. Automated log forwarding now supports Terraform, so you can provision the full pipeline across every subscription in your tenant directly from your infrastructure as code. Add the module once, and adding a new subscription becomes a one-line config change instead of a portal workflow. Coverage stays in sync with the rest of your Azure infrastructure, eliminating the drift that comes with manual setup. To get started, install the Datadog Terraform provider and add the automated log forwarding module to your config.

Monitor Oracle Fusion Cloud Applications with Datadog

Oracle Fusion Cloud Applications power critical business workflows across finance, HR, and supply chain, but because they run on Oracle-managed infrastructure, engineering teams have had limited visibility into their performance. The new Datadog Oracle Fusion integration closes that gap by collecting ESS job metrics and logs so teams can track job execution, detect retries and stalls, and correlate slowdowns with downstream pipeline failures in Oracle Integration Cloud. Audit logs flow directly into Log Explorer and Cloud SIEM, enabling real-time alerts on high-risk activity like permission changes. Combined with Synthetic Monitoring, teams can test Oracle Fusion endpoints and UI workflows from the outside in, catching regressions before users report them. To learn more, read the blog post or see the Oracle Fusion Applications integration documentation.

Unify observability for Alibaba Cloud with Datadog

For teams running Alibaba Cloud alongside AWS, Google Cloud, or Azure, signals from Cloud Monitor, ApsaraDB, and Simple Log Service stay siloed in their own consoles, making cross-provider incidents difficult to diagnose. The Datadog Alibaba Cloud integration brings 14 Alibaba Cloud services into one platform. Pull infrastructure metrics from Cloud Monitor and ship logs from Simple Log Service into Datadog Log Management, including ActionTrail audit events, ACK Kubernetes logs, OSS access logs, and VPC Flow logs. Install the Datadog Agent on ECS instances and ACK clusters to add distributed traces and container metrics. Out-of-the-box dashboards for ECS, CDN, Server Load Balancer, and ApsaraDB databases load automatically once configured. For teams with APAC data residency requirements, BYOC Logs keeps log processing inside your own Alibaba Cloud account. Read our blog post or documentation to get started.

Monitor OVHcloud infrastructure with Datadog

Finance, healthcare, and public administration teams that need EU data residency use OVHcloud alongside AWS, Google Cloud, or Azure, where telemetry data sits in separate tools and cross-cloud investigations lose context. The Datadog OVHcloud integration pulls logs from OVHcloud Logs Data Platform into Datadog Log Management, including account audit logs, IAM policy verification results, Kubernetes audit logs, managed database logs, and load balancer access logs. Install the Datadog Agent on OVHcloud instances to add host metrics, APM traces, and container telemetry data. An out-of-the-box dashboard and three prebuilt monitor templates (which fire on HTTP error rate spikes, elevated error log counts, and high counts of critical-severity logs) give you coverage without defining every condition from scratch. Read our documentation to get started or check out our blog post.

Monitor Scaleway logs and infrastructure with Datadog

Teams in regulated industries use Scaleway alongside AWS, Google Cloud, or Azure for EU data residency and GDPR compliance, but cannot correlate Scaleway telemetry data with the rest of their stack without switching tools. The Datadog Scaleway integration forwards logs from Scaleway Cockpit and Audit Trail into Datadog Log Management, deploys the Datadog Agent on Scaleway Compute instances for host metrics and APM traces, and correlates APM traces with pod-level metrics from Scaleway Kapsule and Kosmos Kubernetes clusters. An out-of-the-box overview dashboard and two prebuilt monitor templates, one for spikes in service error logs and one for critical events in your Scaleway environment, give on-call engineers a starting point before workload baselines are established. To learn more, read our documentation or check out our blog post.

Track Power BI Embedded performance and resource utilization

Power BI Embedded lets developers ship Power BI analytics inside their own applications, but operating the underlying capacity in production requires visibility into refresh performance, query latency, and capacity utilization that the Azure portal alone does not provide. The new Datadog integration for Power BI Embedded surfaces critical performance and utilization metrics directly in Datadog. Power BI Embedded health lives in the same observability platform as the rest of your applications, so capacity admins manage it with the alerting and dashboard workflows their team already uses. For more information, read our documentation.
Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

DASH 2026 Operating at Scale: Guide to Datadog’s newest announcements

Datadog expands operating at scale with new tools for disaster recovery, AI and cloud cost control, organization management, and incident workflow automation, while also adding live Cloudcraft diagrams, postmortem tracking, and planned maintenance communication to help teams stay reliable and efficient.
A challenge for many teams continues to be managing cost, governance, and reliability across an ever-larger footprint. This year’s DASH announcements help teams operate efficiently at scale, with new tools to cut cloud and AI spend, eliminate waste automatically, maintain observability during outages, and manage many organizations and agents as a single unit.

Whether you’re attributing AI spend across providers, automating cost optimization within guardrails you define, keeping observability online through a cloud outage with Disaster Recovery, or storing and searching logs at petabyte scale in your own infrastructure, these features help you control complexity and cost without slowing your teams down. Review everything new for operating at scale below, and read our other roundup posts for the latest in AI, observability, and security.

Run Datadog reliably at scale

Maintain observability during cloud outages with Datadog Disaster Recovery

Cloud provider outages can leave teams without visibility into production systems during active incidents. Datadog Disaster Recovery (DDR) lets you configure a secondary Datadog site ahead of time, automatically replicates more than 30 resource types (including dashboards, monitors, and users) on a regular schedule, and activates on demand when your primary site is impacted. Failover can be triggered via Fleet Automation and Remote Configuration for Agent-based cutover, or via a dedicated DNS intake endpoint that routes traffic without changes to your Agent fleet. DDR is now generally available. To enable DDR for your organization, contact your Datadog account manager, or read the blog post to learn more.

Minimize the effort of keeping SDKs up to date with Remote SDK Upgrades

Remote SDK Upgrades in Fleet Automation make it easy to keep Datadog SDKs up to date across your fleet of services. Using the latest SDKs ensures that you benefit from the latest features, performance improvements, and security updates. Learn more in our Remote Agent Management documentation, or sign up for the Preview to get started.

Manage multiple Datadog organizations as a single unit with Organization Groups

Organization Groups lets administrators manage multiple Datadog organizations as a single unit. Instead of configuring roles, policies, and settings individually per organization, administrators define them once at the group level and push them to member organizations.

Organization Groups are in Preview. Sign up to request access. Learn more in our documentation, or see our guide on organization topologies.

Understand the health of your Oracle infrastructure with live diagrams in Cloudcraft

When you’re responding to an incident or doing day-to-day governance in unfamiliar or poorly documented parts of your infrastructure, you often need to know what connects to what. Cloudcraft Oracle diagrams show your live infrastructure and architecture, tightly integrated with Datadog observability and security tools. This helps you:

See an incident’s blast radius with alerts and monitors on your live infrastructure diagram

Find gaps in observability coverage where the Datadog Agent is not installed (but should be)

Optimize costs by finding over-provisioned resources and figuring out who owns them

Analyze which security misconfigurations are most relevant and need to be addressed

Onboard new team members

Cloudcraft in Datadog is free for all Datadog customers. To get started, visit Cloudcraft in Datadog today.

Visualize on-prem cluster issues with live VMware vSphere diagrams in Cloudcraft

When you’re managing VMware clusters, you often need to understand the blast radius of an issue: Is it isolated, or part of a broader problem? Does a VM have a noisy neighbor, or is a host or cluster exhausting its resources? Cloudcraft VMware diagrams show your live vSphere clusters, tightly integrated with Datadog observability and security tools. This helps you:

See an incident’s blast radius with alerts and monitors on your live cluster diagram

Quickly click on a host or VM to get detailed telemetry (logs, metrics, traces, network traffic, and more) to find the root cause of an issue

Cloudcraft in Datadog is free for all Datadog customers. To get started, visit Cloudcraft in Datadog today.

Cut cloud costs and eliminate waste

Proactively track and attribute AI spend across providers with Cloud Cost Management

As organizations adopt more AI providers, costs become harder to track and even harder to attribute. Datadog Cloud Cost Management now brings AI spend across Anthropic, OpenAI, Amazon Bedrock, Google Gemini, Vertex AI, and GitHub Copilot into a single destination, alongside your existing cloud infrastructure costs. Consistent tags like model, project, and token type let you compare spend across providers, while out-of-the-box allocation rules automatically attribute Anthropic and OpenAI costs to the API keys and users driving them. From there, you can roll up usage to the teams, services, or business units accountable for it to build executive-ready reports and dashboards. Cost monitors and anomaly detection catch spikes before they show up on the bill, and pairing AI cost data with Datadog metrics turns raw spend into unit economics like cost per user. To learn more, read the AI Costs blog post and check out the AI Costs documentation.
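To show how key-level attribution can roll up to teams and feed a unit metric like cost per user, here is a minimal Python sketch. The cost records, the `key_owner` mapping, and both functions are hypothetical illustrations; the release note describes this attribution being handled by out-of-the-box allocation rules and tags, not by user code.

```python
from collections import defaultdict

# Hypothetical cost records: (provider, api_key, model, cost_usd).
cost_records = [
    ("anthropic", "key-a", "model-x", 120.0),
    ("openai", "key-b", "model-y", 80.0),
    ("anthropic", "key-c", "model-x", 50.0),
    ("openai", "key-d", "model-y", 10.0),
]

# Hypothetical ownership mapping from API key to the accountable team.
key_owner = {"key-a": "search", "key-b": "search", "key-c": "support"}


def spend_by_team(records, owners, default="unallocated"):
    """Sum cost per team; keys without an owner land in a default bucket."""
    totals = defaultdict(float)
    for _provider, api_key, _model, cost in records:
        totals[owners.get(api_key, default)] += cost
    return dict(totals)


def cost_per_user(total_cost, active_users):
    """Unit economics: spend divided by the number of active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return total_cost / active_users


team_spend = spend_by_team(cost_records, key_owner)
print(team_spend)
print(cost_per_user(team_spend["search"], active_users=400))
```

Keeping an explicit "unallocated" bucket makes gaps in ownership visible instead of silently dropping spend from team reports.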

Reduce infrastructure spending faster with CCM Cost Optimization Automation

Cost optimization recommendations are easy to surface but hard to implement: Acting on them requires FinOps, SRE, and engineering to coordinate manual cleanup work against higher-priority roadmaps, so most opportunities never get off the backlog. Cost Optimization Automation in Datadog Cloud Cost Management closes that gap by continuously executing approved recommendations on your behalf. This enables you to turn recommendations into realized savings in a matter of hours, without consuming an engineering cycle. Create automations scoped by resource type, AWS account, region, and other tags. Then, set a cadence that fits your change windows, and connect the AWS environments you want in scope. Datadog runs every automation inside guardrails—pre-delete snapshots, IOPS feasibility checks, human-in-the-loop approval in Slack or Teams, and a complete audit trail of every change and execution—so every change is visible, reviewable, and under your control.

Cost Optimization Automation is generally available today for unattached EBS volumes, unused RDS instances, S3 Intelligent Tiering, CloudWatch Logs retention, DynamoDB backups, and unused EBS snapshots, with more recommendation types and provider coverage on the way. To learn more, visit our documentation.
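The scoping and approval behavior described in this section can be sketched in Python as follows. The `Recommendation` and `AutomationScope` types and the `plan_actions` function are hypothetical, chosen only to illustrate how a scope filter and a human-in-the-loop approval step might combine; they do not reflect Datadog's implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """A hypothetical cost recommendation tied to one cloud resource."""
    resource_id: str
    resource_type: str
    account: str
    region: str
    tags: dict = field(default_factory=dict)


@dataclass
class AutomationScope:
    """A hypothetical automation scope: every condition must match."""
    resource_types: set
    accounts: set
    regions: set
    required_tags: dict = field(default_factory=dict)

    def matches(self, rec: Recommendation) -> bool:
        return (
            rec.resource_type in self.resource_types
            and rec.account in self.accounts
            and rec.region in self.regions
            and all(rec.tags.get(k) == v for k, v in self.required_tags.items())
        )


def plan_actions(recs, scope, approved_ids):
    """Split in-scope recommendations into approved (ready) and awaiting approval."""
    in_scope = [r for r in recs if scope.matches(r)]
    ready = [r for r in in_scope if r.resource_id in approved_ids]
    pending = [r for r in in_scope if r.resource_id not in approved_ids]
    return ready, pending


scope = AutomationScope(
    resource_types={"ebs_volume"},
    accounts={"111111111111"},
    regions={"us-east-1"},
    required_tags={"env": "staging"},
)
recs = [
    Recommendation("vol-1", "ebs_volume", "111111111111", "us-east-1", {"env": "staging"}),
    Recommendation("vol-2", "ebs_volume", "111111111111", "us-east-1", {"env": "prod"}),
    Recommendation("vol-3", "ebs_volume", "111111111111", "us-east-1", {"env": "staging"}),
]
ready, pending = plan_actions(recs, scope, approved_ids={"vol-1"})
print([r.resource_id for r in ready], [r.resource_id for r in pending])
```

Here the production volume falls outside the scope entirely, and only the explicitly approved staging volume is ready to act on; the other stays pending until someone approves it.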

Rightsize Karpenter nodes with performance-based recommendations

Datadog Cluster Autoscaling runs performance-informed simulations of your workloads to generate cost-saving instance type recommendations for open source node autoscaling solutions such as Karpenter. Cluster Autoscaling tackles overprovisioning by grounding recommendations in your actual workload performance, enabling you to reduce wasted capacity by identifying cluster idle spend, impacted workloads, and drifted autoscaler configurations. You can compound these savings by using Spot instances safely with interruption predictions to significantly reduce risk. Learn more in our Cluster Autoscaling documentation or sign up for the Spot Instance Management Preview to get started.

Streamline incident and request workflows end to end

Start your day with the IDP Homepage

Engineers rely on many systems to prioritize their daily work; each day might start with checking pull requests, tickets, CI/CD failures, on-call handoffs, and service health. Each system provides useful context, but the work of turning signals into a clear plan often falls on the individual engineer. The IDP Homepage gives engineers a central starting point inside Datadog that brings together code changes, ownership context, and operational signals so they can move directly from “What should I check?” to “What should I do next?” Teams can also extend the homepage with custom apps built using App Builder or Datadog Apps, making it easy to incorporate internal tools and workflows that native integrations don’t cover. Read our blog post to learn more.

Automate request workflows with Datadog Forms and Case Management

Datadog Forms and Case Management help teams manage incoming requests by connecting structured intake forms directly to operational case tracking. Teams can create forms for workflows such as IT access tickets, customer bug reports, and vulnerability disclosures and share them with Datadog users and external submitters. When a form is submitted, Datadog automatically creates a case populated with the required context so teams can begin triage and resolution with the information they need. Forms support conditional logic and required fields, while Case Management provides assignment, prioritization, notification, and workflow automation capabilities. Together, Forms and Case Management help teams centralize request intake, improve visibility into request trends, and spend less time chasing missing information.

To learn how Forms and Case Management simplify request workflows from intake to resolution, you can read our blog post or check out the documentation.

View handover automations in Microsoft Teams and Slack

On-call shift changes are moments of high risk. If the handover doesn’t happen clearly, context gets lost and the incoming responder starts cold. Handover automations run actions automatically when shifts change, replacing manual updates like posting in Slack or updating channel topics. Configure per team: post a handover summary to a channel, update the channel topic with the incoming responder’s name, send them a direct message, or sync a Slack user group. Works with Slack, Microsoft Teams, and Datadog Workflow Automation. Learn more in the handover automation documentation.

Track postmortem completion and ownership for continuous improvement

To ensure continuous improvement, post-incident work must be tracked and owned. You can now set a postmortem’s status to Draft, In Review, or Completed directly from the Post-Incident tab or from the incident Slack channel. You can also assign a dedicated postmortem owner, who can be the Incident Commander, to drive the review process to completion. All of this life cycle and ownership data is exposed as Incident facets, which lets engineering leadership easily report on postmortem coverage across the organization, such as by calculating the percentage of SEV-1 incidents with completed postmortems. Learn how to incorporate postmortem data for better reliability reporting on our Incident Postmortems documentation.
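As an example of the kind of report these facets enable, the following Python sketch computes the percentage of SEV-1 incidents with a completed postmortem. The `Incident` record and `postmortem_coverage` helper are hypothetical stand-ins for exported incident data; the status values mirror the Draft, In Review, and Completed states named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record with the postmortem life cycle field."""
    severity: str
    postmortem_status: Optional[str]  # "Draft", "In Review", "Completed", or None


def postmortem_coverage(incidents, severity="SEV-1") -> Optional[float]:
    """Percentage of incidents at a severity whose postmortem is Completed."""
    scoped = [i for i in incidents if i.severity == severity]
    if not scoped:
        return None  # no incidents at this severity, so coverage is undefined
    completed = sum(1 for i in scoped if i.postmortem_status == "Completed")
    return 100.0 * completed / len(scoped)


incidents = [
    Incident("SEV-1", "Completed"),
    Incident("SEV-1", "In Review"),
    Incident("SEV-1", "Completed"),
    Incident("SEV-1", None),
    Incident("SEV-2", "Draft"),
]
print(postmortem_coverage(incidents))
```

Returning `None` rather than 0 when a severity has no incidents avoids reporting a misleading "0% coverage" for a period with nothing to review.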

Capture on-call knowledge at the end of every shift with On-Call Recall

On-call knowledge can get lost at the end of every shift: which monitors are flappy, what fixed that 2 a.m. page, which alerts are safe to ignore. On-Call Recall automatically generates a shift summary at the end of every rotation, pulling each page, its monitor, the responder’s actions, and any linked incident into one place. Every page gets a machine-generated verdict (Actionable, Noise, Repeat, Unknown, or Escalated) so the next responder sees what to pay attention to, not just what fired. Repeat detection surfaces what was learned the last time the same monitor paged so engineers stop rediscovering the same fix at 3 a.m. To get started, request access to the Preview.

Track cross-incident follow-ups in a dedicated view

Follow-up tasks created during incidents have historically been buried inside individual incident records, invisible to anyone managing remediation across the organization. A new cross-incident follow-up view at the top level of Incident Management surfaces all open and completed tasks across every incident, filterable by assignee, team, severity, and date. Combined with follow-up analytics, engineering leads can track completion rates, identify recurring gaps, and measure whether remediation work is actually reducing recurrence over time. Learn more in the incident follow-ups documentation.

Auto-post and sync Microsoft Teams meeting links in incident channels

Microsoft Teams meeting links are now automatically posted and kept up to date in your incident channel so responders always have the right link without hunting for it mid-incident. When automatic channel and meeting creation are both enabled, the meeting link appears in the onboarding message the moment an incident is declared. The channel is also notified when a meeting is manually created or updated through the Datadog UI. Find out more in our Microsoft Teams and Datadog Incident Management integration documentation.

Run your incident without leaving chat with new Slack action tray and slash commands

Managing an incident from Slack used to mean memorizing slash commands and hoping you typed them correctly under pressure. The updated Slack action tray surfaces all relevant incident actions the moment you join an incident channel, or on demand by using /datadog, removing friction between you and the action that needs to happen. Update severity, add responders, acknowledge pages, post status updates, and access related past incidents and observability context, all without leaving Slack. To learn more, read our Incident Management Slack integration documentation.

Keep pages in sync with ServiceNow and Jira integrations

Enterprise teams shouldn’t have to choose between their ITSM workflow and their incident response tooling. Datadog Incident Management now keeps incidents in sync with ServiceNow and Jira. ServiceNow record IDs can replace Datadog incident keys as the display identifier, custom fields map directly to ServiceNow Configuration Items, and incident follow-ups export as bidirectionally synced cases that stay in lockstep with Jira tickets. The result is a single source of truth across your incident and ITSM systems without manual duplication. Learn more in the Incident Management and ServiceNow integration and Incident Management and Jira integration documentation.

Schedule and communicate planned downtime with Maintenance Windows

You can now schedule and communicate planned downtime directly from Datadog Status Pages, keeping your stakeholders informed before maintenance begins.

With Maintenance Windows, you can now:

Schedule planned downtime with title, description, time window, and impacted components

Display a notice on your status page so users see upcoming maintenance before work begins

Notify subscribers automatically when maintenance is scheduled, starts, and completes

To get started, visit the documentation.
Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

DASH 2026 Security & Compliance: Guide to Datadog’s newest announcements

Datadog expands its security and compliance platform with AI-assisted investigation and remediation, broader SIEM coverage, new code, API, and sensitive data fixes, modernized authentication, and FedRAMP High-certified observability for sensitive workloads.

Securing modern systems means spotting real risk in a sea of findings, investigating threats faster, and meeting compliance demands without bolting on separate tools.

At this year’s DASH, we made announcements that bring AI-assisted investigation and remediation across the security life cycle—from code and cloud to APIs and sensitive data—alongside expanded SIEM coverage and a modernized approach to authentication and governance.

With Datadog, you can now automate threat hunts and SIEM investigations with Bits agents, fix vulnerable dependencies directly from a finding, and secure your most sensitive workloads with FedRAMP High–certified observability. These features and many others help teams stay ahead of threats while keeping security context within the workflows they already use. Explore everything new in security and compliance below, and see our other roundup posts for the latest in AI, observability, and scale.

Prioritize and route real risk with Datadog Security

Let agents run day-to-day security operations with the MCP Security toolset

The Datadog Security MCP toolset enables AI agents such as Claude Code, OpenAI Codex, and Cursor to securely access Datadog security context through the remote Datadog MCP Server. After launching with read-only capabilities earlier this year, the Security MCP toolset now includes expanded tools for SQL-powered reads, detection rule management, suppressions, triage, and ticketing workflows. With these new capabilities, teams can bring AI-assisted investigation and remediation into their existing security operations. Agents can help surface relevant context, prioritize what needs attention, and take governed actions while Datadog remains the source of truth for security data, detections, and controls. Read our Security MCP Tools documentation to get started.

Route security finding notifications to the right team automatically

Security findings are most actionable when they reach the team that owns the affected service, resource, or repository. With dynamic routing for security notification rules, Datadog can automatically send finding notifications to the Slack or Microsoft Teams channel configured for the associated Datadog Team. This helps security teams reduce manual notification setup and keep routing accurate as their organizations evolve. Instead of updating individual rules every time ownership changes, teams can manage notification channels centrally in Datadog Teams. If a finding is missing ownership information or the team does not have a notification channel configured, Datadog can route the notification to a fallback channel so important issues are still surfaced. Learn more in the Dynamic routing documentation, or sign up for the Preview to get started.
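The routing behavior described above (send to the owning team's channel, otherwise fall back) can be pictured with a minimal Python sketch. The team names, channel names, and the `Finding` type are hypothetical; Datadog configures this in notification rules and Datadog Teams, not in code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical team-to-channel settings, standing in for channels managed in Datadog Teams.
TEAM_CHANNELS = {
    "payments": "#payments-security",
    "identity": "#identity-alerts",
}
# Hypothetical fallback for findings with no owner, or an owner with no channel.
FALLBACK_CHANNEL = "#security-triage"


@dataclass
class Finding:
    title: str
    team: Optional[str] = None  # owning team, if ownership information is available


def route(finding: Finding) -> str:
    """Pick the notification channel for a finding, falling back when no route exists."""
    return TEAM_CHANNELS.get(finding.team, FALLBACK_CHANNEL)


print(route(Finding("Publicly accessible storage bucket", team="payments")))  # #payments-security
print(route(Finding("Outdated TLS configuration")))                           # #security-triage
```

Because the lookup happens when a notification is sent, changing a team's channel in one central place changes the destination for every rule at once, which is the maintenance benefit the paragraph describes.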

Automated bidirectional ticket creation for security findings

Security teams can now set up automation rules that create tickets whenever a new security finding matches defined criteria. Tickets can be created in Datadog Case Management or Jira, with bidirectional sync to keep Datadog and downstream ticketing systems aligned as work progresses. This makes it easier to route remediation work into the systems that engineering teams already use while preserving security context in Datadog. This release complements existing ticketing capabilities, including recently added public APIs to create and manage tickets and Workflow Automation actions for teams that need more advanced ticketing flows. Learn more in our Ticket Creation Rules documentation.

Detect and investigate threats faster with Cloud SIEM

Get deeper coverage across your stack with security integrations in Datadog

Since the last DASH, we’ve added 30+ security integrations spanning SIEM log sources and threat intelligence feeds, expanding Datadog’s ability to ingest, correlate, and act on security data across the tools your SOC already relies on. Ingest Jamf Pro device compliance, policy enforcement, and inventory data into Datadog Cloud SIEM so security teams can detect unmanaged or non-compliant endpoints, correlate device posture with security events, and close the gap between endpoint management and threat investigation. Stream Box enterprise event logs to detect anomalous data access patterns, flag unauthorized sharing, and give security teams full audit trail visibility. Ingest logs from both Zscaler Internet Access and Zscaler Private Access to correlate web traffic threats, policy violations, and private app access events in a single view. Bring firewall logs, threat detections, and traffic analytics from Barracuda SecureEdge and CloudGen Firewall into Datadog Cloud SIEM so teams can correlate network-layer security events with broader infrastructure and application signals. Integrate Recorded Future threat intelligence feeds into Datadog to automatically enrich security events with real-time context on indicators of compromise, threat actors, and vulnerability risk. Learn more about Datadog’s security integrations in our documentation.

Automate threat hunting with Datadog Cloud SIEM

Bits Threat Hunting is an autonomous agent that runs hypothesis-driven threat hunts across your environment. It reasons with your telemetry—logs, network flows, identity events, and endpoint activity—to surface patterns consistent with known attacker behaviors, emerging threat campaigns, and unusual deviations from your baseline. From there, it can catch indicators of compromise (IoCs) and tactics, techniques, and procedures (TTPs) that existing SIEM detection rules don’t yet cover. It can also recommend and deploy detection rules for IoCs and TTPs based on its available threat intelligence and threat hunt findings. Bits Threat Hunting is available in Preview. Sign up to get started or read our blog post.

Automate investigations on any SIEM with the standalone Bits Security Analyst

Bits Security Analyst is an always-on SOC analyst built to investigate complex threats and triage security alerts. It autonomously investigates alerts and creates actionable reports in minutes, following security investigation best practices. SOC teams can spend less time on false positives and benign activities and focus on real threats. Bits Security Analyst is now available as a standalone solution that you can deploy in popular SIEMs—including Splunk and Microsoft Sentinel—in minutes, delivering value from day one without disrupting existing workflows. Standalone Bits Security Analyst is available in Preview. Sign up now to start automating your investigations on any SIEM.

Monitor Claude Compliance API activity with Datadog Cloud SIEM

Security and compliance teams need a scalable way to monitor AI activity, investigate suspicious behavior, and maintain audit-ready records. Datadog integrates with the Claude Compliance API, bringing Claude activity into Cloud SIEM, including sign-ins, admin API key life cycle events, organization membership changes, SSO/SAML configuration updates, and Claude chat and project access events. Out-of-the-box dashboards and detection rules help teams identify suspicious patterns, validate administrative actions against change records, and investigate AI-related activity alongside the rest of their security telemetry data. To learn more, read our Claude Compliance API integration blog post.

Remediate code and cloud risks faster with Code and Cloud Security

Fix vulnerable dependencies and misconfigurations faster with Bits Code in Datadog Code Security

Bits Code helps engineers remediate Code Security SAST findings by generating pull requests with code fixes. Now, this capability extends to Software Composition Analysis (SCA) and Infrastructure as Code (IaC) findings. Bits Code proposes targeted updates for vulnerable dependencies and misconfigured infrastructure, creating the exact change needed and showing you the reasoning behind it. Teams can apply remediations as single fixes or as batch actions that resolve many findings at once, and engineers review, refine, and merge fixes directly from their existing workflow. This expansion gives security and platform teams a faster path to resolving issues across application code, open source libraries, and cloud infrastructure without leaving Datadog. Learn more in our dedicated blog post.

Detect risks in your code more accurately with AI-native SAST in Datadog Code Security

Datadog Code Security now includes built-in AI-native SAST capabilities in public Preview, using LLMs to reason about code semantics, call stacks, and data flow, delivering context-aware vulnerability detection. On OWASP benchmarks, it outperforms traditional SAST across nearly every category, with true positive rates up to three times higher for context-dependent issues like SQL and command injection. Incremental analysis keeps scans fast and cost-effective, and each finding includes a clear exploit explanation and suggested fix. Learn more in our dedicated blog post.

Detect source code attacks with Datadog Code Threat Detection

Datadog Code Threat Detection helps engineering teams catch malicious code changes before merging. Developed in partnership with Datadog Security Research, it automatically analyzes every pull request for threats that traditional vulnerability scanners miss, including supply chain attacks, suspicious dependencies, and obfuscated code. Reviewers get clear, contextual findings directly in their workflow, and each flagged change includes an explanation of why it was flagged and recommended next steps. By surfacing risks at review time, teams can stop malicious code from reaching production. Code Threat Detection is now available for Datadog Code Security customers. To request early access, sign up for the Preview. For more information, read our blog post.

Evaluate code risks with confidence using Bits Assessments in Datadog Code Security

Available for Code Security customers, Bits Assessments reduce noise in static code analysis by classifying SAST findings as likely true or false positives, so your team can focus on the vulnerabilities that actually matter. Each evaluation includes a confidence score and a short reason citing the relevant code context, helping developers trust the verdict. Findings flagged as false positives can be automatically filtered from PR comments and PR gates, keeping pull requests clean without blocking valid fixes. Bits Assessments also learn from your team’s past false positive reports, using them as context to improve future classifications over time. Learn more in our documentation.
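To illustrate how a verdict plus a confidence score can drive PR-gate filtering, here is a minimal Python sketch. The `Assessment` type, the verdict strings, and the 0.8 confidence cutoff are our own assumptions, not documented Bits Assessments settings.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    finding_id: str
    verdict: str       # hypothetical values: "true_positive" or "false_positive"
    confidence: float  # hypothetical score between 0.0 and 1.0


def findings_for_pr_gate(assessments, min_confidence=0.8):
    """Keep every finding except those confidently classified as false positives."""
    return [
        a.finding_id
        for a in assessments
        if not (a.verdict == "false_positive" and a.confidence >= min_confidence)
    ]


results = [
    Assessment("SAST-1", "true_positive", 0.95),
    Assessment("SAST-2", "false_positive", 0.91),
    Assessment("SAST-3", "false_positive", 0.55),
]
print(findings_for_pr_gate(results))  # ['SAST-1', 'SAST-3']
```

In this sketch a low-confidence false-positive verdict (SAST-3) still reaches the gate, which errs on the side of not hiding real vulnerabilities.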

Detect and block malicious open source packages with Supply Chain Firewall

Supply Chain Firewall (SCFW) is an open source CLI tool from Datadog Security Research that blocks malicious and vulnerable open source packages before they are installed. It supports npm, pip, and Poetry, checking every dependency against Datadog’s malicious packages dataset and OSV.dev. Known malicious packages are blocked outright, and vulnerable packages prompt for user confirmation. Datadog customers can now forward SCFW activity to their Datadog account through a local Agent or the HTTP API, giving security teams visibility into developer install activity alongside their existing telemetry. You can learn more about SCFW in our dedicated blog post; to set up the integration, read our documentation.
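The block/confirm/allow rule described above can be modeled in a few lines of Python. This sketch is not SCFW's code: the package names and the two in-memory sets are hypothetical stand-ins for Datadog's malicious-package dataset and OSV.dev lookups.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # known vulnerable: ask the user before installing
    BLOCK = "block"      # known malicious: refuse outright


def decide(package, malicious, vulnerable):
    """Apply the block/confirm/allow rule to one (name, version) pair."""
    if package in malicious:
        return Decision.BLOCK
    if package in vulnerable:
        return Decision.CONFIRM
    return Decision.ALLOW


# Hypothetical lookup data standing in for the malicious-package dataset and OSV.dev.
MALICIOUS = {("example-typosquat", "0.0.1")}
VULNERABLE = {("example-lib", "1.2.0")}

for pkg in [("example-typosquat", "0.0.1"), ("example-lib", "1.2.0"), ("example-lib", "1.3.0")]:
    print(pkg, decide(pkg, MALICIOUS, VULNERABLE).value)
```

The malicious check runs first, so a package that appears in both sets is blocked rather than merely flagged for confirmation.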

Identify security posture risks in Oracle Cloud Infrastructure with Datadog Cloud Security

Maintaining a consistent security posture across a multi-cloud footprint can be challenging. Security teams are often left with fragmented visibility and manual compliance checks across different providers and tools. To help you secure your entire infrastructure in one place, Datadog Cloud Security now supports Oracle Cloud Infrastructure (OCI), expanding our coverage across all four major cloud providers. Now, you can:

Automate compliance monitoring using 45 out-of-the-box rules mapped to the CIS OCI 3.0.0 benchmark to identify and remediate misconfigurations instantly

Build custom security logic by writing tailored Rego rules against 40 different OCI resource types to meet your organization’s specific security requirements

Gain unified visibility into multi-cloud risk by analyzing OCI security findings alongside AWS, Azure, and Google Cloud data within the Cloud Security Summary, Findings, and Compliance pages, and via the API

Set up OCI in Cloud Security to get started, and read more about securing OCI infrastructure in our blog.

Secure your APIs from finding to fix with App and API Protection

Remediate API security issues directly from findings with Bits Code

Bits Code enables backend engineers to take immediate action on API findings by generating pull requests with concrete fixes. Leveraging production signals and source code context, it streamlines remediation and reduces manual effort, helping teams resolve issues faster without leaving Datadog. Learn more in our Bits Code documentation.

Proactively validate your API attack surface with API Security Testing

API Security Testing brings active validation to API security by continuously testing endpoints for OWASP API risks. This helps teams uncover vulnerabilities and misconfigurations that passive monitoring may miss, and turns API inventory into actionable security findings, enabling faster remediation and stronger, more reliable API posture. To get started, request access to the Preview.

Find and fix data leaks at the source with Sensitive Data Scanner

Keep sensitive data out of your logs with Bits Code and Sensitive Data Scanner

Redaction and access controls reduce exposure to sensitive data leaks, but only a code change eliminates them. Now, you can launch a Bits Code coding session from any finding in Datadog Sensitive Data Scanner. Bits AI will locate the offending log line in your code and propose a fix that removes the sensitive field at the source, so you can remediate the leak regardless of how the log is processed downstream. From there, you can review the change in Datadog and open a pull request to commit the fix in a few clicks. Contact your Datadog account team to request access.

Detect and resolve sensitive data leaks with the new SDS Findings Explorer

Sensitive Data Scanner helps teams detect and resolve leaks of sensitive data, such as PII, secrets/credentials, and financial information, in their telemetry data in order to meet security and compliance requirements. The new SDS Findings Explorer groups every match by specific data patterns so you can precisely see where leaks originate, how often they occur, and which services are responsible. Each finding includes a seven-day trend chart, sample log events with sensitive content highlighted, rule and ownership context, and recommended remediations to resolve the leak at the source. The Findings Explorer is available in Preview for logs, with RUM and APM support to follow later in 2026. Contact your Datadog account team to request access.

Modernize authentication and meet compliance demands with Governance

Introducing Datadog’s new API authentication model

Modern infrastructure depends on automated systems, AI agents, and cloud-native workloads that need secure, auditable access to APIs. Datadog’s new API authentication model modernizes how teams authenticate to Datadog APIs by introducing four purpose-built credential types: Personal Access Tokens (PATs), Service Access Tokens (SATs), Workload Identity Federation, and customer-managed OAuth clients. These new authentication methods provide scoped, identity-aware access for developers, CI/CD pipelines, autonomous AI agents, and cloud provider workloads without relying on long-lived shared credentials. PATs, SATs, and Workload Identity Federation for AWS workloads are generally available. OAuth client support is planned for release later this year. Application keys will continue to work after Q3 2026, but they will be considered legacy features and no new capabilities will be added. Learn more about which scoped credential is best for your use case in our dedicated blog post.

Monitor and secure high-impact workloads with FedRAMP® High-certified observability

Datadog for Government has achieved FedRAMP® High certification, enabling federal agencies and regulated organizations to secure their most sensitive mission-critical workloads within the US1-FED GovCloud environment. As FedRAMP’s most stringent security baseline, High certification supports organizations with rigorous requirements for confidentiality, integrity, availability, and continuous monitoring, while giving teams the flexibility to scale as their needs evolve. With this milestone, public-sector teams, as well as organizations in industries such as healthcare and financial services, can use Datadog’s unified observability and security platform to monitor, troubleshoot, and secure sensitive workloads, without introducing separate tools or workflows. To learn more, read our blog post on Datadog for Government achieving FedRAMP High certification.

Connect Azure to Datadog with Secretless authentication

Telemetry access from your Azure environments has traditionally required a client secret, which forces periodic rotation to keep telemetry flowing. With Secretless authentication, you can connect your subscriptions using workload identity federation instead, removing the need to rotate credentials before expiration. To get started, follow the Secretless authentication setup guide in the Azure integration docs.
Jun 24, 2026
Date parsed from source:
Jun 24, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

Automatically enrich security logs with MITRE ATT&CK context before they reach your SIEM

Datadog adds MITRE ATT&CK Enrichment Packs to Observability Pipelines, automatically tagging security logs with ATT&CK tactics and techniques across Okta, Palo Alto, FortiGate, AWS WAF, CloudTrail, and Windows for faster investigations and detections.
To detect and investigate threats, security teams need to collect telemetry data from identity providers, cloud platforms, web application firewalls, and endpoints. But these diverse sources describe the same tactics, techniques, and procedures (TTPs) differently according to their own vendor-specific language. For example, a failed Windows logon appears as an event ID, while an Okta account lockout appears as an identity event. A firewall, meanwhile, may represent a similar attack through a completely different log format. Because of this incompatibility, analysts often spend valuable time translating vendor-specific events into a common security context before they can investigate or respond. This step adds overhead and slows security investigations.

Observability Pipelines addresses this challenge with MITRE ATT&CK Enrichment Packs: preconfigured mappings that automatically tag security events with the relevant ATT&CK tactics and techniques. MITRE ATT&CK Enrichment Packs enrich your logs as they move through your pipeline, before they reach your SIEM, data lake, or archive. That means ATT&CK context is already there when the log lands, ready for detections, dashboards, investigations, and reporting.

In this post, we’ll explore how these packs help teams:

Automatically enrich logs with MITRE ATT&CK context

Investigate security incidents across every source

Automatically enrich logs with MITRE ATT&CK context

MITRE ATT&CK gives security teams a shared framework for describing how attackers operate, from initial access through exfiltration. For telemetry pipelines, that context can help teams enrich logs and events with relevant tactics and techniques before the data reaches downstream security tools, making it easier to prioritize the signals that matter for detection, threat hunting, and incident response.

Observability Pipelines now brings this context directly into your logs through MITRE ATT&CK Enrichment Packs. The initial release includes packs for:

Okta (identity and access): Tags authentication events that signal account abuse, including logins, MFA tampering, MFA fatigue, impersonation, API token creation, and account lockouts

Palo Alto (network and perimeter): Tags firewall activity including exploit attempts, command-and-control traffic, malware transfers, denial-of-service, VPN access, and admin brute force

FortiGate (network and perimeter): Tags the same firewall behaviors as Palo Alto and also maps data-loss events to exfiltration techniques

AWS WAF (web): Tags web-layer attacks including exploit attempts, brute force, bot activity, anonymous proxy traffic, and credential stuffing

CloudTrail (cloud): Tags cloud control-plane activity including console logins, IAM changes, defense evasion, and cloud infrastructure reconnaissance

Windows (endpoint): Tags endpoint-only behaviors like PowerShell execution, scheduled task creation, service installation, and event log clearing

Teams can browse and add packs directly from Observability Pipelines. Each pack comes preconfigured with mapping logic, so there’s nothing to build from scratch.

For example, suppose you’re a security engineer using Okta logs in Splunk to detect identity attacks. On its own, an event such as user.session.impersonation.grant means something only to analysts who already know Okta’s event taxonomy. After you add the Okta Pack, that same event arrives tagged as the MITRE ATT&CK tactic “Privilege Escalation” and the technique “Use Alternate Authentication Material (T1550).” Detection rules can then target MITRE ATT&CK fields rather than vendor-specific event names.

Once the pack is added, you can validate the enrichment logic against production log samples by using Live Capture. Live Capture shows exactly how an event is transformed as it moves through the pipeline. For example, the raw Okta event below enters on the left and exits on the right with its MITRE tags added, along with a security:true flag and a source field.

Investigate security incidents across every source

Security teams often spend as much time normalizing data as investigating threats. When the same attacker behavior appears in different formats across systems, teams can end up maintaining separate detection logic for each source.

MITRE ATT&CK Enrichment Packs apply one enrichment model across all supported sources. Identity events, network activity, cloud control-plane changes, and endpoint behaviors all arrive with the same fields, so teams write detections once and build dashboards on the same fields across every source. Inside each pack, processors match specific events and apply the corresponding MITRE ATT&CK mappings automatically, so analysts receive events already labeled with attacker intent. Each enriched event also carries a flag marking it as security-relevant.
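To make the processor idea concrete, here is a minimal Python sketch of lookup-table enrichment. It is not how Observability Pipelines packs are implemented or configured. The mapping table, the per-source event ID fields (`eventType`, `EventID`), and the `mitre.technique` field name are our assumptions; the Okta row mirrors the example earlier in this post, and the Windows row applies ATT&CK's standard mapping for event ID 1102 (audit log cleared) purely as an illustration.

```python
import copy

# Illustrative mapping table: (log source, vendor event identifier) -> ATT&CK labels.
MITRE_MAPPINGS = {
    ("okta", "user.session.impersonation.grant"): {
        "tactic": "Privilege Escalation",
        "technique": "Use Alternate Authentication Material (T1550)",
    },
    ("windows", "1102"): {
        "tactic": "Defense Evasion",
        "technique": "Clear Windows Event Logs (T1070.001)",
    },
}

# Assumed field that carries the vendor-specific event identifier for each source.
EVENT_ID_FIELD = {"okta": "eventType", "windows": "EventID"}


def enrich(event: dict, source: str) -> dict:
    """Return a copy of the event with ATT&CK fields added if a mapping matches."""
    field = EVENT_ID_FIELD.get(source)
    key = (source, str(event.get(field))) if field else None
    mapping = MITRE_MAPPINGS.get(key)
    if mapping is None:
        return dict(event)  # unmatched events pass through unchanged
    enriched = copy.deepcopy(event)
    enriched["mitre"] = dict(mapping)
    enriched["security"] = True  # flag marking the event as security-relevant
    enriched["source"] = source
    return enriched


okta = enrich({"eventType": "user.session.impersonation.grant"}, "okta")
windows = enrich({"EventID": 1102, "Computer": "host-01"}, "windows")
print(okta["mitre"]["tactic"])        # Privilege Escalation
print(windows["mitre"]["technique"])  # Clear Windows Event Logs (T1070.001)
```

Events from different vendors leave this function with the same `mitre` and `security` fields, which is what lets one detection or dashboard cover every source.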

That standardization is what makes investigations fast. Say you’re investigating a possible account takeover. Without MITRE tags, you’d query each source in its own syntax and manually stitch together the timeline. With the tags applied in the pipeline, you can filter on @mitre.tactic:Privilege Escalation and see those events side by side, already labeled, from the moment they arrive. From there, you can narrow down to a specific technique or widen to @security:true to surface every security-relevant event at once.
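The investigation flow above can be approximated in memory. This Python sketch assumes events already carry the nested `mitre.tactic` and `security` fields used in the queries above; the sample events are hypothetical, and the functions only mimic the filters, not Datadog's query engine.

```python
from datetime import datetime

# Hypothetical events from three sources after pipeline enrichment; the last one
# matched no mapping, so it carries no MITRE fields or security flag.
events = [
    {"timestamp": "2026-06-01T10:02:00Z", "source": "okta",
     "security": True, "mitre": {"tactic": "Privilege Escalation"}},
    {"timestamp": "2026-06-01T10:00:30Z", "source": "cloudtrail",
     "security": True, "mitre": {"tactic": "Privilege Escalation"}},
    {"timestamp": "2026-06-01T10:01:00Z", "source": "palo_alto",
     "security": True, "mitre": {"tactic": "Command and Control"}},
    {"timestamp": "2026-06-01T10:03:00Z", "source": "okta"},
]


def parse_ts(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def by_tactic(events, tactic):
    """In-memory analogue of filtering on @mitre.tactic, returned in time order."""
    hits = [e for e in events if e.get("mitre", {}).get("tactic") == tactic]
    return sorted(hits, key=lambda e: parse_ts(e["timestamp"]))


def security_relevant(events):
    """In-memory analogue of widening the query to @security:true."""
    return [e for e in events if e.get("security") is True]


for e in by_tactic(events, "Privilege Escalation"):
    print(e["timestamp"], e["source"])
# 2026-06-01T10:00:30Z cloudtrail
# 2026-06-01T10:02:00Z okta
```

One filter on a shared field returns a merged, time-ordered timeline across Okta and CloudTrail without any per-vendor query syntax.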

Because the tagging happens in the pipeline, before logs leave your environment, every destination gets the same context automatically. You can route the tagged events to the SIEM or data lake of your choice, including Splunk, Microsoft Sentinel, Datadog Cloud SIEM, or a data lake like Databricks or ClickHouse. The MITRE fields are already there when the log lands, so detection rules can fire right away without any lookups.

Start enriching your security logs today

By tagging logs with MITRE ATT&CK context, you normalize security logs automatically, investigate activity across sources with a shared taxonomy, and speed up detections.

MITRE ATT&CK Enrichment Packs are included with Observability Pipelines at no additional cost and are available in all regions and environments outside GovCloud. To get started, open the Packs gallery in Observability Pipelines and add the pack that matches your log source. To learn more, check out the Observability Pipelines documentation, or read our blog on how Observability Pipelines can enrich logs with additional context on-stream. And if you’re not yet a Datadog customer, sign up for a 14-day free trial.
Jun 22, 2026
Date parsed from source:
Jun 22, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

Using Evaluation Frameworks with Agent Observability

Datadog adds native support for DeepEval and Pydantic Evals in Agent Observability, letting teams run, compare, and monitor LLM evaluations in Datadog Experiments with trace-linked results, regression visibility, and continuous production traffic scoring.
AI teams have invested heavily in evaluation frameworks, yet getting those frameworks beyond local experimentation remains challenging.

Teams using open source libraries like DeepEval and Pydantic Evals gain flexibility and research-grounded metrics, but operationalizing those evaluations still requires brittle custom integration code that doesn’t scale. SaaS eval platforms often prioritize convenience, which can come at the cost of flexibility when teams need to port or extend their metric definitions over time. The result is that even mature teams with carefully tuned, task-specific evaluators end up with siloed artifacts: evals that work in a notebook, break in CI, and vanish entirely in production monitoring.

In this post, we explain how Datadog Agent Observability addresses this gap by letting teams run their existing DeepEval evaluations natively within Datadog Agent Observability Experiments. Datadog also supports Pydantic Evals, a code-first evaluation framework that provides its own dataset, evaluator, and LLM-as-a-judge primitives, for teams that prefer it or already use it alongside Pydantic AI. The examples in this post use DeepEval, but the same patterns also apply to Pydantic Evals. Together, these integrations give teams a single place to define, run, and monitor evaluation quality across every stage of development and deployment.

We’ll cover:

Why framework portability matters for LLM evals

How to set up experiments with Datadog Agent Observability

How to connect eval scores to production traces

How to run LLM evaluations continuously on production traffic

Why framework portability matters for LLM evals

Evaluations are an engineering asset, not a platform feature. A team that has built a suite of DeepEval evaluations has accumulated organizational knowledge about what “good” looks like for their application. That knowledge is encoded in the rubrics, thresholds, and human validation behind every G-Eval judge, RAG faithfulness metric, and custom evaluator in the suite. Rewriting those evaluators to conform to a platform’s proprietary metric definitions means discarding that investment rather than simply porting it.

Datadog Agent Observability doesn’t replace the open source eval ecosystem; it wraps around it. You define what to measure and how to measure it, using the frameworks you already trust, and the platform handles operationalization. It runs those evaluations at scale across hundreds or thousands of examples and tracks results over time to surface regressions. It also monitors token usage and cost across runs, and connects offline eval scores to production traces so you can verify that improvements in your Experiments environment actually translate to better user experiences. Your open source scaffolding stays intact throughout.

Set up experiments with Datadog Agent Observability

Before running experiments, enable Agent Observability in your Datadog account and install the required libraries. The example below uses ddtrace 4.8 or later and works with any version of DeepEval:

pip install "ddtrace>=4.8" deepeval pydantic

Then enable Agent Observability instrumentation in your application:

from ddtrace.llmobs import LLMObs

LLMObs.enable(
    ml_app="your-llm-app",
    api_key="<YOUR_DD_API_KEY>",
    app_key="<YOUR_DD_APP_KEY>",
    site="<YOUR_DD_SITE>",
)

Step 1: Define your dataset

A dataset is a collection of inputs and expected outputs. The inputs are passed directly to your task function (a RAG pipeline, an agent, or any other LLM application), which produces an actual output. The experiment then compares that actual output against the expected output you provide to score each example. To define a dataset, you need a name and a list of those input and expected output pairs; the example below also includes a description and per-record metadata.

from ddtrace.llmobs import LLMObs

dataset = LLMObs.create_dataset(
    dataset_name="rag-customer-support-v1",
    description="Example dataset containing customer support examples",
    records=[
        {
            "input_data": {"question": "How do I reset my password?"},
            "expected_output": {"answer": "Click 'Forgot Password' on the login page..."},
            "metadata": {"difficulty": "easy"},
        },
        {
            "input_data": {"question": "What's your refund policy?"},
            "expected_output": {"answer": "We offer 30-day refunds for..."},
            "metadata": {"difficulty": "easy"},
        },
    ],
)

Step 2: Configure your DeepEval or Pydantic evaluator

Existing DeepEval metrics like G-Eval judges, RAG faithfulness metrics, and custom LLM-as-a-judge implementations can be used without modification.

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

helpfulness_evaluator = GEval(
    name="Helpfulness",
    criteria="Determine whether the response directly answers the user's question with actionable steps.",
    evaluation_steps=[
        "Check whether the content of the 'actual output' contradicts the content of the 'expected output'",
        "You should also heavily penalize omission of detail",
        "Vague language, or contradicting OPINIONS, are not OK",
        "The user's question should be answered by the 'actual output'",
    ],
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    async_mode=True,  # score examples concurrently
)

Setting async_mode=True runs evaluations concurrently across the dataset. For a dataset of 100 examples, this can significantly reduce the time evaluations take to run.
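Why concurrency helps: each LLM-as-a-judge call mostly waits on the network, so overlapping those waits shortens wall-clock time. The following Python sketch is a generic illustration of that effect, not DeepEval's internal implementation; `judge` is a hypothetical stand-in that sleeps instead of calling a model, and the cap of 20 in-flight calls is our assumption for respecting provider rate limits.

```python
import asyncio
import time


async def judge(example_id: int) -> float:
    """Stand-in for one LLM-as-a-judge call; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return 1.0  # placeholder score


async def evaluate_sequentially(n: int) -> list:
    # Each call waits for the previous one, so total time grows linearly with n.
    return [await judge(i) for i in range(n)]


async def evaluate_concurrently(n: int, max_in_flight: int = 20) -> list:
    # Overlap the waits, but cap in-flight calls to respect model rate limits.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(i: int) -> float:
        async with semaphore:
            return await judge(i)

    return list(await asyncio.gather(*(bounded(i) for i in range(n))))


if __name__ == "__main__":
    for runner in (evaluate_sequentially, evaluate_concurrently):
        start = time.perf_counter()
        scores = asyncio.run(runner(100))
        print(f"{runner.__name__}: {len(scores)} scores in {time.perf_counter() - start:.2f}s")
```

With 100 examples, the concurrent version should finish in a small fraction of the sequential time, because at most 20 simulated calls wait at once instead of one.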

Step 3: Define your task and run the experiment

The task function takes an input from your dataset and returns an output, which is typically a call to your LLM application or RAG pipeline.

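from ddtrace.llmobs import LLMObs

# `dataset` (Step 1) and `helpfulness_evaluator` (Step 2) are defined above.
def my_rag_task(input_data, config=None):
    # Accepting an optional config matches the (input_data, config) task
    # signature used in ddtrace's experiment examples.
    question = input_data["question"]
    response = your_rag_pipeline(question)  # placeholder for your own application code
    return {"answer": response}

experiment = LLMObs.Experiment(
    name="rag-customer-support-baseline",
    dataset=dataset,
    task=my_rag_task,
    evaluators=[helpfulness_evaluator],
)
experiment.run()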

When experiment.run() is called, Datadog executes the task function across every example in the dataset, runs the DeepEval metrics in parallel, and uploads results to the Experiments UI for analysis.

Analyze experiment results in Datadog

Once an experiment completes, Datadog makes results available in the Datadog Agent Observability Experiments UI. You can select any prior experiment run as a baseline and view side-by-side comparisons of eval scores, latency, token usage, and cost. If switching to a different model improved helpfulness scores by 12% but introduced a 3× latency increase, the same view shows both changes without cross-referencing separate tools.
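The side-by-side comparison boils down to relative change per metric. This Python sketch uses hypothetical run summaries chosen to reproduce the example above (a 12% relative helpfulness gain and a 3× latency increase, which is +200%); the `RunSummary` fields are our assumption, not the Experiments UI schema, and every baseline value must be nonzero.

```python
from dataclasses import dataclass, fields


@dataclass
class RunSummary:
    helpfulness: float  # mean eval score for the run
    p50_latency_ms: float
    total_tokens: int
    cost_usd: float


def compare(baseline: RunSummary, candidate: RunSummary) -> dict:
    """Percentage change of every metric in the candidate run relative to the baseline."""
    return {
        f.name: round(100 * (getattr(candidate, f.name) - getattr(baseline, f.name))
                      / getattr(baseline, f.name), 1)
        for f in fields(RunSummary)
    }


baseline = RunSummary(helpfulness=0.75, p50_latency_ms=800, total_tokens=120_000, cost_usd=4.00)
candidate = RunSummary(helpfulness=0.84, p50_latency_ms=2400, total_tokens=126_000, cost_usd=6.00)
print(compare(baseline, candidate))
# {'helpfulness': 12.0, 'p50_latency_ms': 200.0, 'total_tokens': 5.0, 'cost_usd': 50.0}
```

Reading all four deltas together is the point: the quality gain arrives with a large latency and cost penalty that a score-only view would hide.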

For any low-scoring example, you can drill into the full trace to see the exact prompt sent to the model, the completion, the eval score, and evaluator reasoning. This visibility reduces the need to reproduce failures locally or reconstruct context from logs after the fact.

Connect eval scores to production traces

Eval scores in isolation have a practical ceiling. A helpfulness score that drops from 0.82 to 0.74 between runs raises questions about what caused the drop. Answering them requires knowing which examples regressed, what changed in the prompt or model output, and whether the issue originated in retrieval or generation. It also requires understanding how the regression correlates with latency or token usage.
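The first of those questions, which examples regressed, is a per-example diff between runs. In this Python sketch the example IDs and scores are hypothetical, chosen so the means move from 0.82 to 0.74; the 0.1 minimum drop is our assumption.

```python
from statistics import mean


def regressed_examples(baseline: dict, candidate: dict, min_drop: float = 0.1) -> list:
    """Return (example_id, baseline_score, candidate_score) for examples whose score
    fell by at least min_drop, worst regression first. Inputs map example ID -> score."""
    drops = [
        (ex_id, baseline[ex_id], candidate[ex_id])
        for ex_id in baseline.keys() & candidate.keys()
        if baseline[ex_id] - candidate[ex_id] >= min_drop
    ]
    return sorted(drops, key=lambda row: row[2] - row[1])


baseline = {"q1": 0.90, "q2": 0.80, "q3": 0.80, "q4": 0.78}
candidate = {"q1": 0.90, "q2": 0.50, "q3": 0.78, "q4": 0.78}
print(round(mean(baseline.values()), 2), round(mean(candidate.values()), 2))  # 0.82 0.74
print(regressed_examples(baseline, candidate))  # [('q2', 0.8, 0.5)]
```

Here the whole drop in the mean comes from a single example, which is exactly the kind of detail an aggregate score hides and a trace-linked view exposes.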

Without observability, this means manually correlating data from an eval framework and a separate logging system. Engineers have to copy trace IDs, cross-reference timestamps, and piece together context that should already be connected.

Running DeepEval metrics with Datadog automatically links every eval score to the trace, prompt, and token count that produced it. Regressions are clickable, explorable, and reproducible within the same Datadog platform used to monitor the rest of your application.

Run LLM evals continuously on production traffic

Most teams treat evals as a pre-deployment gate: a batch job in CI that produces a pass or fail decision before a change ships. These evals can catch regressions before they reach users, but they do not surface issues that emerge in production as traffic patterns, user inputs, or upstream dependencies change over time.
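For reference, a pre-deployment gate of this kind can be as small as the following Python sketch. The mean-score threshold and the allowed number of below-threshold examples are our assumptions, not a Datadog feature.

```python
import statistics


def gate(scores, threshold=0.8, max_failures=0):
    """Return (passed, mean_score, examples_below_threshold) for one eval run.

    The run passes only if the mean score meets the threshold and no more than
    max_failures individual examples fall below it.
    """
    below = sum(1 for s in scores if s < threshold)
    mean_score = statistics.mean(scores)
    passed = mean_score >= threshold and below <= max_failures
    return passed, mean_score, below


def ci_exit_code(scores, threshold=0.8, max_failures=0) -> int:
    """Map the gate decision to a process exit code (0 = ship, 1 = block)."""
    passed, _, _ = gate(scores, threshold, max_failures)
    return 0 if passed else 1


# Hypothetical per-example helpfulness scores from one experiment run.
run_scores = [0.91, 0.85, 0.88, 0.79]
print(gate(run_scores, threshold=0.8, max_failures=1))
```

A CI step could call `sys.exit(ci_exit_code(run_scores))` to fail the build. Note that this check runs once, against a fixed dataset, which is why it cannot see drift that appears only in live traffic.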

With Datadog, evals can run continuously on sampled production traffic alongside offline experiment workflows. The same evaluators used during development can score live completions, and the results feed into the same dashboards and alerting infrastructure used for the rest of the application stack. Teams can catch quality regressions as they happen rather than learning about them from user feedback.
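One common way to score a sample of live traffic is deterministic, hash-based sampling on the trace ID, so a given trace is always either in or out of the sample. The post does not describe how Datadog samples production traffic. The sketch below is a generic illustration, and `should_sample`, `evaluate_sampled`, and `toy_evaluator` are hypothetical stand-ins for whatever evaluator a team already uses offline.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of traces by hashing the trace ID into [0, 1)."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


def evaluate_sampled(
    completions: Iterable[Tuple[str, str, str]],
    evaluator: Callable[[str, str], float],
    rate: float,
) -> Dict[str, float]:
    """Score only the sampled (trace_id, prompt, completion) triples."""
    return {
        trace_id: evaluator(prompt, completion)
        for trace_id, prompt, completion in completions
        if should_sample(trace_id, rate)
    }


def toy_evaluator(prompt: str, completion: str) -> float:
    # Placeholder scoring rule; a real setup would reuse the offline evaluator.
    return 1.0 if completion.strip() else 0.0


traffic = [(f"trace-{i}", "How do I reset my password?", "Open Settings > Security.") for i in range(1000)]
scores = evaluate_sampled(traffic, toy_evaluator, rate=0.1)
print(f"scored {len(scores)} of {len(traffic)} completions")
```

Hashing instead of calling a random number generator keeps the sample reproducible, so rerunning the same traffic through the same evaluator yields the same set of scored traces.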

Get started with Datadog Agent Observability

Datadog Agent Observability lets teams run DeepEval and Pydantic Evals evaluations natively within Datadog Experiments without needing to rewrite existing evaluators or adopt proprietary metric definitions. By connecting offline eval scores to production traces, teams can catch quality regressions at every stage of development and deployment, not just at the pre-deployment gate. As LLM applications grow more complex, continuous evaluation against live traffic becomes as essential as any other part of the observability stack. To learn more, check out the Agent Observability documentation.

If you don’t have a Datadog account, you can sign up for a 14-day free trial to get started with Agent Observability.
Jun 10, 2026
Date parsed from source:
Jun 10, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

Store and search high-volume logs with ClickHouse and Datadog

Datadog adds a Preview integration with ClickHouse that helps teams route high-volume logs to ClickHouse through Observability Pipelines and search them from the Datadog Log Explorer without re-ingesting data, giving users a cost-effective way to keep logs searchable at scale.
As teams scale AI and agentic workloads, log volumes can grow fast. That growth can force teams into a difficult trade-off: Keep logs searchable in their existing workflows, or store them cost-effectively for longer periods. For teams that rely on logs during incident response, compliance reviews, and long-running investigations, losing either affordability or searchability can slow down troubleshooting.

Datadog and ClickHouse are partnering to help remove that trade-off. Two new capabilities, now in Preview, let you route high-volume logs to ClickHouse through Datadog Observability Pipelines and search those logs directly from the Datadog Log Explorer without re-ingesting them into Datadog.

First, we’ll explain what ClickHouse is and why it is useful for high-volume observability data. Then we’ll describe how the integration enables you to:

Route logs to ClickHouse with Observability Pipelines

Search ClickHouse logs from the Log Explorer

What is ClickHouse?

ClickHouse is a high-performance, open source columnar database originally built for real-time analytics at massive scale. For observability use cases, ClickHouse supports sub-second analysis across petabytes of logs, metrics, traces, and events. It also helps reduce storage costs through high compression and separation of storage and compute.

ClickHouse is well suited for high-cardinality telemetry data, which makes it a strong fit for organizations managing large and fast-growing observability datasets. Organizations including OpenAI, DoorDash, Anthropic, and Shopify use ClickHouse as an observability database for large-scale analytics.

Route logs to ClickHouse with Observability Pipelines

With a native ClickHouse destination for Datadog Observability Pipelines, you can send application and infrastructure logs to ClickHouse with in-stream parsing, enrichment, and redaction. This helps high-volume data land in a cost-efficient store already shaped for querying.

You can decide which logs should go where from a single pipeline. For example, you might route high-value logs to Datadog for real-time monitoring and send high-volume logs, or logs that require longer retention for compliance reasons, to ClickHouse. Observability Pipelines gives you a vendor-agnostic way to control these routing decisions before logs reach their destinations.
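The routing decision described above can be sketched as a small policy function. This is plain Python, not Observability Pipelines configuration syntax, and the `LogRecord` fields, the `tier` and `retention` tags, and the email-masking rule are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class LogRecord:
    service: str
    status: str  # for example "error", "warn", "info", "debug"
    message: str
    tags: Dict[str, str] = field(default_factory=dict)


def redact(record: LogRecord) -> LogRecord:
    """Mask email addresses in the message before the record leaves the pipeline."""
    return LogRecord(record.service, record.status, EMAIL.sub("[REDACTED]", record.message), dict(record.tags))


def route(record: LogRecord) -> Set[str]:
    """Pick destinations for one log record under a hypothetical policy."""
    destinations: Set[str] = set()
    if record.status in {"error", "warn"} or record.tags.get("tier") == "critical":
        destinations.add("datadog")      # high-value: real-time monitoring and alerting
    if record.tags.get("retention") == "compliance":
        destinations.add("clickhouse")   # long retention for audits
    if not destinations:
        destinations.add("clickhouse")   # everything else: high-volume, low-cost storage
    return destinations


logs = [
    LogRecord("checkout", "error", "payment failed for jane@example.com"),
    LogRecord("checkout", "info", "cart viewed"),
    LogRecord("auth", "info", "login succeeded", {"retention": "compliance"}),
    LogRecord("auth", "error", "token rejected", {"retention": "compliance"}),
]

for log in map(redact, logs):
    print(sorted(route(log)), log.service, log.message)
```

Returning a set of destinations lets one record go to both stores, for example an error log that also falls under a compliance retention requirement.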

Search ClickHouse logs from the Log Explorer

With Federated Logs, you can query logs stored in ClickHouse directly from the Datadog Log Explorer without re-ingesting the data into Datadog. This gives teams one search experience across Datadog-managed and ClickHouse-stored logs.

This approach helps teams retain higher volumes of logs for longer periods without sampling, while still investigating them alongside the rest of their observability data. Because the data stays in ClickHouse, teams can query it in place instead of duplicating logs across systems. And during an incident, engineers can move between Datadog and ClickHouse data without switching tools or rebuilding queries in a separate UI.

Get started with Datadog and ClickHouse

The native ClickHouse destination for Observability Pipelines and federated search for ClickHouse logs are now available in Preview. Together, these capabilities help teams store high-volume logs cost-effectively in ClickHouse while keeping those logs searchable from the Datadog Log Explorer.

To get started, read the ClickHouse integration documentation or request access to the Federated Logs Preview. To learn more about routing telemetry data with Observability Pipelines, read the Observability Pipelines documentation. If you’re new to Datadog, you can sign up for a 14-day free trial.
Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

Autonomously monitor for impactful degradations with Bits Detection

Datadog introduces Bits Detection, a preview feature that automatically creates, tunes, and maintains monitors for services. It uses telemetry, ownership, deployment history, and source code to adapt coverage, catch issues on critical paths faster, and connect detection to investigation and remediation.
Monitoring is built around the system a team understands at a point in time. Engineers add endpoints, move dependencies, and change user flows every day. Over time, that creates coverage drift as monitors keep reflecting the system as it used to behave, while changing paths introduce failure modes that teams didn’t yet know to watch for.

Bits Detection automatically creates, tunes, and maintains monitors for your services. It draws on the telemetry, ownership, and deployment history Datadog already tracks, and on your source code when connected. From there, it identifies what needs coverage, sets detection rules for each endpoint, and adjusts them as systems change.

In this post, you’ll learn how Bits Detection helps you:

Identify which service paths need detection

Set detection logic from production behavior

Update coverage as services change

Connect detections to investigation and remediation

Identify which service paths need detection

Not every degradation has the same level of risk. A slow internal endpoint might be tolerable for several minutes, while a failing checkout, signup, or authentication path can affect customers almost immediately. Aggregate service health metrics do not make that distinction. A service may appear fine at the top level while one critical path is failing.
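A quick worked example shows how an aggregate can hide a failing critical path. The traffic numbers and endpoint names below are invented. Across roughly 99,500 requests, the service-level error rate is about 0.3%, while `/checkout` is failing 40% of its requests.

```python
from typing import Dict, List, Tuple

# Hypothetical five-minute window: endpoint -> (requests, errors)
window: Dict[str, Tuple[int, int]] = {
    "/health": (90_000, 0),
    "/search": (9_000, 90),
    "/checkout": (500, 200),
}


def aggregate_error_rate(stats: Dict[str, Tuple[int, int]]) -> float:
    """Service-level error rate: total errors over total requests."""
    total_requests = sum(req for req, _ in stats.values())
    total_errors = sum(err for _, err in stats.values())
    return total_errors / total_requests if total_requests else 0.0


def failing_endpoints(stats: Dict[str, Tuple[int, int]], max_rate: float) -> List[str]:
    """Endpoints whose own error rate exceeds `max_rate`."""
    return [ep for ep, (req, err) in stats.items() if req and err / req > max_rate]


print(f"service-level error rate: {aggregate_error_rate(window):.2%}")
for ep in failing_endpoints(window, max_rate=0.05):
    req, err = window[ep]
    print(f"{ep}: {err / req:.0%} of requests failing")
```

High-volume, healthy endpoints dominate the denominator, which is why a service-level alert on a 1% error rate would stay quiet here.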

Bits Detection uses the context already in Datadog to determine which parts of a service require detection coverage. It draws on service behavior, historical telemetry, dependency topology, team ownership metadata, recent deployments, and user impact signals to focus coverage on the endpoints and dependencies where a degradation is most likely to affect customers.

Endpoint-level monitoring can reveal degradations that service-level averages might hide. But maintaining that coverage manually doesn’t scale. A single service can expose dozens or hundreds of endpoints, each with different traffic patterns, customer impact, and failure modes. Those endpoints also change as engineers add routes, modify user flows, or shift traffic through small frontend or API changes. Low-traffic endpoints can make static thresholds harder to tune because normal behavior may be inconsistent. Bits Detection helps address this challenge by determining which endpoints need coverage, setting detection logic from observed behavior, and keeping coverage current.

Set detection logic from production behavior

Knowing which endpoints to watch is only part of the problem. Unlike traditional monitoring, which relies on static thresholds, Bits Detection determines what unhealthy behavior looks like for each service and endpoint by evaluating changes against observed production behavior and actual customer impact.

Production metrics are not static. Normal baseline behavior changes as services evolve. Bits Detection accounts for that when evaluating whether a change is worth alerting on, rather than firing on metrics that move outside a static preset range.
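The post does not describe the model Bits Detection uses. To illustrate the general difference between a static threshold and a baseline learned from recent behavior, the sketch below uses a standard robust-statistics technique, median and median absolute deviation (MAD) over a rolling window. The latency values, the `k` multiplier, and the window size are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def robust_baseline(history: Sequence[float]) -> Tuple[float, float]:
    """Median and median absolute deviation (MAD) of recent observations."""
    center = median(history)
    mad = median([abs(x - center) for x in history])
    return center, mad


def is_anomalous(value: float, history: Sequence[float], k: float = 5.0, window: int = 288) -> bool:
    """Flag `value` if it sits more than `k` robust standard deviations from the
    recent baseline. Only the last `window` points are used (288 is a
    hypothetical default, one day of 5-minute points), so the baseline
    follows the service as its normal behavior shifts."""
    recent = list(history)[-window:]
    center, mad = robust_baseline(recent)
    # 1.4826 scales MAD to a standard deviation for normally distributed data;
    # the floor avoids dividing by zero when the history is perfectly flat.
    scale = max(1.4826 * mad, 1e-9)
    return abs(value - center) / scale > k


latency_ms: List[float] = [98, 102, 100, 101, 99, 103, 97, 100]
STATIC_THRESHOLD_MS = 500.0  # a fixed limit chosen long ago

for observed in (105.0, 130.0):
    print(observed, "baseline:", is_anomalous(observed, latency_ms), "static:", observed > STATIC_THRESHOLD_MS)
```

With this history, a 130 ms reading is far outside the learned baseline but well under the static 500 ms limit, so only the baseline check would fire. The median and MAD are used instead of mean and standard deviation so that a few past spikes do not inflate the baseline.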

You can shape that logic over time through feedback. When you tell Bits Detection which alerts are useful and which ones create noise, you tune detection for your environment. Providing this information keeps operational judgment within your team while reducing the manual work of tuning and maintaining every rule.

Update coverage as services change

Without someone actively managing it, traditional monitoring coverage can drift away from production. That gap is rarely obvious until something breaks. AI-assisted development accelerates this challenge. As teams write and ship code faster, they introduce more change into production than manual monitoring processes were designed to keep up with.

Bits Detection treats monitoring as an ongoing process rather than a one-time setup. As services evolve, coverage and alerting logic update to match, without requiring your team to manually revisit every threshold, routing rule, and endpoint.

Your team’s existing monitors can stay in place. The coverage your team has built for known failure modes, service level objectives, and compliance requirements continues to work as configured. Bits Detection works alongside them, adding adaptive coverage for the parts of your system that change too quickly to model by hand.

Connect detections to investigation and remediation

Detection is where the response process starts. After an issue surfaces, you still need to find out what changed, understand the blast radius, identify the likely cause, decide what to do, and confirm the service recovered. For many organizations, each step requires pulling information from different tools.

When Bits Detection flags an issue, it points to the affected endpoint and the related telemetry, so you know where to start.

This is the first step in the Bits workflow that moves from detection to investigation to remediation. Bits Detection reduces mean time to detection (MTTD) by identifying issues earlier. Autonomous investigation begins triage, while autonomous remediation helps you move from likely cause to recovery by recommending or taking action within defined guardrails.

Getting started with Bits Detection

Bits Detection keeps monitoring aligned with production by automatically identifying which endpoints and dependencies need coverage, determining what healthy behavior looks like for each, and updating that coverage as services change. It enables you to spend less time writing and tuning monitors, and helps you catch issues on critical paths before they show up in aggregate service-level health.

To start using Bits Detection, sign up for the Preview today.

If you’re not already a Datadog customer, start a 14-day free Datadog trial.
Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

Securing the AI era: Outpace AI-powered attacks with unified security and observability

Datadog introduces Runtime Prioritization Engine, AI Guard, and Bits Security Analyst to help teams cut vulnerability noise, secure AI agents, and detect threats faster with real-time context, automated triage, and tighter incident response.

Prevent exposure in dynamic environments

Security teams are dealing with a fundamentally different operating environment than they were a few years ago. AI-assisted development is rapidly pushing more code and infrastructure into production, and according to Datadog’s 2026 State of DevSecOps report, 40% of running services have an exploitable vulnerability. Meanwhile, AI is giving attackers new ways to automate reconnaissance and accelerate exploits, which has collapsed the window between vulnerability disclosure and active exploitation from months to minutes.

Attacks are only going to increase in volume and speed. But defenders are still overwhelmed by alerts and disconnected tools, manually reconstructing context during active incidents, and chasing ownership across services. The teams that stay ahead won’t close that gap with more tools or signals. What separates useful data from noise is knowing what is actually happening in your production environments. When you have a clear picture of your attack surface, you can identify what’s truly exploitable and focus on the threats that can actually harm your business. And when a threat surfaces, you can detect, investigate, and contain it faster, with a unified system that makes ownership clear and keeps pace with AI.

We built the Datadog platform to give security and engineering teams the real-time visibility and context they need to prevent exposure, detect and investigate threats faster, and contain attacks before they cause harm.

Periodic scans and manual remediation efforts aren’t enough when new services, APIs, and AI agents continuously change your environment’s risk profile. And with vulnerabilities accumulating faster than teams can triage them, knowing which ones matter is the difference between staying ahead of attackers and falling behind. Datadog researchers found that when you apply runtime context, 97% of critical vulnerabilities can be deprioritized because you can see which ones are actually reachable in production.

Today at DASH, we announced the Datadog Runtime Prioritization Engine to give teams real-time visibility into runtime behavior and execution context. The engine uses live APM traces, logs, and observed service dependencies to assess whether a vulnerability is truly business critical. By understanding whether an affected resource is actively running, connected to a crown-jewel application, and exposed in production, teams can focus remediation efforts on the risks that matter most. The engine also identifies the resource owner based on live production data, helping organizations accelerate response and reduce time to remediation.

This visibility means your team can filter out findings that pose no real risk and focus on what’s both exploitable and relevant to business-critical systems, instead of focusing on static severity scores. In Preview, the engine has already helped customers reduce vulnerability noise by 95%.
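The filtering idea above can be sketched as a predicate over runtime context. The post lists the signals (whether the resource is running, connected to a crown-jewel application, and exposed in production) but not how the Runtime Prioritization Engine combines them. The `Finding` fields, the all-three rule, the severity ranking, and the placeholder IDs below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1}


@dataclass(frozen=True)
class Finding:
    vuln_id: str                 # placeholder ID, not a real CVE
    severity: str
    resource: str
    running: bool                # is the affected resource actually executing?
    exposed: bool                # is it reachable in production?
    crown_jewel: bool            # is it connected to a business-critical application?
    owner: Optional[str] = None  # resolved from live production data, if known


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Keep findings that are running, exposed, and tied to a critical app,
    ordered by static severity; everything else is deprioritized."""
    actionable = [f for f in findings if f.running and f.exposed and f.crown_jewel]
    return sorted(actionable, key=lambda f: SEVERITY_RANK.get(f.severity, 0), reverse=True)


findings = [
    Finding("VULN-001", "critical", "batch-reporter", running=False, exposed=False, crown_jewel=False),
    Finding("VULN-002", "critical", "payments-api", running=True, exposed=True, crown_jewel=True, owner="team-payments"),
    Finding("VULN-003", "high", "internal-wiki", running=True, exposed=False, crown_jewel=False),
    Finding("VULN-004", "medium", "checkout-web", running=True, exposed=True, crown_jewel=True, owner="team-checkout"),
]

for f in prioritize(findings):
    print(f.vuln_id, f.severity, f.resource, "->", f.owner or "unassigned")
```

Here a critical finding on a resource that is not running drops out, while a medium finding on an exposed checkout service stays in the queue, which is the reordering that runtime context is meant to produce.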

Bits Code and Datadog’s AI-native Static Code Analysis (SAST) capabilities embed that deep visibility into your day-to-day development workflows. AI-native SAST reviews findings in context to assess whether they’re likely true or false positives before they reach your triage queue. To resolve a vulnerability, Bits Code generates code fixes directly, either one at a time or through bulk remediation campaigns. Security findings also surface in the IDE and in pull requests with the details that developers need in order to act, and issues automatically route to the right owners.

Improve visibility into AI agent behavior

As more of our customers deploy AI, we’ve seen firsthand why combining security with observability is so critical for managing AI workloads. Securing AI agents specifically requires full visibility into runtime activity. To carry out their functions, AI agents often access sensitive data, read untrusted content, and communicate externally through HTTP calls, file writes, and commands. When those three behaviors converge, they create conditions that make your agent a prime target for prompt injection, tool misuse, and data exfiltration.
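The risky combination described above can be expressed as a subset check over each agent's observed capabilities. This is a conceptual sketch, not how AI Guard works internally. The capability labels and agent names are hypothetical.

```python
from typing import Dict, FrozenSet, List

RISKY_COMBINATION: FrozenSet[str] = frozenset({
    "reads_sensitive_data",
    "reads_untrusted_content",
    "communicates_externally",
})


def at_risk_agents(agents: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return agents whose observed capabilities include all three risky behaviors."""
    return sorted(name for name, caps in agents.items() if RISKY_COMBINATION <= caps)


observed = {
    "support-bot": frozenset({"reads_sensitive_data", "reads_untrusted_content", "communicates_externally"}),
    "summarizer": frozenset({"reads_untrusted_content"}),
    "billing-agent": frozenset({"reads_sensitive_data", "communicates_externally"}),
}

print(at_risk_agents(observed))
```

Because the risk comes from the combination, removing any one of the three capabilities from an agent, for example blocking its external calls, takes it out of this set.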

Datadog AI Guard addresses these risks directly by providing runtime protection for agents. It evaluates real-time agent behavior, infrastructure signals, and data flows to discover unprotected agents, flag unsafe actions, and help block attacks as they happen. It also maps unprotected agents across your environment with full lineage, surfacing the model endpoints, data sources, and infrastructure dependencies that each agent accesses. That context helps teams adopt agentic workflows quickly without sacrificing security oversight.

Those same protections extend to your development environment. AI Guard is expanding its capabilities to help improve the security of coding agents against malicious skills, scripts, configurations, and packages. AI Guard sits inline with the agent to block indirect prompt injections, backdoor attacks, and other OWASP Top 10 threats, so your teams can use coding agents while protecting development pipelines.

Detect, investigate, and contain threats with agentic defenses

Preventing exposure reduces your attack surface, but you also need to detect and respond when threats get through. Datadog provides AI-assisted threat detection and automated triage to help your team maintain a clear picture of the entire system as incidents unfold.

Bits Security Analyst enhances detection and investigation with deep behavioral insight and real-time context. It correlates signals across infrastructure, apps, identity, and network activity to surface threats the moment they emerge. Instead of your security analysts manually stitching together signals after the fact, Bits Security Analyst gives them a complete, continuously updated picture of an attack as it develops. Bits Security Analyst is now generally available as part of Datadog Cloud SIEM, and as of DASH, we’re expanding availability so you can deploy it on third-party SIEMs without disrupting existing workflows.

For proactive defense, Bits Threat Hunting, now available in Preview, lets your team surface anomalous behavior and uncover attack patterns before they escalate. When a threat is identified, Datadog isolates only the affected service, workload, identity, or agent and contains the blast radius without disrupting broader systems. Bits Threat Hunting is also linked to case management and incident response workflows, so your team can quickly move from detection to remediation, with every step captured from initial signal to closed case.

Security that’s built for the way teams want to operate

The gap between how fast threats move and how fast teams can respond has been the defining problem in security for the last several years. That gap exists because security tooling has historically been siloed from the rest of the observability stack, forcing teams to manually correlate context across systems that have no visibility into each other.

Datadog Security is built on an observability platform that ingests and correlates telemetry data across your tech stack, including infrastructure, applications, logs, identities, APIs, AI workloads, and more. Instead of aggregating signals from disparate systems, Datadog preserves the relationships between entities and maps them to behavior for a continuously updated view of your environment. For the teams that own the systems, that means fewer handoffs, less manual triage, and a security posture that improves as your environment evolves. For teams building on AI, Datadog provides the visibility and guardrails needed to adopt agentic workflows safely. The same unified data layer that powers security detection also gives agents production-grade context, enabling observable and secure agentic operations from development to the SOC.

See what the Datadog security platform can do for your team, and sign up for a free 14-day trial if you’re new to Datadog.
Jun 9, 2026
Date parsed from source:
Jun 9, 2026

First seen by Releasebot:
Jun 26, 2026
Datadog

Infinite Cardinality Metrics: Custom metrics built for modern systems

Datadog introduces Infinite Cardinality Metrics, generally available for capturing, exploring, and scaling highly dimensional custom metrics. It lets teams keep every useful dimension, align cost with data volume, and query rich telemetry with human and AI agents.


Every technology shift adds new context you need to measure. Cloud computing added regions and services. Kubernetes added containers and pods. Multi-tenant applications added users and tenants. AI systems add models, prompts, agents, and execution paths.

The result is that metrics are becoming dramatically more dimensional, faster than ever before. Over time, engineers are forced to make tradeoffs. They remove dimensions, sample data, or avoid instrumenting workflows altogether, not because the data isn’t valuable, but because the cost of capturing it becomes difficult to predict.

Today, we’re introducing Infinite Cardinality Metrics, a new way to capture, explore, and scale custom metrics built for modern workloads. It gives teams the freedom to capture every dimension that matters, aligns cost with data volume rather than cardinality, and enables agentic exploration of richly contextual data. Infinite Cardinality Metrics is built on three simple principles:

1. Freedom to capture every dimension

With Infinite Cardinality Metrics, teams can capture every attribute and dimension that matters without constantly evaluating the cost impact of each new tag. A metric such as request latency is counted once, regardless of whether it’s tagged by service, region, user, tenant, or device, giving teams the freedom to add the dimensions they actually need.

At Clay, an AI-powered go-to-market infrastructure platform, that freedom translated directly into how teams instrument their product.

In one of the new products we are building, the team decided to instrument it so we can slice fully by customer, execution path, and LLM call. This would have been far too cost-prohibitive previously. But under Infinite Cardinality Metrics, our infrastructure team was able to support this decision. As a result, the team now has clear, real-time aggregate monitoring in Datadog that previously would have required a data warehouse query or manual log-digging, enabling us to focus on building a great product for our customers.

— Willie Yao, Head of Engineering at Clay

Instead of deciding what context to remove, engineers can focus on capturing the data that helps them understand their systems. A metric is now priced by its metric name, not by the number of unique time series created by tag combinations.
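A small worked example shows why the pricing unit matters. The sketch counts three different quantities for the same hypothetical latency metric: distinct metric names, distinct time series (name plus tag set), and raw data points. It illustrates the quantities only and is not Datadog's billing formula. The service, region, and tenant values are invented.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Point = Tuple[str, Dict[str, str]]


def cardinality_summary(points: Iterable[Point]) -> Dict[str, int]:
    """Count metric names, unique time series (name + tag set), and raw data points."""
    names = set()
    series = set()
    count = 0
    for name, tags in points:
        names.add(name)
        series.add((name, frozenset(tags.items())))
        count += 1
    return {"metric_names": len(names), "time_series": len(series), "data_points": count}


services = ["api", "worker", "web"]
regions = ["us-east-1", "eu-west-1"]
tenants = [f"tenant-{i}" for i in range(50)]

# Each tag combination reports 10 latency samples in this hypothetical window.
points = [
    ("request.latency", {"service": s, "region": r, "tenant": t})
    for s, r, t in product(services, regions, tenants)
    for _ in range(10)
]

print(cardinality_summary(points))
```

Adding the `tenant` tag multiplies the time series count by 50 (from 6 to 300) while the metric name count stays at 1. The data point count tracks how much traffic is reported, which is the growth driver the section describes.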

2. Scale with data volume, not cardinality

Systems are becoming more dynamic and dimensions are multiplying, making comprehensive visibility increasingly important as organizations scale. Modern systems scale through traffic, requests, usage, and workload growth, not cardinality alone. Infinite Cardinality Metrics aligns cost with those same drivers, helping teams continue adding context without worrying about sudden cost increases from cardinality.

For teams like Figma, a collaborative design and product development platform, this creates a much more intuitive relationship between system growth and observability costs.

As a team that owns metrics at Figma, we no longer have to reason about cardinality when thinking about cost. Instead, cost scales with the same drivers as our systems—like requests and traffic—which is an intuition every engineer already understands.

— Yannis Spiliopoulos, Tech Lead of Observability at Figma

The result is a different approach to observability. Instead of asking, “Can we afford to measure this?” teams can focus on capturing the data that helps them understand and operate their systems.

3. Built for agentic querying and exploration

Capturing more dimensions is only valuable if you can actually use them. Infinite Cardinality Metrics is built for agentic querying and exploration, enabling engineers, and increasingly AI agents, to ask questions across highly dimensional datasets without first deciding which context to discard.

For Modal, an AI infrastructure provider that serves inference, training, and sandbox workloads across tens of thousands of compute nodes, this means they can instrument metrics with worker identifiers and user context that would previously have been difficult to justify. The result is richer visibility and faster debugging at the level of detail modern workloads require.

When teams preserve more context in their metrics, they create a stronger foundation not only for human investigation, but also for AI-assisted analysis and exploration.

Metrics built for modern workloads

Infinite Cardinality Metrics gives teams the freedom to capture every dimension that matters, the ability to explore richly contextual telemetry with both humans and AI agents, and a pricing model that aligns with how modern systems actually scale.

By removing cardinality as a constraint, teams can instrument more freely, preserve valuable context, and gain deeper visibility into increasingly complex environments.

Infinite Cardinality Metrics is now generally available. To learn more, visit our documentation. If you’re new to Datadog, sign up for a 14-day free trial.

Datadog Release Notes

DASH 2026: Guide to Datadog’s newest announcements

Close the ops loop from detection to remediation

Autonomously monitor for impactful degradations with Bits Detection

Retain your team’s operational knowledge with Bits Memories

Automatically resolve issues with Bits Remediation

Detect and remediate issues before they escalate with Bits Infrastructure Operations

Ensure reliability

Move from passive observability to proactive network device health and remediation

Trace config changes causing complex network issues with Network Configuration Management

Trace network issues from application to device with L7 to L1 visibility

Diagnose internet underlay issues with BGP Centric View

Automatically optimize database queries with Datadog Database Monitoring

Query logs across storage destinations with Federated Logs

Store and search logs at petabyte scale in your own infrastructure with Datadog BYOC Logs

Ensure intent

Monitor critical user journeys with Datadog Journey Monitoring

Close the dev loop from finding to fix

Turn Datadog findings into automated code fixes with Bits Code

Ship code safely at AI speed with Bits Release

Automate synthetic test coverage with Bits Testing

The agentic stack data foundation

Get quality answers to business questions with Bits Data Analysis

Use custom metrics for the modern age with Infinite Cardinality Metrics

Build and monitor the agentic stack

Monitor agent adoption with Datadog Agent Console

Understand production LLM behavior with Patterns in Agent Observability

Improve AI agent quality with Bits Evals

Secure the agentic stack

Protect agentic AI applications with Datadog AI Guard

Cut vulnerability noise by over 95% with the Datadog Runtime Prioritization Engine

DASH 2026 Harnessing AI: Guide to Datadog’s newest announcements

AI is reimagining how engineering teams write code, investigate issues, and operate their systems.

Search, analyze, and take action across Datadog faster with Bits Chat

Talk to Bits AI by voice in the Datadog mobile app

Build dashboards via natural-language prompts with Bits Chat

Create and update investigation notebooks via prompt with Bits Chat

Use natural language to write sophisticated queries with Bits Chat in DDSQL Editor

Analyze slow or failed traces with Bits Chat

Investigate service latency with Bits Chat

Investigate cost spikes and budget overages in minutes with the Cloud Cost skill in Bits Chat

Investigate and resolve issues with Bits AI

Diagnose frontend issues faster with RUM Agentic Investigations

Get actionable performance insights via profiling and Bits AI

Schedule recurring prompts and fixes with Bits Code Automations

Triage synthetic test failures faster with Bits Investigation

Visualize alerts and start Bits AI investigations on a live infrastructure diagram

Bring Bits Investigation into your incident response workflow

Investigate governance findings in minutes with Bits Investigations in Governance Console

Find AI-generated meeting summaries in the unified incident timeline

Bring Datadog context into your AI workflows

Bring live Datadog telemetry into your AI agents with native integrations

Give your AI agents live Datadog access from the command line

Bring Datadog telemetry into your AI workflows with MCP Apps

Measure the impact of AI coding tools on your software delivery

Unify multi-cluster Kubernetes visibility with Datadog MCP tools

Expand APM context for AI agents with APM MCP toolset

Flexibly query your Datadog telemetry data with the DDSQL API and MCP tools

Build agentic workflows for alert response and remediation with Bits Agent Builder

Instrument your app for Datadog without leaving your development environment with Agentic Onboarding

Give AI agents and developer tools secure, auditable access to infrastructure hosts with Datadog Agent MCP

Reduce costs and improve performance with AI

Centralize your Kubernetes autoscaling deployment and management

Surface a broader range of service optimizations with AI Recommendations

Eliminate cloud storage waste faster with Datadog Storage Management and Bits Chat

Optimize Spark and Databricks jobs with AI and Datadog Jobs Monitoring

Automatically parse and normalize all your logs

Generate AI-based Grok parsing rules with one click

Build agent-assisted internal apps with Datadog Apps

Gain visibility into AI usage, performance, and spend with Datadog AI integrations

DASH 2026 End-to-End Observability: Guide to Datadog’s newest announcements

Comprehensive observability starts with quick instrumentation and full visibility into every layer of your stack

Datadog is the OpenTelemetry-native observability platform

Get OpenTelemetry-native in-app experiences powered by semantic conventions for infrastructure and APM

OpenTelemetry-native Infrastructure Monitoring

Root-cause Kubernetes issues efficiently with OpenTelemetry data

Application Performance Monitoring now natively supports OpenTelemetry RED metrics and semantics

Manage DDOT pipeline configurations at scale with Fleet Automation

Accelerate OTel gateway resolutions with Topology View in Fleet Automation

Get deeper service visibility with less setup