MLflow Release Notes
Last updated: Feb 21, 2026
- Feb 20, 2026
- Date parsed from source: Feb 20, 2026
- First seen by Releasebot: Feb 21, 2026
v3.10.0
MLflow 3.10.0 delivers major updates including multi-workspace tracking, multi-turn evaluation with new session UIs, and automatic LLM cost tracking. The release also refreshes navigation, adds a demo experiment, and enables in-UI trace scoring for quicker insights.
Major New Features:
- 🏢 Organization Support in MLflow Tracking Server: MLflow now supports multi-workspace environments. Users can group experiments, models, and prompts into coarser-grained workspaces and logically isolate them within a single tracking server. (#20702, #20657, @mprahl, @Gkrumbach07, @B-Step62)
- 💬 Multi-turn Evaluation & Conversation Simulation: MLflow now supports multi-turn evaluation, including evaluating existing conversations with session-level scorers and simulating conversations to test new versions of your agent, without the toil of regenerating conversations. Use the session-level scorers introduced in MLflow 3.8.0 and the brand new session UIs to evaluate the quality of your conversational agents and enable automatic scoring to monitor quality as traces are ingested. (#20243, #20377, #20289, @smoorjani)
- 💰 Trace Cost Tracking: Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. (#20327, #20330, @serena-ruan)
- 🎯 Navigation bar redesign: We've redesigned the navigation to provide a frictionless experience. A new workflow type selector in the top-level navbar lets you quickly switch between GenAI and Classical ML contexts, with streamlined sidebars that reduce visual clutter. (#20158, #20160, #20161, #20699, @ispoljari, @daniellok-db)
- 🎮 MLflow Demo Experiment: New to MLflow GenAI? With one click, launch a pre-populated demo and explore tracing, evaluation, and prompt management in action. No configuration, no code required. (#19994, #19995, #20046, #20047, #20048, #20162, @BenWilson2)
- 📊 Gateway Usage Tracking: Monitor your AI Gateway endpoints with detailed usage analytics. A new usage page shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end observability. (#20357, #20358, #20642, @TomeHirata)
- ⚡ In-UI Trace Evaluation: Users can now run custom or pre-built LLM judges directly from the traces and sessions UI. This enables quick evaluation of individual traces and sessions without context switching to the Python SDK. (#20360, @hubertzub-db, @danielseong1)
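To make the cost-tracking idea above concrete: per-span cost is derived from the token counts recorded on an LLM span multiplied by per-model rates. The sketch below is a minimal, hypothetical illustration; the rate table and `span_cost` helper are invented for this example and are not MLflow's actual implementation or pricing data.

```python
# Hypothetical sketch of per-span LLM cost calculation.
# The rate table and helper are invented for illustration; MLflow extracts
# the model name from the span and computes cost automatically.

# Example USD-per-token rates (made up, not real pricing).
RATES = {
    "gpt-5-mini": {"input": 0.25e-6, "output": 2.00e-6},
}


def span_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM span given its recorded token usage."""
    rate = RATES[model]
    return input_tokens * rate["input"] + output_tokens * rate["output"]


# A span that consumed 1,000 input and 500 output tokens:
cost = span_cost("gpt-5-mini", 1_000, 500)
```

In the real feature, these values are computed as traces are ingested and rendered directly in the trace views, so no manual bookkeeping like this is needed.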
Features:
- [UI] Add sliding animation to workflow switch component (#20831, @daniellok-db)
- [Tracing] Display cached tokens in trace UI (#20957, @TomeHirata)
- [Evaluation] Move select traces button to be next to Run judge (#20992, @PattaraS)
- [Gateway] Distributed tracing for gateway endpoints (#20864, @TomeHirata)
- [Gateway] Add user selector in the gateway usage page (#20944, @TomeHirata)
- [Docs] [MLflow Demo] Docs for GenAI Demo (#20240, @BenWilson2)
- [UI] Move Getting Started above experiments list and make collapsible (#20691, @B-Step62)
- [Model Registry / Tracking] Add mlflow migrate-filestore command (#20615, @harupy)
- [UI] Add visual indicator for demo experiment in experiment list (#20787, @B-Step62)
- [Scoring] Enable parquet content_type in the scoring server input for pyfunc (#20630, @TFK1410)
- [UI] feat(ui): Add workspace landing page, multi-workspace support, and qu… (#20702, @Gkrumbach07)
- [Tracking] Merge workspace feature branch into master (#20657, @B-Step62)
- [Gateway] Add Gateway Usage Page (#20642, @TomeHirata)
- [Gateway] Add usage section in endpoint page (#20357, @TomeHirata)
- [UI] [ MLflow Demo ] UI updates for MLflow Demo interfaces (#20162, @BenWilson2)
- [Build] Support comma-separated rules in # clint: disable= comments (#20651, @copilot-swe-agent)
- [Build / Docs / Models / Projects / Scoring] Replace virtualenv with python -m venv in virtualenv env_manager path (#20640, @copilot-swe-agent)
- [Tracing] Add per-decorator sampling_ratio_override parameter to @mlflow.trace (#19784, @harupy)
- [Evaluation / Tracking] Add mlflow datasets list CLI command (#20167, @alkispoly-db)
- [Gateway] Add trace ingestion for Gateway endpoints (#20358, @TomeHirata)
- [Tracing] feat(typescript-anthropic): add streaming support (#20384, @rollyjoel)
- [Evaluation] Add delete dataset records API (#19690, @joelrobin18)
- [UI] Add tooltip link to navigate to traces tab with time range filter (#20466, @serena-ruan)
- [Tracking] [MLflow Demo] Add mlflow demo cli command (#20048, @BenWilson2)
- [Evaluation] Add an SDK for distillation from conversation to goal/persona (#20289, @smoorjani)
- [Tracing] Livekit Agents Integration in MLflow (#20439, @joelrobin18)
- [Tracing / UI] Enable running scorers/judges from trace details drawer in UI (#20518, @danielseong1)
- [Gateway] link gateway and experiment (#20356, @TomeHirata)
- [Prompts] Add optimization backend APIs to auth control (#20392, @chenmoneygithub)
- [Tracing] Add an SDK for search sessions to get complete sessions (#20288, @smoorjani)
- [Tracing] Reasoning in Chat UI Mistral + Chat UI (#19636, @joelrobin18)
- [Evaluation] Add TruLens third-party scorer integration (#19492, @debu-sinha)
- [Evaluation / Tracing] Add Guardrails AI scorer integration (#20038, @debu-sinha)
- [Tracking] [MLflow Demo] Add Prompt demo data (#20047, @BenWilson2)
- [Tracking] [MLflow Demo] Add Eval simulation data (#20046, @BenWilson2)
- [Tracking] [MLflow Demo] Add trace data for demo (#19995, @BenWilson2)
- [Tracking] Support get_dataset(name=...) in OSS environments (#20423, @alkispoly-db)
- [UI] Add session comparison UI with goal/persona matching (#20377, @smoorjani)
- [UI] Model and cost rendering for spans (#20330, @serena-ruan)
- [UI] [1/x] Support span model extraction and cost calculation (#20327, @serena-ruan)
- [Evaluation] Make conversation simulator public and easily subclassable (#20243, @smoorjani)
- [Prompts] Add progress tracking for prompt optimization job (#20374, @chenmoneygithub)
- [Prompts] Prompt Optimization backend PR 3: Add Get, Search, and Delete prompt optimization job APIs (#20197, @chenmoneygithub)
- [Prompts] Track intermediate candidates and evaluation scores in gepa optimizer (#20198, @chenmoneygithub)
- [Tracking] [MLflow Demo] Base implementation for demo framework (#19994, @BenWilson2)
- [Prompts] Prompt Optimization backend PR 2: Add CreatePromptOptimizationJob and CancelPromptOptimizationJob (#20115, @chenmoneygithub)
- [Tracing] Support shift+select for Traces (#20125, @B-Step62)
- [UI] Ml61127/remove experiment type selector inside experiment page (#20161, @ispoljari)
- [UI] Ml61126/remove nested sidebars within gateway and experiments tab (#20160, @ispoljari)
- [UI] [ML-61124]: add selector for workflow type in top level navbar (#20158, @ispoljari)
- [Prompts / UI] Feat/render md in prompt registry (#19615, @iyashk)
- [Prompts] [Prompt Optimization Backend PR #1] Wrap prompt optimize in mlflow job (#20001, @chenmoneygithub)
- [Tracking] Add --experiment-name option to mlflow experiments get command (#19929, @alkispoly-db)
Bug fixes:
- [Tracing / UI] Fix infinite fetch loop in trace detail view when num_spans metadata mismatches (#20596, @coldzero94)
- [UI] fix:implement dark mode in experiment correctly (#20974, @intelliking)
- [Evaluation] Fix 'Select traces' do not show new traces in Judge UI (#20991, @PattaraS)
- [Tracing / Tracking] Fix RecursionError in strands, semantic_kernel, and haystack autologgers with shared tracer provider (#20809, @cgrierson-smartsheet)
- [Tracking] fix(tracking): Fix IntegrityError in log_batch when duplicate metrics span multiple key batches (#20807, @aws-khatria)
- [Tracing] Support native tool calls in CrewAI 1.9.0+ autolog tests (#20742, @TomeHirata)
- [Evaluation] Fix retrieval_relevance assessments logged to wrong span with missing chunk index (#20998, @smoorjani)
- [Evaluation] Fix missing session metadata on failed session-level scorer assessments (#20988, @smoorjani)
- [Tracking] Enhance path validation in check_tarfile_security for windows (#20924, @TomeHirata)
- [Docs] Fix admonition link underlines not rendering (#20990, @copilot-swe-agent)
- [Tracking] Rebuild SearchTraces V2 request body on ENDPOINT_NOT_FOUND fallback (#20963, @brendanmaguire)
- [Build] Add model version search filtering based on user permissions (#20964, @TomeHirata)
- [Tracing] Display notebook trace viewer when workspace is on (#20947, @TomeHirata)
- [Tracing] Add MLFLOW_GATEWAY_RESOLVE_API_KEY_FROM_FILE flag to prevent local file inclusion in API gateway (#20965, @TomeHirata)
- [Tracking] Fix Claude Agent SDK tracing by capturing messages from receive_messages (#20778, @smoorjani)
- [Build / Tracking] Add missing authentication for fastapi routes (#20920, @TomeHirata)
- [Evaluation] Fix guardrails scorer compatibility with guardrails-ai 0.9.0 (#20934, @smoorjani)
- [UI] Fix duplicated title and add icons to Experiments/Prompts page headers (#20813, @B-Step62)
- [Tracing] Trace UI papercut: highlight searched text and change search box hint's wording. (#20841, @PattaraS)
- [Prompts] Fix arbitrary file read via prompt tag validation bypass in Model Registry (#20833, @TomeHirata)
- [Tracking] Fix RestException crash on null error_code and incorrect except clause (#20903, @copilot-swe-agent)
- [UI] Fix Disable action button in Traces Tab (#20883, @joelrobin18)
- [UI] Fix experiment rename modal not refreshing experiment details (#20882, @joelrobin18)
- [Build] Skip workspace header when workspace is disabled (#20904, @TomeHirata)
- [UI] Block CORS for ajax paths (#20832, @TomeHirata)
- [UI] Improve empty states across Experiments, Models, Prompts, and Gateway pages (#20044, @ridgupta26)
- [UI] UI: Improve empty states for Traces and Sessions tabs (#20034, @ridgupta26)
- [Build] Validate webhook url to fix SSRF vulnerability (#20747, @TomeHirata)
- [Scoring / Tracing] Fix TypeError in online scoring config endpoint when basic-auth is enabled (#20783, @copilot-swe-agent)
- [Tracing] Fix experiment_id type error in gateway config resolver (#20764, @copilot-swe-agent)
- [UI] Fix docs link to respect workflow type (GenAI vs ML) (#20752, @copilot-swe-agent)
- [Tracking] Fix: Do not emit pickle warning when user calls mlflow.pyfunc.log_model with loader_module param (#20727, @WeichenXu123)
- [Tracing] Change cache config to prevent search bounce (#20688, @PattaraS)
- [Evaluation] Fix multiple align() calls on MemoryAugmentedJudge (#20708, @smoorjani)
- [Evaluation] Batch embedding calls for Databricks endpoints to avoid size limit errors (#20685, @smoorjani)
- [Evaluation] Fix the UI for MemAlign-ed scorers (#20632, @smoorjani)
- [Tracing] Fix type hints lost with @mlflow.trace decorator (#20648, @veeceey)
- [Evaluation] Use JSONAdapter for best-effort structured outputs in MemAlign predictions (#20679, @smoorjani)
- [Tracking] Fix mlflow demo URL to use experiment ID instead of name (#20678, @copilot-swe-agent)
- [Tracking] Fix circular import in FileStore caused by PromptVersion import (#20677, @copilot-swe-agent)
- [Scoring / Tracing] Fix error handling for streaming request (#20610, @TomeHirata)
- [Models] Fix warning message: add space and documentation link for pickle security (#20656, @copilot-swe-agent)
- [Evaluation] Fix SHAP compatibility for shap >= 0.47 (#20623, @copilot-swe-agent)
- [Prompts] Fix the deadlock between run linking and trace linking (#20620, @TomeHirata)
- [Tracking] Fix FTP artifact path handling on Windows with Python 3.11+ (#20622, @copilot-swe-agent)
- [Evaluation] Fix failed judge call error propagation (#20601, @AveshCSingh)
- [Tracking] Fix off-by-one error in _validate_max_retries and _validate_backoff_factor (#20597, @vb-dbrks)
- [Prompts] Fix bug: linking prompt to experiments does not work for default experiments (#20588, @PattaraS)
- [Build] Fix Docker full image tags not being published for versioned releases (#20589, @copilot-swe-agent)
- [Prompts] Implement locking mechanism to prevent race conditions during prompt linking (#20586, @TomeHirata)
- [Prompts] Revert "Fix bug: linking prompt to experiments does not work for defa… (#20585, @PattaraS)
- [Prompts] Fix bug: linking prompt to experiments does not work for default experiments (#20562, @PattaraS)
- [Model Registry] Fix N+1 query issue in search_registered_models (#20493, @Karim-siala)
- [Tracking] Fix optimistic pagination in SQLAlchemy store _search_runs and handle max_results=None (#20547, @copilot-swe-agent)
- [UI] Add cancel button for LLM judge evaluations in trace details drawer (#20519, @danielseong1)
- [UI] Fix incorrect 'Trace level' label in session judges modal (#20520, @danielseong1)
- [Tracing] fix: allow overriding notebook trace iframe base URL (#20485, @TatsuyaHayashino)
- [Prompts] Include the prompt model config in the optimized prompt (#20431, @chenmoneygithub)
- [Tracing / UI] Fix Anthropic trace UI rendering for tool_result with image content (#20190, @joncarter1)
- [Tracking] Enforce authorization on AJAX proxy artifact APIs (#20035, @mprahl)
- [Tracking] Ensure server-provided artifact root is reused on MLflowClient calls (#19336, @mprahl)
- [UI] Fix trace selection not registering in SelectTracesModal (#20099, @joelrobin18)
Documentation updates:
- [Docs] Add documentation for mlflow migrate-filestore command (#20616, @harupy)
- [Docs] Document X-MLFLOW-WORKSPACE header for AI Gateway endpoints with workspace fallback behavior (#20984, @copilot-swe-agent)
- [Docs] Fix outdated server-features references to server-info (#20948, @copilot-swe-agent)
- [Docs / Tracing] Remove span attributes filtering from search traces documentation (#20858, @copilot-swe-agent)
- [Docs] Add Modal as a supported deployment target with full documentation (#20032, @debu-sinha)
- [Docs] Add gateway usage tracking doc page (#20748, @TomeHirata)
- [Docs / Evaluation] Fix MemAlign bug bash issues (#20712, @veronicalyu320)
- [Docs] Fix docs: trace spans are stored in database, not artifact storage (#20668, @B-Step62)
- [Prompts] Change header level for "Automatic Prompt Linking" section in use-prompts-in-apps.mdx (#20661, @PattaraS)
- [Docs] Multi-turn evaluation launch documentation (#20443, @smoorjani)
- [Prompts] Update use-prompts-in-apps.mdx with a section for prompt linking under traced method (#20593, @PattaraS)
- [Docs] docs: Add missing targets arg in huggingface dataset docs (#20637, @KarelZe)
- [Build] Display rule names instead of IDs in clint error output (#20592, @copilot-swe-agent)
- [Docs] Detailed guide for setting up SSO with mlflow-oidc-auth plugin (#20556, @WeichenXu123)
- [Prompts] Mark prompt registry APIs as stable. (#20507, @PattaraS)
- [Docs] code-based scorer examples (#20407, @SomtochiUmeh)
- [Docs] Custom judges section (#20393, @SomtochiUmeh)
- [Docs] (mostly) copy over eval datasets article from managed docs (#19787, @achen530)
- [Docs] Add the RAG built-in judges section (#20369, @SomtochiUmeh)
- [Docs] Fix ToolAgent name formatting in ag2 documentation and examples (#20470, @Umakanth555)
- [Docs] Add collection_name parameter to CrewAI knowledge configuration in docs and example (#20469, @Umakanth555)
- [Docs] Update index and predefined judges pages (#20368, @SomtochiUmeh)
- [Docs] docs: Clarify -full Docker image availability from v3.9.0 onwards (#20223, @copilot-swe-agent)
- [Docs] Generalize Knowledge Cutoff Note in CLAUDE.md beyond model names (#20165, @copilot-swe-agent)
Small bug fixes and documentation updates:
#20959, #20915, #20986, #20956, #20912, #20955, #20943, #20919, #20776, #20826, #20781, #20767, #20761, #20760, #20763, #20762, #20687, #20746, #20682, #20667, #20658, #20578, #20559, #20495, #20497, @TomeHirata; #21006, #20980, #20707, #20777, @bbqiu; #20950, #21008, #20877, #20822, #20817, #20813, #20816, #20796, #20815, #20765, #20716, #20689, #20744, #20690, #20451, #20502, #20252, #20314, #20210, @B-Step62; #21000, #20975, #20806, #20449, #20686, #20603, #20573, #20572, #20584, #20551, #20526, #20550, #20523, #20525, #20453, #20478, #20452, #20438, #20474, #20460, #20457, #20459, #20456, #20444, #20418, #20285, #20284, #20283, #20282, #20281, #20280, #20051, @smoorjani; #21005, #21007, #20880, #20857, #20802, #20779, #20717, #20713, #20714, #20692, #20693, #20683, #20675, #20665, #20674, #20673, #20663, #20662, #20659, #20652, #20649, #20650, #20647, #20646, #20641, #20638, #20635, #20634, #20633, #20626, #20625, #20621, #20619, #20618, #20617, #20606, #20564, #20581, #20570, #20568, #20566, #20558, #20560, #20543, #20554, #20537, #20536, #20532, #20530, #20528, #20512, #20505, #20501, #20498, #20496, #20491, #20490, #20489, #20487, #20486, #20484, #20483, #20482, #20441, #20436, #20427, #20417, #20400, #20399, #20397, #20395, #20396, #20391, #20342, #20341, #20332, #20326, #20316, #20315, #20305, #20300, #20299, #20297, #20293, #20268, #20262, #20260, #20251, #20250, #20244, #20235, #20228, #20227, #20226, #20220, #20202, #20186, #20172, #20152, #20150, #19984, #20102, #20098, #20095, #20093, #20094, #20091, #20090, #20089, #20088, #20087, #20086, #20085, #20084, #20083, #20082, #20081, #20080, #20077, #20076, #20075, #20070, #20067, #20069, #20020, #20026, @copilot-swe-agent; #20793, #20791, #20768, @WeichenXu123; #20979, #20701, #20609, #20608, #20569, #20535, #20481, #20318, #20224, #20149, #20119, #20068, #20014, #20016, #20019, @harupy; #20973, @Gkrumbach07; #21003, #20936, #20730, #20041, #20381, @xsh310; #20989, #20830, #20766, #20759, #20758, #20757, 
#20756, #20699, #20697, #20696, #20695, #20694, #20255, #20254, #20253, #20248, #20247, #20010, #20009, #19999, #19998, #19976, #19975, #19974, #19973, #19971, @daniellok-db; #20976, @aravind-segu; #20725, #20339, #20565, #20660, #20455, #20440, #20404, #20403, #20402, #20567, #20542, #20541, #20540, #20557, #20503, #20506, #20500, #20499, #20467, #20338, #20337, #20331, #20462, #20329, #20328, #20323, @serena-ruan; #20737, @jamesbxwu; #20862, #20861, @PattaraS; #20805, #20705, #20373, @mprahl; #20773, @etirelli; #20753, @etscript; #20629, #19758, @justinwei-db; #20711, @kevin-lyn; #20576, @nisha2003; #20553, #20521, @danielseong1; #20548, @bartosz-grabowski; #20504, @smivv; #20527, @BenWilson2; #20363, #20364, @rollyjoel; #20494, @dbczumar; #20360, #20340, #20313, #20312, #20276, #20275, #20261, #20233, #19484, @hubertzub-db; #20359, @LiberiFatali; #20386, @chenmoneygithub; #20159, @ispoljari
- Feb 12, 2026
- Date parsed from source: Feb 12, 2026
- First seen by Releasebot: Feb 18, 2026
v3.10.0rc0
MLflow 3.10.0rc0 unveils organization-aware tracking, enhanced multi-turn conversation simulation, cost tracking for LLMs, a GenAI vs. Classical ML split, a ready-to-run demo experiment, and gateway usage analytics in a polished release candidate.
Major New Features
- 🏢 Organization Support in MLflow Tracking Server: MLflow now supports multi-workspace environments! You can organize your experiments and resources across different workspaces with a new landing page that lets you navigate between them seamlessly. (#20702, #20657, @mprahl, @Gkrumbach07, @B-Step62)
- 💬 Multi-turn Conversation Simulation: Building on the conversation simulator introduced in 3.9, we've made it fully public and easily subclassable. You can now create custom simulation scenarios, compare sessions with goal/persona matching, and distill conversations into reusable goal/persona pairs for comprehensive agent testing. (#20243, #20377, #20289, @smoorjani)
- 💰 Trace Cost Tracking: Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. (#20327, #20330, @serena-ruan)
- 🎯 Top-level GenAI/Classical ML Split: We've redesigned the navigation to provide a frictionless experience. A new workflow type selector in the top-level navbar lets you quickly switch between GenAI and Classical ML contexts, with streamlined sidebars that reduce visual clutter. (#20158, #20160, #20161, #20699, @ispoljari, @daniellok-db)
- 🎮 MLflow Demo Experiment: Get started with MLflow faster than ever! The new mlflow demo CLI command generates a fully-populated demo environment with sample traces, prompts, and evaluation data so you can explore MLflow's features hands-on without any setup. (#19994, #19995, #20046, #20047, #20048, #20162, @BenWilson2)
- 📊 Gateway Usage Tracking: Monitor your AI Gateway endpoints with detailed usage analytics. A new usage page shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end observability. (#20357, #20358, #20642, @TomeHirata)
Stay tuned for the full release, which will be packed with even more features and bugfixes.
To try out this release candidate, please run:
```
pip install mlflow==3.10.0rc0
```
- Jan 30, 2026
- Date parsed from source: Jan 30, 2026
- First seen by Releasebot: Feb 18, 2026
MLflow 3.9.0 Highlights: AI Assistant, Dashboards, and Judge Optimization
MLflow 3.9.0 unveils AI Observability and Evaluation with an AI-powered assistant, agent performance dashboards, a MemAlign judge optimizer, judge builder UI, continuous online monitoring, and distributed tracing. This major release upgrades AI agent building, monitoring, and tooling.
MLflow 3.9.0 is a major release focused on AI Observability and Evaluation capabilities, bringing powerful new features for building, monitoring, and optimizing AI agents. This release introduces an AI-powered assistant, comprehensive dashboards for agent performance, a new judge optimization algorithm, judge builder UI, continuous monitoring with LLM judges, and distributed tracing.
1. MLflow Assistant Powered by Claude Code
MLflow Assistant transforms coding agents like Claude Code into experienced AI engineers by your side. Unlike typical chatbots, the assistant is aware of your codebase and context: it's not just a Q&A tool, but a full-fledged AI engineer that can find root causes for issues, set up quality tests, and apply LLMOps best practices to your project.
Key capabilities include:
- No additional costs: Use your existing Claude Code subscription. MLflow provides the knowledge and integration at no cost.
- Context-rich assistance: Understands your local codebase and project structure, and provides tailored recommendations, not generic advice.
- Complete dev-loop: Goes beyond Q&A to fetch MLflow data, read your code, and add tracing, evaluation, and versioning to your project.
- Fully customizable: Add custom skills, sub-agents, and permissions. Everything runs on your machine with full transparency.
Open the MLflow UI, navigate to the Assistant panel in any experiment page, and follow the setup wizard to get started.
2. Dashboards for Agent Performance Metrics
A new "Overview" tab in GenAI experiments provides pre-built charts and visualizations for monitoring agent performance at a glance. Monitor key metrics like latency, request counts, and quality scores without manual configuration. Identify performance trends and anomalies across your agent deployments, and get tool call summaries to understand how your agents are utilizing available tools.
Navigate to any GenAI experiment and click the "Overview" tab to access the dashboard. Charts are automatically populated based on your trace data. Have a specific visualization need? Request additional charts via GitHub Issues.
3. MemAlign: A New Judge Optimizer Algorithm
MemAlign is a new optimization algorithm that learns evaluation guidelines from past feedback and dynamically retrieves relevant examples at runtime. Improve judge accuracy by learning from human feedback patterns, reduce prompt engineering effort with automatic guideline extraction, and adapt judge behavior dynamically based on the input being evaluated.
Use the MemAlignOptimizer to optimize your judges with historical feedback:
```python
import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(reflection_lm="openai:/gpt-5-mini")

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)
```
4. Configuring and Building a Judge with Judge Builder UI
A new visual interface lets you create and test custom LLM judge prompts without writing code. Iterate quickly on judge criteria and scoring rubrics with immediate feedback, test judges on sample traces before deploying to production, and export validated judges to the Python SDK for programmatic integration.
Navigate to the "Judges" section in the MLflow UI and click "Create Judge." Define your evaluation criteria, scoring rubric, and test your judge against sample traces. Once satisfied, export the configuration to use with the MLflow SDK.
5. Continuous Online Monitoring with MLflow LLM Judges
Automatically run LLM judges on incoming traces without writing any code, enabling continuous quality monitoring of your agents in production. Detect quality issues in real-time as traces flow through your system, leverage pre-defined judges for common evaluations like safety, relevance, groundedness, and correctness, and get actionable assessments attached directly to your traces.
Go to the "Judges" tab in your experiment, select from pre-defined judges or use your custom judges, and configure which traces to evaluate. Assessments are automatically attached to matching traces as they arrive.
6. Distributed Tracing for Tracking End-to-end Requests
Track requests across multiple services with context propagation, enabling end-to-end visibility into distributed AI systems. Maintain trace continuity across microservices and external API calls, debug issues that span multiple services with a unified trace view, and understand latency and errors at each step of your distributed pipeline.
Use the get_tracing_context_headers_for_http_request and set_tracing_context_from_http_request_headers functions to inject and extract trace context.
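Conceptually, the propagation works like W3C trace-context headers: the caller serializes the active trace context into HTTP headers on an outgoing request, and the receiving service parses those headers so its spans join the same trace. The sketch below is a simplified stand-in for that round trip, not MLflow's implementation; only the two function names mentioned above come from the release notes, and everything else is invented for illustration.

```python
# Simplified illustration of trace-context propagation over HTTP headers.
# MLflow's actual APIs are get_tracing_context_headers_for_http_request and
# set_tracing_context_from_http_request_headers; the helpers below are an
# invented stand-in showing the inject/extract round trip.


def headers_for_request(trace_id: str, span_id: str) -> dict:
    # Client side: serialize the active context into a W3C-style header.
    return {"traceparent": f"00-{trace_id}-{span_id}-01"}


def context_from_headers(headers: dict) -> tuple:
    # Server side: parse the header so child spans join the same trace.
    _version, trace_id, parent_span_id, _flags = headers["traceparent"].split("-")
    return trace_id, parent_span_id


# The client attaches headers to its outgoing request...
headers = headers_for_request("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331")
# ...and the server recovers the same trace identity on arrival.
trace_id, parent = context_from_headers(headers)
```

The point of the pattern is that the server-side spans report the propagated trace ID rather than starting a fresh trace, which is what yields the unified end-to-end view described above.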
Full Changelog
For a comprehensive list of changes, see the release change log.
Get Started
Install MLflow 3.9.0 to try these new features:
```
pip install mlflow==3.9.0
```
Share Your Feedback
We'd love to hear about your experience with these new features:
- GitHub Issues - Report bugs or request features
- MLflow Roadmap - See what's coming next and share your ideas
- ⭐ Star us on GitHub - Show your support for the project
Learn More
- Join our upcoming webinar to see these features in action
- Check out the MLflow documentation for detailed guides
- Jan 29, 2026
- Date parsed from source: Jan 29, 2026
- First seen by Releasebot: Feb 18, 2026
v3.9.0
MLflow 3.9.0 brings bold AI tooling with an in‑UI MLflow Assistant, a Trace Overview dashboard, and an integrated AI Gateway. It adds online monitoring via LLM judges, a Judge Builder UI, distributed tracing, and MemAlign for smarter evaluations, signaling a major upgrade for users.
We're excited to announce MLflow 3.9.0, which includes several notable updates:
Major New Features:
- 🔮 MLflow Assistant: Figuring out the next steps to debug your apps and agents can be challenging. We're excited to introduce the MLflow Assistant, an in-product chatbot that can help you identify, diagnose, and fix issues. The assistant is backed by Claude Code, and directly passes context from the MLflow UI to Claude. Click on the floating "Assistant" button in the bottom right of the MLflow UI to get started!
- 📈 Trace Overview Dashboard: You can now get insights into your agent's performance at a glance with the new "Overview" tab in GenAI experiments. Many pre-built statistics are available out of the box, including performance metrics (e.g. latency, request count), quality metrics (based on assessments), and tool call summaries. If there are any additional charts you'd like to see, please feel free to raise an issue in the MLflow repository!
- ✨ AI Gateway: We're revamping our AI Gateway feature! AI Gateway provides a unified interface for your API requests, allowing you to route queries to your LLM provider(s) of choice. In MLflow 3.9.0, the Gateway server is now located directly in the tracking server, so you don't need to spin up a new process. Additional features such as passthrough endpoints, traffic splits, and fallback models are also available, with more to come soon! For more detailed information, please take a look at the docs.
- 🔎 Online Monitoring with LLM Judges: Configure LLM judges to automatically run on your traces, without having to write a line of code! You can either use one of our pre-defined judges, or provide your own prompt and instructions to create custom metrics. Head to the new "Judges" tab within the GenAI Experiment UI to get started.
- 🤖 Judge Builder UI: Define and iterate on custom LLM judge prompts directly from the UI! Within the new "Judges" tab, you can create your own prompt for an LLM judge, and test-run it on your traces to see what the output would be. Once you're happy with it, you can either use it for online monitoring (as mentioned above), or use it via the Python SDK for your evals.
- 🔗 Distributed Tracing: Trace context can now be propagated across different services and processes, allowing you to truly track request lifecycles from end to end. The related APIs are defined in the mlflow.tracing.distributed module (with more documentation to come soon).
- 📚 MemAlign - a new judge optimizer algorithm: We're excited to introduce MemAlignOptimizer, a new algorithm that makes your judges smarter over time. It learns general guidelines from past feedback while dynamically retrieving relevant examples at runtime, giving you more accurate evaluations.
Features:
[Gateway] Add LiteLLM provider to support many other providers (#19394, @TomeHirata)
[Gateway] Add passthrough support for Anthropic Messages API (#19423, @TomeHirata)
[Gateway] Add passthrough support for Gemini generateContent and streamGenerateContent APIs (#19425, @TomeHirata)
[Gateway] Add routing strategy and fallback configuration support for gateway endpoints (#19483, @TomeHirata)
[Gateway] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
[Gateway / UI] Create List API Keys landing page (#19441, @BenWilson2)
[Gateway / UI] Add Create API Keys functionality (#19442, @BenWilson2)
[Gateway / UI] Add delete and update capabilities for API Keys (#19446, @BenWilson2)
[Gateway / UI] Add endpoint listing page and tab layout (#19474, @BenWilson2)
[Gateway / UI] Add Create endpoint page and enhance provider select (#19475, @BenWilson2)
[Gateway / UI] Add Model select functionality for endpoint creation (#19477, @BenWilson2)
[Gateway / UI] Add Auth config to endpoint creation (#19494, @BenWilson2)
[Gateway / UI] Add the Endpoint Edit Page (#19502, @BenWilson2)
[Gateway / UI] Refactor the provider display for better UX (#19503, @BenWilson2)
[Gateway / UI] Create Endpoint details page (#19537, @BenWilson2)
[Gateway / UI] Add security notice banner (#19538, @BenWilson2)
[Gateway / UI] Create common editable combo box with extra modal select (#19546, @BenWilson2)
[Evaluation] Introduce MemAlign as a new optimizer for judge alignment (#19598, @smoorjani)
[Evaluation] Parallelize LLM calls in MemAlign guideline distillation (#20291, @veronicalyu320)
[Evaluation] Add GePaAlignmentOptimizer for judge instruction optimization (#19882, @alkispoly-db)
[Evaluation] Add Fluency scorer for evaluating text quality (#19414, @alkispoly-db)
[Evaluation] Add KnowledgeRetention built-in scorer (#19436, @alkispoly-db)
[Evaluation] Implement automatic discovery for builtin scorers (#19443, @alkispoly-db)
[Evaluation] Add Phoenix (Arize) third-party scorer integration (#19473, @debu-sinha)
[Evaluation] Add gateway provider support for scorers (#19470, @danielseong1)
[Evaluation] Introduce a conversation simulator into mlflow.genai (#19614, @smoorjani)
[Evaluation] Integrate conversation simulation into mlflow.genai.evaluate (#19760, @smoorjani)
[Evaluation] Make conversation simulator work with datasets (#19845, @SomtochiUmeh)
[Evaluation] Support for conversational datasets with persona, goal, and context (#19686, @SomtochiUmeh)
[Evaluation] Introduce conversational guidelines scorer (#19729, @smoorjani)
[Evaluation] Update tool call correctness judge to accept expected tool calls (#19613, @smoorjani)
[Evaluation] Support trace parsing fallback using Databricks model (#19654, @AveshCSingh)
[Evaluation] Documentation for online evaluation / scoring (#20103, @dbczumar)
[Evaluation] Job backend: Update job backend to use static names rather than function full names (#19430, @WeichenXu123)
[Evaluation] Job backend: support job cancellation (#19565, @WeichenXu123)
[Tracing] Support distributed tracing (#19920, @WeichenXu123)
[Tracing] Trace Metrics backend (#19271, @serena-ruan)
[Tracing] Add IS NULL / IS NOT NULL comparator support for trace metadata filtering (#19720, @dbczumar)
[Tracing] Auto-navigate to Events tab when clicking error spans (#20188, @anshuman-sahu)
[Tracing] Support shift+select for Traces (#20125, @B-Step62)
[Tracing] SpringAI Integration (#19949, @joelrobin18)
[Tracing] Reasoning in Chat UI for OpenAI, Anthropic, Gemini, Langchain, and PydanticAI (#19535, #19541, #19627, #19651, #19657, @joelrobin18)
[UI] Pass current page context to Assistant (#20139, @joelrobin18)
[UI] Add Assistant regenerate button (#20066, @joelrobin18)
[UI] Add copy button to Assistant (#20063, @joelrobin18)
[UI] Overview tab for GenAI experiments (#19521, @serena-ruan)
[UI] Enable Scorers UI feature flags (#19842, @danielseong1)
[UI] Improve LLM judge creation modal UX and variable ordering (#19963, @danielseong1)
[UI] Hide instructions section for built-in LLM judges (#19883, @danielseong1)
[UI] Change model provider and name to dropdown list (#19653, @chenmoneygithub)
[Prompts] Support Jinja2 template in prompt registry (#19772, @B-Step62)
[Prompts] Support metaprompting in mlflow.genai.optimize_prompts() (#19762, @chenmoneygithub)
[Prompts] Add option to delegate saving dspy model to dspy.module.save API (#19704, @WeichenXu123)
[Prompts / UI] Add traces mode to prompts details page and implement filtered traces (#19599, @TomeHirata)
[Tracking] Support mlflow.genai.to_predict_fn for app invocation endpoints (#19779, @jennsun)
[Tracking] Add log_stream API for logging binary streams as artifacts (#19104, @harupy)
[Tracking] Add import_checkpoints API for databricks SGC Checkpointing with MLflow (#19839, @WeichenXu123)
[Tracking] Support GC clean up for Historical Jobs (#19626, @joelrobin18)
[Tracking] Add JupyterNotebookRunContext for Tracking local Jupyter notebook as the source (#19162, @iyashk)
[Tracking] Full docker image support with db (#19979, @serena-ruan)
[Tracking] Add react route handling to communicate with the tracking server (#19010, @BenWilson2)
[Tracking] [TypeScript SDK] Simplify Databricks auth by delegating to Databricks SDK (#19434, @simonfaltum)
[Models] Safe model serialization: Support saving pytorch model via torch.export.save, add skops serialization format, and deprecate unsafe pickle/cloudpickle formats (#18759, #18832, #19692, #20151, @WeichenXu123)
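The Jinja2 prompt-template support above (#19772) means registered prompts can now use loops and conditionals rather than only flat {{variable}} slots. A minimal sketch of what such a template enables, rendered here directly with the jinja2 library (the corresponding mlflow.genai registration call is assumed from the PR title and not shown):

```python
from jinja2 import Template

# A prompt template with Jinja2 control flow -- a loop that plain
# {{variable}} substitution cannot express.
template_text = (
    "You are a helpful assistant. Use the following context:\n"
    "{% for doc in documents %}- {{ doc }}\n{% endfor %}"
    "Question: {{ question }}"
)

# Render the template with concrete values, as the prompt registry
# would at serving time.
rendered = Template(template_text).render(
    documents=["MLflow tracing guide", "Release notes"],
    question="What changed in 3.10.0?",
)
print(rendered)
```

This is a sketch of the template format only; consult the prompt registry docs for the actual registration and loading APIs.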
Bug fixes:
[Gateway] Fix Anthropic and Gemini streaming for LiteLLM providers (#20398, @TomeHirata)
[Build] Include git submodule contents in Python package build (#20394, @copilot-swe-agent)
[Tracing] Fix duplicate traces in semantic kernel autolog (#20206, @harupy)
[Tracing] Fix Claude autolog to prioritize settings.json over OS environment variables (#20376, @alkispoly-db)
[Evaluation] Fix temperature/json issues with ConversationSimulator on managed (#20236, @xsh310)
[Tracing / UI] Add support for OpenAI function calling inputs in chat UI parsing (#20058, @daniellok-db)
[Tracking] Update checking code for pickle deserialization (#20267, @WeichenXu123)
[Gateway] Fix Vertex AI model configuration (#20242, @TomeHirata)
[UI] Store gateway<>scorer binding correctly (#20176, @TomeHirata)
[Evaluation] Support SparkDF trace handling in eval (#20207, @BenWilson2)
[Evaluation] Fix tool name extraction for tool call correctness (#20201, @smoorjani)
[Prompts] Fix scorers issue in metaprompting (#20173, @chenmoneygithub)
[UI] Propagate Run id context to Assistant (#20138, @joelrobin18)
[Model Registry] Allow for model registration to use KMS auth from different workspace (#20156, @BenWilson2)
[UI] Improve scorer trace picker UX and validation (#20178, @danielseong1)
[Evaluation] Improve MemAlign optimizer for incremental judge alignment (#20049, @veronicalyu320)
[Evaluation] Fix bug with max tokens using max output tokens (#20174, @smoorjani)
[Evaluation] Fix a race condition bug when using DF inputs for genai eval (#20079, @BenWilson2)
[Tracking] Fix DATABRICKS_CONFIG_PROFILE env var detection when fetching databricks credentials (#20112, @daniellok-db)
[Gateway] Move gateway invocation validation to fastapi middleware (#20111, @TomeHirata)
[Prompts] Fix the length check in mlflow.genai.optimize_prompts() (#19993, @chenmoneygithub)
[UI] Fix trace selection not registering in SelectTracesModal (#20099, @joelrobin18)
[UI] Fix LimitOverrunError in Assistant streaming (#20078, @joelrobin18)
[Tracing] CC Token usage (#20022, @joelrobin18)
[Gateway] Remove MLflow-specific auth_mode from LiteLLMConfig (#20059, @TomeHirata)
[UI] Assistant UI fix for dark theme (#20056, @joelrobin18)
[Tracing] Isolate runtime context between opentelemetry and mlflow (#19797, @B-Step62)
[UI] Prevent spurious 404 requests for relative image URLs in markdown (#20003, @harupy)
[Tracing] Support MLflow tracing with OpenTelemetry auto-instrumentation (#19501, @serena-ruan)
[UI] Fix session selector table column resizing and link behavior (#19927, @danielseong1)
[Gateway] Add Azure provider support in gateway configuration (#19933, @TomeHirata)
[Gateway] Propagate extra auth config to LiteLLM provider (#19931, @TomeHirata)
[Evaluation / UI] Add missing retrieval context error for retrieval scorers (#19895, @danielseong1)
[Evaluation / UI] Improve trace selection UX in scorer/judge UI (#19913, @danielseong1)
[Model Registry / Models] Fix infer_code_paths to capture transitive imports of functions/classes (#19814, @copilot-swe-agent)
[Tracking] Fix REST API call latency in Databricks job runs (#19886, @WeichenXu123)
[UI] Enable {{trace}} variable support in sample judge evaluation (#19851, @danielseong1)
[Scoring] Check security before extracting tar file (#19557, @WeichenXu123)
[Gateway] Fix authorization header duplication (#19853, @TomeHirata)
[Gateway] Fix Gateway error handling to translate MlflowException to HTTPException (#19728, @danielseong1)
[Gateway] Remove gateway_deprecated decorator - AI Gateway is not deprecated (#19821, @copilot-swe-agent)
[Tracking] Make local artifact location creation lazy to support read-only proxy environments (#19678, @BenWilson2)
[Evaluation] Fix Databricks-hosted LLM failure caused by response_schema injection (#19741, @sinanshamsudheen)
[Evaluation] Add @overload annotations to @scorer decorator for proper type inference (#19570, @mr-brobot)
[Tracking] Add debug logging for 500 errors in catch_mlflow_exception (#19781, @harupy)
[Tracing] Support searching traces by string feedback / expectation values (#19719, @dbczumar)
[Tracing / UI] Fix scorer creation UX issues (#19756, @danielseong1)
[Evaluation] Fix KnowledgeRetention model parameter not propagating to inner scorer (#19753, @danielseong1)
[Tracking] Fix serve-artifacts not being enabled in docker-compose (#19700) (#19701, @zjffdu)
[Tracing] Fix type signature loss in @trace_disabled decorator (#19569, @mr-brobot)
[Tracking] Fix: Return 400 instead of 500 for invalid experiment_id (#19655, @copilot-swe-agent)
[Models] Fix schema enforcement for pandas StringDtype (#19518, @harupy)
[Tracing] Fix Python 3.12 DeprecationWarning from generator.throw() in tracing (#19629, @mr-brobot)
[Evaluation] Fix structured outputs for databricks serving endpoints (#19572, @smoorjani)
[Models / Scoring] Add dict to PyFuncOutput type alias for ResponsesAgent/ChatAgent/ChatModel (#19560, @copilot-swe-agent)
[Tracking] Fix enable_git_model_versioning to work from subdirectories (#19529, @copilot-swe-agent)
Documentation updates:
[Docs] fix: Remove multi_class argument from scikit-learn's LogisticRegression in docs (#20266, @SOORAJTS2001)
[Docs] Add doc for distributed tracing (#20027, @WeichenXu123)
[Docs] Add Judge Builder UI documentation (#20163, @danielseong1)
[Docs] Add framework integration examples for AI Gateway query-endpoint page (#20137, @TomeHirata)
[Docs] Add "Evaluation Examples" article (#19722, @achen530)
[Docs] [1/3] Add gateway tracing guide for LiteLLM, OpenRouter, and Vercel AI Gateway (#20031, @B-Step62)
[Docs] Update prompt optimization doc to include metaprompting (#19966, @chenmoneygithub)
[Docs] Reorganize gateway page structure (#19968, @TomeHirata)
[Build / Docs] Fix broken auth REST API documentation links (#19872, @copilot-swe-agent)
[Docs] Add setup and query documentation for new AI Gateway (#19804, @TomeHirata)
[Docs] Add additional eval dataset serialization examples (#19697, @BenWilson2)
[Docs] ML-60766: Add dataset schema from managed content to SDK reference page (#19676, @achen530)
[Docs / Prompts] Fix duplicate tags argument in register_prompt documentation example (#19591, @copilot-swe-agent)
[Docs] Fix eval quickstart links pointing to the wrong place and add a notebook version of the eval quickstart (ML-59546) (#19511, @achen530)
[Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
Small bug fixes and documentation updates:
#20406, #20122, #20317, #20333, #20361, #20274, #20362, #20249, #20169, #20345, #20252, #20314, #20214, #20215, #20210, #20212, #20142, #20183, #20121, #20141, #20140, #20124, #20073, #20062, #20065, #19893, #19912, #19464, #19857, #19401, #19600, #19555, #19400, #19392, #19393, @B-Step62; #20323, #20263, #19982, #20218, #20143, #20146, #20145, #20064, #20117, #20144, #20110, #20050, #20017, #20116, #20118, #19989, #19953, #19836, #19915, #19955, #19952, #19940, #19939, #19938, #19937, #19877, #19874, #19869, #19867, #19865, #19837, #19835, #19834, #19864, #19873, #19833, #19825, #19876, #19799, #19798, #19793, #19771, #19770, #19635, #19634, #19633, #19632, #19624, #19622, #19621, #19620, #19631, #19619, #19747, #19609, #19608, #19607, #19606, #19604, #19603, #19602, #19601, #19588, #19587, #19581, #19585, #19610, #19590, #19580, #19579, #19578, #19577, #19576, #19234, @serena-ruan; #20378, #20385, #20205, #20237, #20193, #20171, #20155, #20170, #20132, #20097, #20100, #20101, #19736, #19717, #19716, #19759, #19718, #19714, #19713, #19712, #19711, #19840, #19710, #19709, #19708, #19777, #19707, @dbczumar; #20387, #19981, #19964, @bbqiu; #20390, #20334, #20208, #19978, #19980, #19875, #19854, #19816, #19815, #19796, #19806, #19785, #19789, #19769, #19748, #19773, #19782, #19706, #19523, #19505, #19450, #19482, #19458, #19433, #19431, #19455, #19417, #19426, #19424, @harupy; #20355, #20245, #20120, #20229, #20114, #20053, #20012, #19972, #20002, #19991, #19990, #19977, #19986, #19985, #19967, #19957, #19960, #19954, #19945, #19941, #19934, #19917, #19916, #19905, #19904, #19903, #19900, #19899, #19897, #19894, #19892, #19890, #19888, #19887, #19861, #19828, #19818, #19803, #19802, #19791, #19788, #19795, #19790, #19786, #19783, #19767, #19768, #19746, #19735, #19733, #19732, #19726, #19561, #19549, #19544, #19543, #19510, #19486, #19487, #19463, #19871, @copilot-swe-agent; #20308, #20264, #20109, #20181, #20180, #20177, #20134, #20107, #20015, #20007, #20008, 
#19930, #20006, #20005, #19965, #19942, #19944, #19950, #19936, #19947, #19948, #19946, #19870, #19824, #19823, #19856, #19863, #19858, #19860, #19849, #19822, #19765, #19792, #19764, #19763, #19618, #19453, #19452, #19404, #19390, #19290, @TomeHirata; #20350, #20203, #19675, #19677, #19674, #19476, #19447, @BenWilson2; #20286, #20157, #20051, #20216, #20200, #20213, #20194, #20072, #20195, #20175, #20039, #19844, #19935, #19696, #19451, #19409, @smoorjani; #20209, #20131, #19742, #19969, #19734, #19480, #19351, @daniellok-db; #20204, #20164, #20192, #19997, #19925, #19850, #19914, #19774, #19721, #19673, #19623, #19668, #19496, #19554, #19471, @danielseong1; #20037, #19884, #19846, #19843, #19813, #19454, #19391, #19322, #19388, #19307, #19382, @xsh310; #20130, @iyashk; #20147, #20030, #19962, #19826, @kevin-lyn; #20108, #20071, #19743, #20045, #20042, #19959, #19880, @SomtochiUmeh; #20025, #19662, #19749, #19738, #19419, @WeichenXu123; #19847, @jaceklaskowski; #19820, @Abhiii47; #19800, @shreenidhi2205; #19703, #19693, #19689, #19688, #19664, #19663, #19660, #19534, #19533, #19532, #19531, @hubertzub-db; #19652, @AMRUTH-ASHOK; #19493, #19495, @alkispoly-db; #16372, @mohammadsubhani; #19522, @pmeier
- Jan 16, 2026
- Date parsed from source:Jan 16, 2026
- First seen by Releasebot:Feb 18, 2026
v3.9.0rc0
MLflow 3.9.0rc0 brings an in‑product MLflow Assistant, a Trace Overview dashboard, and a revamped AI Gateway. It adds online LLM judges, a Judge Builder UI, distributed tracing across services, and the MemAlignOptimizer for smarter evaluations.
Major New Features:
- 🔮 MLflow Assistant: Figuring out the next steps to debug your apps and agents can be challenging. We're excited to introduce the MLflow Assistant, an in-product chatbot that can help you identify, diagnose, and fix issues. The assistant is backed by Claude Code, and directly passes context from the MLflow UI to Claude. Click on the floating "Assistant" button in the bottom right of the MLflow UI to get started!
- 📈 Trace Overview Dashboard: You can now get insights into your agent's performance at a glance with the new "Overview" tab in GenAI experiments. Many pre-built statistics are available out of the box, including performance metrics (e.g. latency, request count), quality metrics (based on assessments), and tool call summaries. If there are any additional charts you'd like to see, please feel free to raise an issue in the MLflow repository!
- ✨ AI Gateway: We're revamping our AI Gateway feature! AI Gateway provides a unified interface for your API requests, allowing you to route queries to your LLM provider(s) of choice. In MLflow 3.9.0rc0, the Gateway server is now located directly in the tracking server, so you don't need to spin up a new process. Additional features such as passthrough endpoints, traffic splits, and fallback models are also available, with more to come soon! For more detailed information, please take a look at the docs.
- 🔎 Online Monitoring with LLM Judges: Configure LLM judges to automatically run on your traces, without having to write a line of code! You can either use one of our pre-defined judges, or provide your own prompt and instructions to create custom metrics. Head to the new "Judges" tab within the GenAI Experiment UI to get started.
- 🤖 Judge Builder UI: Define and iterate on custom LLM judge prompts directly from the UI! Within the new "Judges" tab, you can create your own prompt for an LLM judge, and test-run it on your traces to see what the output would be. Once you're happy with it, you can either use it for online monitoring (as mentioned above), or use it via the Python SDK for your evals.
- 🔗 Distributed Tracing: Trace context can now be propagated across different services and processes, allowing you to truly track request lifecycles from end to end. The related APIs are defined in the mlflow.tracing.distributed module (with more documentation to come soon).
- 📚 MemAlign - a new judge optimizer algorithm: We're excited to introduce MemAlignOptimizer, a new algorithm that makes your judges smarter over time. It learns general guidelines from past feedback while dynamically retrieving relevant examples at runtime, giving you more accurate evaluations.
Stay tuned for the full release, which will be packed with even more features and bugfixes.
To try out this release candidate, please run:
pip install mlflow==3.9.0rc0
Please try it out and report any issues on the issue tracker.
- Dec 27, 2025
- Date parsed from source:Dec 27, 2025
- First seen by Releasebot:Feb 18, 2026
v3.8.1
MLflow 3.8.1 includes several bug fixes and documentation updates.
Bug fixes:
- [Tracking] Skip registering sqlalchemy store when sqlalchemy lib is not installed (#19563, @WeichenXu123)
- [Models / Scoring] fix(security): prevent command injection via malicious model artifacts (#19583, @ColeMurray)
- [Prompts] Fix prompt registration with model_config on Databricks (#19617, @TomeHirata)
- [UI] Fix UI blank page on plain HTTP by replacing crypto.randomUUID with uuid library (#19644, @copilot-swe-agent)
Small bug fixes and documentation updates:
- #19539, #19451, #19409, @smoorjani; #19493, @alkispoly-db
- Dec 26, 2025
- Date parsed from source:Dec 26, 2025
- First seen by Releasebot:Feb 18, 2026
MLflow 3.8.1
MLflow 3.8.1 delivers essential bug fixes and updates, smoothing Tracking, Security, Prompts, and UI stability while boosting documentation. It tackles command injection, missing sqlalchemy handling, and a UI blank page issue, with comprehensive change logs and docs linked.
For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.
- Dec 22, 2025
- Date parsed from source:Dec 22, 2025
- First seen by Releasebot:Feb 18, 2026
v3.8.0
MLflow 3.8.0 debuts major features like prompt model configuration, real time trace display, new evaluators for safety and tool efficiency, and DeepEval/RAGAS scoring. It also introduces UI telemetry and a broad set of tracking, UI and docs improvements.
MLflow 3.8.0 includes several major features and improvements
Major Features
- ⚙️ Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
- ⏳ In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
- ⚖️ DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
- 🛡️ Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
- ⚡ Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)
Important Notice
Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow collects anonymized data about UI interactions, similar to the telemetry collected for the Python SDK. If you manage your own server, you can disable UI telemetry by setting either of the existing environment variables: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.
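For self-managed servers, the opt-out described above can be sketched as follows (the server invocation is shown commented out for context; host and port are illustrative):

```shell
# Disable all MLflow telemetry (Python SDK and, from 3.8.0, the UI)
# for this shell session before starting the tracking server.
export MLFLOW_DISABLE_TELEMETRY=true
# The ecosystem-wide DO_NOT_TRACK convention is honored as well:
export DO_NOT_TRACK=true
# Start the tracking server as usual; telemetry stays off:
# mlflow server --host 127.0.0.1 --port 5000
echo "MLFLOW_DISABLE_TELEMETRY=$MLFLOW_DISABLE_TELEMETRY"
```

Either variable alone is sufficient; setting both is harmless.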
Features:
- [Tracking] Add default passphrase support (#19360, @BenWilson2)
- [Tracing] Pydantic AI Stream support (#19118, @joelrobin18)
- [Docs] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
- [Tracking] Add --max-results option to mlflow experiments search (#19359, @alkispoly-db)
- [Tracking] Enhance encryption security (#19253, @BenWilson2)
- [Tracking] Fix and simplify Gateway store interfaces (#19346, @BenWilson2)
- [Evaluation] Add inference_params support for LLM Judges (#19152, @debu-sinha)
- [Tracing] Support batch span export to UC Table (#19324, @B-Step62)
- [Tracking] Add endpoint tags (#19308, @BenWilson2)
- [Docs / Evaluation] Add MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS to limit concurrent scorer execution (#19248, @debu-sinha)
- [Evaluation / Tracking] Enable search_datasets in Databricks managed MLflow (#19254, @alkispoly-db)
- [Prompts] Render text prompt previews in markdown (#19200, @ispoljari)
- [UI] Add linked prompts filter for trace search tab (#19192, @TomeHirata)
- [Evaluation] Automatically wrap async functions when passed to predict_fn (#19249, @smoorjani)
- [Evaluation] [3/6][builtin judges] Conversational Role Adherence (#19247, @joelrobin18)
- [Tracking] [Endpoints] [1/x] Add backend DB tables for Endpoints (#19002, @BenWilson2)
- [Tracking] [Endpoints] [3/x] Entities base definitions (#19004, @BenWilson2)
- [Tracking] [Endpoints] [4/x] Abstract store interface (#19005, @BenWilson2)
- [Tracking] [Endpoints] [5/x] SQL Store backend for Endpoints (#19006, @BenWilson2)
- [Tracking] [Endpoints] [6/x] Protos and entities interfaces (#19007, @BenWilson2)
- [Tracking] [Endpoints] [7/x] Add rest store implementation (#19008, @BenWilson2)
- [Tracking] [Endpoints] [8/x] Add credential cache (#19014, @BenWilson2)
- [Tracking] [Endpoints] [9/x] Add provider, model, and configuration handling (#19009, @BenWilson2)
- [Evaluation / UI] Add show/hide visibility control for Evaluation runs chart view (#18797) (#18852, @pradpalnis)
- [Tracking] Add mlflow experiments get command (#19097, @alkispoly-db)
- [Server-infra] [ Gateway 1/10 ] Simplify secrets and masked secrets with map types (#19440, @BenWilson2)
Bug fixes:
- [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
- [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
- [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
- [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
- Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
- [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
- [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
- [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
- [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
- [UI] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
- [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
- [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
- [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
- [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
- [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
- [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
- [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
- [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
- [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
- [Models] Add context loading for ChatModel (#19250, @BenWilson2)
- [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
- [Tracking] Update docker compose to use --artifacts-destination not --default-artifact-root (#19215, @B-Step62)
- [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)
Documentation updates:
- [Docs] Add specific references for correctness scorers (#19472, @BenWilson2)
- [Docs] Add documentation for Fluency scorer (#19481, @alkispoly-db)
- [Docs] Update eval quickstart to put all code into a script (#19444, @achen530)
- [Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
- [Evaluation] Fix non-reproducible code examples in deep-learning.mdx (#19376, @saumilyagupta)
- [Docs / Evaluation] Fix confusing documentation for mlflow.genai.evaluate() (#19380, @brandonhawi)
- [Docs] Deprecate model logging of OpenAI flavor (#19325, @TomeHirata)
- [Docs] Add rounded corners to video elements in documentation (#19231, @copilot-swe-agent)
- [Docs] Sync Python/TypeScript tab selections in tracing quickstart docs (#19184, @copilot-swe-agent)
Small bug fixes and documentation updates:
#19497, #19358, #19322, #19383, #19288, #19287, #19230, #19225, @xsh310; #19504, @WeichenXu123; #19499, #19465, #19241, @B-Step62; #19479, #19385, #19297, #19347, #19314, #19286, #19269, @TomeHirata; #18894, @BnnaFish; #19480, #19427, #19351, #19312, #19292, #19303, #19291, #19418, #19395, #19240, #19267, #19102, #19082, #19076, @daniellok-db; #19463, #19370, #19369, #19368, #19367, #19366, #19363, #19354, #19302, #19272, #19266, #19258, #19255, #19242, #19236, #19235, #19203, #19214, #19212, #19210, #19204, #19197, #19196, #19194, #19190, #19182, #19178, #19179, #19163, #19157, #19150, #19137, #19132, #19114, #19115, #19113, #19112, #19111, #19110, #19107, #19091, #19090, #19078, @copilot-swe-agent; #19437, @SomtochiUmeh; #19420, #19329, #19317, #19207, #19086, @kevin-lyn; #19339, #19263, #19438, #19412, #19411, #19355, #19341, #19034, #19029, #19252, @smoorjani; #19416, #19399, #19402, #19353, #19313, #19296, #19294, #19264, #19202, #19206, #19165, #19161, #19158, #19126, #19147, #19099, @harupy; #19357, #19343, #19342, #19335, #19261, #19226, #19227, @BenWilson2; #19344, #19331, #19270, #19239, #19211, @serena-ruan; #19323, @bbqiu; #19373, @alkispoly-db; #19320, #19311, @kriscon-db; #19309, @stefanwayon; #19063, @cyficowley; #19160, @Killian-fal; #19142, #19141, @dbczumar; #19089, @hubertzub-db; #19098, @achen530
- Dec 21, 2025
- Date parsed from source:Dec 21, 2025
- First seen by Releasebot:Feb 18, 2026
MLflow 3.8.0
MLflow 3.8.0 brings major upgrades for seamless LLM workflows. It adds prompt model configuration, real time in progress trace displays and new evaluators for safety and tool call efficiency. UI telemetry and a wide range of tracking, evaluation and docs improvements round out the release.
MLflow 3.8.0 includes several major features and improvements
Major Features
- ⚙️ Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
- ⏳ In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
- ⚖️ DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
- 🛡️ Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
- ⚡ Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)
Important Notice
- Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow will collect anonymized data about UI interactions, similar to the telemetry we collect for the Python SDK. If you manage your own server, UI telemetry is automatically disabled by setting the existing environment variables: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.
Features
- [Tracking] Add default passphrase support (#19360, @BenWilson2)
- [Tracing] Pydantic AI Stream support (#19118, @joelrobin18)
- [Docs] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
- [Tracking] Add --max-results option to mlflow experiments search (#19359, @alkispoly-db)
- [Tracking] Enhance encryption security (#19253, @BenWilson2)
- [Tracking] Fix and simplify Gateway store interfaces (#19346, @BenWilson2)
- [Evaluation] Add inference_params support for LLM Judges (#19152, @debu-sinha)
- [Tracing] Support batch span export to UC Table (#19324, @B-Step62)
- [Tracking] Add endpoint tags (#19308, @BenWilson2)
- [Docs / Evaluation] Add MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS to limit concurrent scorer execution (#19248, @debu-sinha)
- [Evaluation / Tracking] Enable search_datasets in Databricks managed MLflow (#19254, @alkispoly-db)
- [Prompts] render text prompt previews in markdown (#19200, @ispoljari)
- [UI] Add linked prompts filter for trace search tab (#19192, @TomeHirata)
- [Evaluation] Automatically wrap async functions when passed to predict_fn (#19249, @smoorjani)
- [Evaluation] [3/6][builtin judges] Conversational Role Adherence (#19247, @joelrobin18)
- [Tracking] [Endpoints] [1/x] Add backend DB tables for Endpoints (#19002, @BenWilson2)
- [Tracking] [Endpoints] [3/x] Entities base definitions (#19004, @BenWilson2)
- [Tracking] [Endpoints] [4/x] Abstract store interface (#19005, @BenWilson2)
- [Tracking] [Endpoints] [5/x] SQL Store backend for Endpoints (#19006, @BenWilson2)
- [Tracking] [Endpoints] [6/x] Protos and entities interfaces (#19007, @BenWilson2)
- [Tracking] [Endpoints] [7/x] Add rest store implementation (#19008, @BenWilson2)
- [Tracking] [Endpoints] [8/x] Add credential cache (#19014, @BenWilson2)
- [Tracking] [Endpoints] [9/x] Add provider, model, and configuration handling (#19009, @BenWilson2)
- [Evaluation / UI] Add show/hide visibility control for Evaluation runs chart view (#18797) (#18852, @pradpalnis)
- [Tracking] Add mlflow experiments get command (#19097, @alkispoly-db)
- [Server-infra] [ Gateway 1/10 ] Simplify secrets and masked secrets with map types (#19440, @BenWilson2)
Bug fixes
- [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
- [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
- [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
- [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
- Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
- [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
- [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
- [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
- [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
- [UI] [Bug fix] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
- [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
- [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
- [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
- [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
- [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
- [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
- [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
- [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
- [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
- [Models] Add context loading for ChatModel (#19250, @BenWilson2)
- [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
- [Tracking] Update docker compose to use --artifacts-destination instead of --default-artifact-root (#19215, @B-Step62)
- [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)
Documentation updates
- [Docs] Add specific references for correctness scorers (#19472, @BenWilson2)
- [Docs] Add documentation for Fluency scorer (#19481, @alkispoly-db)
- [Docs] Update eval quickstart to put all code into a script (#19444, @achen530)
- [Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
- [Evaluation] Fix non-reproducible code examples in deep-learning.mdx (#19376, @saumilyagupta)
- [Docs / Evaluation] Fix confusing documentation for mlflow.genai.evaluate() (#19380, @brandonhawi)
- [Docs] Deprecate model logging of OpenAI flavor (#19325, @TomeHirata)
- [Docs] Add rounded corners to video elements in documentation (#19231, @copilot-swe-agent)
- [Docs] Sync Python/TypeScript tab selections in tracing quickstart docs (#19184, @copilot-swe-agent)
For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.
- Dec 16, 2025
- Date parsed from source:Dec 16, 2025
- First seen by Releasebot:Feb 18, 2026
v3.8.0rc0
MLflow 3.8.0rc0 delivers major updates including prompt model configuration, real-time display of in-progress traces, and new evaluators: a DeepEval integration plus built-in safety and tool call scorers. This RC previews the final 3.8.0 release and its richer LLM workflow capabilities.
MLflow 3.8.0rc0 includes several major features and improvements. More features to come in the final 3.8.0 release!
To try out this release candidate:
pip install mlflow==3.8.0rc0
Major Features
- ⚙️ Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
- ⏳ In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
- ⚖️ DeepEval Judges Integration: New get_judge API enables using DeepEval's evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani)
- 🛡️ Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
- ⚡ Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)