AI Governance Platforms

How Does the Databricks Agent Bricks Enterprise Agent Platform Actually Work?

Building an AI agent loop takes a weekend. Running agents reliably on real business data, under real permissions, with real governance and real consequences takes a platform. Databricks built one. Here is exactly how every part of it works — and what it means for the organizations deploying agents in production right now.

Updated on April 28, 2026

Evaluating enterprise agent platforms through a governance lens? Use the complete GAIG vendor interview guide to run structured evaluation across any platform — or browse AI Governance Platforms, Model Observability, and AI Access Control in the marketplace. Submit an inquiry to be matched with vendors that meet your specific agentic governance requirements.


Key Statistics

70% accuracy improvement over standard RAG using Unity Catalog metadata

Databricks, April 2026

63% of customers route tasks across two or more model families

Databricks production data

30% improvement in multi-step workflow performance with business context grounding

Databricks, April 2026

6 CLEARS evaluation dimensions assessed after every agent session

MLflow CLEARS Framework

Related Reading

AI Governance Capabilities Explained

CISO's Guide to Pre-Failure Signals

AI Monitoring Dashboard Analysis

AI Security Controls Explained

AI Monitoring Signals Explained

Vendor Interview Guide

Best AI Governance Platforms 2026

Primary Source — Databricks →

The basic architecture of an AI agent is something most engineering teams can describe without difficulty. A model receives a task. It decides what to do. It selects a tool to call. It executes the call. It receives a result. It decides what to do next. That loop — reason, select, execute, evaluate, repeat — is the pattern behind every AI agent deployed anywhere. Building the loop itself is genuinely not difficult. Databricks released a blog post in April 2026 that says this explicitly, and they are right: the loop is not the hard part.

The hard part is everything around the loop. What data does the agent access? Under whose identity? With what permissions? What happens when it tries to query a database the user it is acting on behalf of is not authorized to touch? What is the audit trail when it calls an external API? What governance policy prevents it from exfiltrating sensitive data through that API call? How does it avoid hallucinating facts about your business when the underlying model has never seen your internal schema? What happens when two agents are running simultaneously and one needs to hand off context to the other? What framework tells you whether the agent's answer was actually correct, not just fluent?

These are the questions that kill agentic AI deployments in production. Databricks' Agent Bricks is a comprehensive answer to all of them simultaneously. It is, as of April 2026, the most complete publicly available architecture for governed agentic AI at enterprise scale. This piece explains every part of it — how each system works, why it was built the way it was, and what the governance implications are for organizations deciding how to build and run agents in production.

"The challenge is no longer building the agent loop. It is building everything around it: identity that cannot be bypassed, credentials that do not leak, model routing that avoids lock-in, business context that makes results correct, and observability that shows what every agent did and why."

— Kasey Uhlenhuth, Databricks via Agent Bricks: The Governed Enterprise Agent Platform

That quote from author Kasey Uhlenhuth at Databricks is a precise summary of what separates a working agent demo from a production-grade agentic system. The loop is table stakes. The infrastructure wrapping the loop is where most teams run out of capacity, or ship without it and discover the consequences six months later. Agent Bricks is Databricks' argument that all of that infrastructure should exist in one governed platform — not assembled from separate tools that do not talk to each other.

There Is a Clear Production Gap. How Did Databricks Fix It?

To understand what Agent Bricks is solving, you first have to understand the specific way that agentic AI deployments fail in enterprise environments. The failure pattern is almost always the same, and it almost never originates in the model itself.

A team builds an agent. The agent performs impressively in the demo. It answers questions correctly, calls tools in the right order, produces useful output. The team ships it to production. Within weeks, problems surface. The agent occasionally retrieves data from the wrong time period because the retrieval system does not understand the business definition of "current." It sometimes accesses data that the user triggering the query is technically not authorized to see, because the permissions model was applied at the tool level rather than the execution level. It hallucinates facts about the business because it is reasoning over column names rather than the semantic meaning those columns encode in the company's data model. When it fails, the engineering team cannot trace why, because the reasoning chain is not logged anywhere in a way that allows inspection and debugging. When two agents are chained together in a workflow, state gets corrupted because neither agent was designed to handle the other's output format cleanly.

Databricks names this gap with precision. The blog post by Kasey Uhlenhuth states:

"The most valuable agents are defined by how deeply they connect to your business: customer records, operational systems, internal policies, and institutional knowledge. A financial services agent reviewing loan applications and applying company underwriting policies is valuable because it operates in a business context, not because of the model or framework alone. That context is what makes agents useful, and what makes them difficult to run in production."

Kasey Uhlenhuth

Databricks

via Agent Bricks: The Governed Enterprise Agent Platform, April 2026

The loan application example is exactly the right illustration of the production gap. A financial services agent reviewing loan applications needs to understand your company's underwriting policies — not the general concept of underwriting, but your specific credit thresholds, your specific regulatory obligations under your specific licenses, your specific exceptions and overrides and escalation paths. That institutional knowledge does not live in any model's training data. It lives in your internal systems — your policy documents, your data schemas, your business definitions cataloged in your data governance infrastructure. An agent that cannot access and reason over that institutional context is, at best, producing confident-sounding answers that are wrong for your specific business.

70% higher accuracy than standard RAG when Unity Catalog metadata is embedded directly into retrieval and planning.

Databricks reports a 70% accuracy improvement and a 30% improvement in multi-step workflow performance when agents use Unity Catalog business context — schema definitions, lineage, data quality signals, and business definitions — rather than raw retrieval against unstructured data.

Source: Databricks Agent Bricks launch post, April 2026

The 70% accuracy improvement figure is worth unpacking because it reveals something important about why standard Retrieval Augmented Generation fails for enterprise agents. Standard RAG takes a query, searches a vector index of documents, retrieves the top results, and passes them to the model as context. The model then reasons over whatever it retrieved. The problem is that the retrieval system has no understanding of your business semantics. It matches words and embeddings, not meaning in the context of your specific operations. When an analyst asks "what was our Q1 performance in the northeast region," standard RAG searches for documents containing those words. Unity Catalog-grounded retrieval understands that "northeast region" maps to a specific set of office codes in your CRM, that "Q1" ends on March 31st in your fiscal calendar, and that "performance" means the specific KPIs your business tracks, defined in the semantic layer your data team built. The agent is not just retrieving more data — it is retrieving the right data, correctly interpreted. That gap accounts for most of the accuracy difference.
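To make the mechanism concrete, here is a minimal sketch of catalog-grounded query interpretation. The semantic layer, fiscal calendar, and all names and values below are illustrative assumptions, not the Unity Catalog API — the point is only that business terms resolve to governed definitions before any retrieval happens:

```python
from datetime import date

# Hypothetical slice of catalog metadata: business definitions an agent
# consults before retrieval. Illustrative only, not the Unity Catalog API.
SEMANTIC_LAYER = {
    "northeast region": {"column": "office_code", "values": ["NE-01", "NE-02", "NE-07"]},
    "performance": {"metric": "net_new_arr", "table": "finance.kpis"},
}
FISCAL_QUARTERS = {"Q1": (date(2026, 1, 1), date(2026, 3, 31))}


def ground_query(question: str) -> dict:
    """Map business terms in a question to governed definitions."""
    q = question.lower()
    grounded = {}
    for term, definition in SEMANTIC_LAYER.items():
        if term in q:
            grounded[term] = definition
    for quarter, (start, end) in FISCAL_QUARTERS.items():
        if quarter.lower() in q:
            grounded[quarter] = {"start": start.isoformat(), "end": end.isoformat()}
    return grounded


plan = ground_query("What was our Q1 performance in the northeast region?")
# The agent now retrieves against specific office codes, a named metric,
# and an exact fiscal date range instead of matching raw words.
```

Standard RAG skips this grounding step entirely, which is why it answers "Q1 northeast performance" with whatever documents happen to contain those words.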

What Makes Agent Bricks Different

Databricks built Agent Bricks around three interconnected architectural principles. These are not marketing categories. They are structural design decisions that determine how every other capability in the platform works. Understanding these three principles is the key to understanding why Agent Bricks is architecturally different from other agent platforms, and what the governance implications of that architecture are.

First: Open and Multi-AI

The first architectural commitment is that Agent Bricks does not lock organizations into any single model, any single framework, or any single cloud. This sounds like standard enterprise positioning but the implementation is technically specific and the governance implications are significant.

Agent Bricks natively supports frontier models from every major provider — OpenAI, Anthropic, Google, and others — through a single unified API. This means that when a developer writes an agent that calls a model, they write the call once. The routing of that call to the correct model, with the correct credentials, at the correct cost tier, with the correct fallback if the primary model is unavailable, is handled by the AI Gateway layer of Agent Bricks. Developers do not write separate integration code per provider. Administrators do not manage separate billing relationships per provider. Governance policies apply uniformly regardless of which model is servicing a given request.
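The routing-with-fallback behavior described above can be sketched in a few lines. The route table, endpoint names, and error handling below are assumptions for illustration, not the AI Gateway configuration schema:

```python
# Illustrative routing table: logical task tiers mapped to a preference
# order of model endpoints. Names are hypothetical.
ROUTES = {
    "reasoning": ["anthropic/claude-sonnet", "openai/gpt-4o", "google/gemini-pro"],
    "cheap-summarize": ["openai/gpt-4o-mini", "google/gemini-flash"],
}


def call_model(endpoint: str, prompt: str, available: set) -> str:
    """Stand-in for a provider call; fails when the endpoint is down."""
    if endpoint not in available:
        raise ConnectionError(endpoint)
    return f"{endpoint}: response to {prompt!r}"


def route(task: str, prompt: str, available: set) -> str:
    """Try each model in preference order; fall back on failure."""
    errors = []
    for endpoint in ROUTES[task]:
        try:
            return call_model(endpoint, prompt, available)
        except ConnectionError as exc:
            errors.append(str(exc))  # each failed attempt is logged
    raise RuntimeError(f"all routes failed: {errors}")


# Primary model down: the caller's code is unchanged, the fallback serves it.
out = route("reasoning", "plan Q3 forecast", available={"openai/gpt-4o"})
```

The developer-facing surface is the single `route` call; which provider ultimately answers is a gateway decision, not an application decision.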

The platform also supports building and deploying agents with major frameworks including LangGraph and OpenAI Agents SDK. This matters because it means organizations can bring their existing agent code — built on whatever framework their engineering team knows — without rebuilding for a proprietary framework. Agent Bricks is the governance and infrastructure layer that wraps those frameworks, not a replacement for them.

"Today, 63% of customers route tasks across two or more model families, ensuring agents remain flexible and resilient as models evolve."

Kasey Uhlenhuth, Databricks

via Agent Bricks launch post

That 63% figure from production deployments confirms something GAIG has documented extensively in the context of coding agent sprawl: enterprise organizations are not choosing one AI model. They are running multiple models simultaneously, routing different tasks to different models based on cost, capability, and latency requirements. The governance challenge this creates — how do you apply consistent policies, cost controls, and audit logging across tasks that may be serviced by different model providers in different sessions — is exactly what the AI Gateway layer is built to solve. More on AI Gateway specifically in Section 4.

AI Governance Capabilities Explained: What Platforms Actually Do and How to Choose the Right One

Second: Unified Governance

The second architectural commitment is the most consequential for governance and compliance teams: Agent Bricks governs not just the agent, but everything the agent interacts with, through a single unified system. This is structurally different from how most agent platforms handle governance.

Most agent platforms govern at the agent level. They define what tools an agent can call, what permissions the agent holds, what the agent is allowed to do. This sounds complete but it creates a critical gap: the agent's permissions are separate from the data permissions of the user who triggered the agent. This means an agent could theoretically access data that the user requesting the task is not authorized to see — either because the agent was granted broader access than the triggering user, or because the agent accumulated permissions across multiple sessions and nobody reviewed the cumulative access profile.

Agent Bricks solves this through a mechanism called on-behalf-of token passing. Here is how it works in detail:

On-Behalf-Of Token Passing — How Agent Identity Works in Agent Bricks

  1. User Authenticates to Databricks

A user logs into their organization's Databricks environment with their corporate credentials. This creates an authenticated session with a specific identity, a specific set of data permissions defined by their role in Unity Catalog, and a specific set of organizational boundaries they are authorized to operate within. This identity is the ground truth for everything that follows.

Unity Catalog Identity

  2. User Triggers an Agent Task

The user requests that an AI agent perform a task — analyze this dataset, summarize these documents, answer this question about our operations. The agent begins execution. At this moment, the critical decision is made: does the agent run under its own permissions, or under the user's permissions? Agent Bricks makes this decision structurally: the agent always runs under the user's permissions, inherited through an on-behalf-of token.

Token Passing Initiated

  3. Agent Inherits User Identity for All Data Access

When the agent queries the data lakehouse, calls an external API through a Managed OAuth MCP Connector, reads documents from the Knowledge Assistant, or executes any other data access operation, it does so as the user — not as itself. The agent's token is the user's token, scoped to this specific session. This means the agent can only access data that the user is authorized to access. If the user cannot see payroll data, neither can the agent acting on their behalf. This constraint is enforced at the infrastructure layer, not at the policy layer. It cannot be bypassed by prompt engineering or tool manipulation.

User-Scoped Token Active

  4. All Actions Logged Under User Identity

Every data access, every tool call, every model invocation, and every output generated during the agent session is logged to Unity Catalog under the user's identity. This means the audit trail for an agent session looks exactly like the audit trail for a human user session — timestamped, identity-attributed, and queryable. When a compliance team needs to produce evidence that a specific user's data access was within their authorized scope, the agent's actions appear in the same audit record as all other actions under that identity.

Unity Catalog Audit Log

  5. External Services Receive Governed Credentials

When the agent needs to access external services — GitHub repositories, Atlassian documents, Glean search, or other systems — it connects through Managed OAuth MCP Connectors. These connectors manage credentials centrally in Databricks. The agent never touches raw API keys or OAuth secrets directly. Credentials are resolved at connection time, used for that session, and governed by the same Unity Catalog access control policies that govern all other data access. External service connections are observable, auditable, and revocable from a single location.

Managed OAuth MCP Connectors

The governance significance of on-behalf-of token passing cannot be overstated. The most common governance failure mode in agentic AI deployments is Permission Creep Drift — where an agent accumulates access permissions across sessions and deployment cycles until its effective access profile far exceeds what was originally authorized. On-behalf-of token passing structurally prevents this. The agent never holds permissions of its own. Every session, every query, every tool call is bounded by what the triggering user is authorized to do. There is no accumulation to review, no drift to monitor, no cumulative access profile to assess. The agent's access is always exactly the user's access — no more.
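A minimal sketch of that scoping logic, using illustrative structures rather than the actual Unity Catalog implementation — the essential property is that the agent holds no grants of its own:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OboToken:
    """On-behalf-of token: the triggering user's identity and grants."""
    user: str
    grants: frozenset  # data objects the user may read


@dataclass
class Agent:
    token: OboToken
    audit: list = field(default_factory=list)

    def read(self, table: str) -> str:
        # Every access is logged under the user's identity, then checked
        # against the user's grants. Enforcement sits at the data layer.
        self.audit.append((self.token.user, table))
        if table not in self.token.grants:
            raise PermissionError(f"{self.token.user} may not read {table}")
        return f"rows from {table}"


token = OboToken(user="analyst@corp.com", grants=frozenset({"sales.orders"}))
agent = Agent(token)
agent.read("sales.orders")   # allowed: the user holds this grant
# agent.read("hr.payroll")   # raises PermissionError: the agent can
                             # never exceed the user's access
```

Because there is no agent-owned grant set to accumulate into, the permission-creep failure mode has nothing to attach to.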

The CISO's Guide to AI Pre-Failure Signals: How to Read Your Governance Stack Before Control Breaks

Third: Accurate Because It Understands Business Context

The third architectural pillar is the one that most directly explains the 70% accuracy improvement figure and that separates Agent Bricks from general-purpose agent frameworks. Agent accuracy in production depends on the agent understanding what the data means in the context of your specific business — not just what the data says.

Unity Catalog, Databricks' data and AI governance catalog, stores significantly more than table schemas. It stores business definitions — what "revenue" means in your business, which columns correspond to which business concepts, what the quality score of a given dataset is, how data flows between systems, what the lineage of a particular table is, who owns it and when it was last validated. This metadata is the institutional knowledge that makes data analysis accurate in a business context.

Agent Bricks embeds this metadata directly into the retrieval and planning phase of agent execution. When an agent is planning how to answer a question, it does not just search for relevant documents. It searches through Unity Catalog metadata to understand which datasets are relevant, what the business definitions of the key concepts are, what the data quality signals indicate about which sources to trust, and what the relationships between data entities mean in your business context. The blog post by Uhlenhuth explains this for structured data specifically:

"For structured data, Genie Spaces leverage the semantic layer so agents reason over business definitions, not raw column names. This means agents return answers aligned to how your business actually operates, not just what the data says."

Kasey Uhlenhuth, Databricks

via Agent Bricks launch post

Genie Spaces is the specific product within Databricks that provides this semantic layer for structured data. When a user asks an agent a question about their data — "what were our top ten customers by revenue last quarter?" — Genie does not generate SQL by mapping the natural language query directly to raw column names in whatever table seems most relevant. It goes through the semantic layer, which knows that "customers" maps to a specific entity in your data model, that "revenue" is defined as a specific metric with a specific calculation, and that "last quarter" means a specific date range in your fiscal calendar. The SQL it generates is more accurate because it is generated against semantic meaning, not column names. This is the mechanism behind most of the accuracy improvement figure.
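A minimal sketch of what semantic-layer SQL generation looks like, assuming illustrative metric definitions and a hypothetical fiscal calendar (this is not the Genie Spaces implementation):

```python
# Governed definitions the generator draws from. All names are
# hypothetical placeholders for a real semantic layer.
METRICS = {
    "revenue": {
        "expression": "SUM(o.amount_usd)",
        "table": "sales.orders o",
        "entity_key": "o.customer_id",
    }
}
FISCAL = {"last quarter": ("2026-01-01", "2026-03-31")}


def top_n_by_metric(metric: str, period: str, n: int) -> str:
    """Build SQL from metric definitions, not raw column guesses."""
    m = METRICS[metric]
    start, end = FISCAL[period]
    return (
        f"SELECT {m['entity_key']} AS customer, {m['expression']} AS {metric}\n"
        f"FROM {m['table']}\n"
        f"WHERE o.order_date BETWEEN '{start}' AND '{end}'\n"
        f"GROUP BY {m['entity_key']}\n"
        f"ORDER BY {metric} DESC\n"
        f"LIMIT {n}"
    )


sql = top_n_by_metric("revenue", "last quarter", 10)
```

Every clause in the output is anchored to a governed definition: the metric expression, the entity key, and the fiscal date range all come from the semantic layer, so two users asking the same question get the same query.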

How Agent Execution Actually Works, Step by Step

Understanding the architecture principles is necessary context, but the more detailed question for governance and engineering teams is: what actually happens when an agent runs inside Agent Bricks? What is the process from user request to final output, and where does governance enforcement happen at each step?

Here is the complete execution process, broken down step by step, with the governance mechanisms that operate at each stage.

  1. Request Entry and Identity Resolution

A user submits a request through whatever interface is connected to the agent — a Databricks App, a Genie Space, a custom application built with the Custom Agents on Apps framework, or a coding agent like Cursor or Claude Code connected through AI Gateway. The request arrives at Agent Bricks with an authenticated identity. That identity is immediately resolved against Unity Catalog to determine: who is this user, what data are they authorized to access, what organizational boundaries apply to their session, and what agent capabilities are they permitted to invoke.

  2. Planning Phase — Chain-of-Thought with Business Context

The agent enters the planning phase. It receives the user's request and begins reasoning about how to fulfill it. This reasoning draws on business context from Unity Catalog metadata — schema definitions, semantic layer knowledge from Genie, data quality signals, lineage information. If the request involves structured data, the agent consults Genie Spaces' semantic layer to understand what the relevant business concepts map to in the actual data model. If it involves unstructured documents, it queries the Knowledge Assistant for relevant enterprise documents. Every step of this planning process is logged as part of the chain-of-thought traceability record — meaning every reasoning step the agent takes is captured and indexed for later inspection, debugging, and audit.

Chain-of-Thought Traceability · Genie Spaces · Knowledge Assistant

  3. Tool Selection and Pre-Execution Guardrails

The agent selects the tools it will invoke to execute the plan. This is where AI Gateway's guardrail layer becomes active before any tool call is made. The guardrails inspect the planned tool call and the context around it for: PII exposure risk (is the agent about to pass personally identifiable information to an external service?), unsafe content (does the request or the planned response contain harmful content?), prompt injection patterns (does the user input contain embedded instructions trying to redirect the agent?), data exfiltration patterns (is the agent being directed to send internal data to an external destination that violates policy?), and hallucination risk indicators. If any guardrail fires, the action is blocked before execution. The block is logged. The user receives an explanation of why the action was not taken.

AI Gateway Guardrails · PII Detection · Injection Defense · DLP

  4. Model Routing Through Foundation Model API

When the agent needs to invoke a model — for generation, for evaluation, for reasoning over retrieved context — that invocation is routed through the Foundation Model API. The routing logic considers: which model family is best suited to this specific sub-task, what the current cost budget for this session is, what the latency requirements are, and whether the primary model is available. If the primary model is unavailable or over-budget, the routing logic applies automatic fallback to an alternative model in the same tier. The application does not break. The agent continues. The governance policy about which models are permitted for this use case continues to apply to whatever model is servicing the request. Cost optimization runs continuously — the Foundation Model API selects the most cost-effective model that meets the performance requirements for each individual call.

Foundation Model API · Intelligent Routing · Automatic Fallback

  5. Data Access Under User Identity

When the agent queries data — from the lakehouse, from a structured table via Genie, from documents via Knowledge Assistant, from external services via Managed OAuth MCP Connectors — every access is made under the user's identity through on-behalf-of token passing, as described in Section 2. Unity Catalog enforces the user's data permissions at the query level. The agent cannot access data the user is not authorized to see, regardless of how the tool call was constructed or what the agent's own permissions might be. This access is logged to the Unity Catalog audit trail with the user's identity, a timestamp, and the specific data objects accessed.

On-Behalf-Of Token Passing · Unity Catalog Access Control · Audit Logging

  6. Multi-Agent Coordination via Supervisor Agent

For complex tasks that require multiple specialized agents working in sequence or in parallel, Agent Bricks uses the Supervisor Agent pattern. A Supervisor Agent receives the high-level task and breaks it down into sub-tasks. It routes each sub-task to the appropriate specialized agent — a data analysis agent, a document processing agent, a code generation agent, a web research agent. Each sub-agent executes its sub-task and returns its output to the Supervisor. The Supervisor synthesizes the outputs and produces the final result. All of this happens under unified governance: the same identity constraints, the same guardrails, the same audit logging apply to every agent in the chain. The Supervisor coordinates execution — the governance layer does not need to be rebuilt for each agent in the workflow.

Supervisor Agent · Multi-Agent Orchestration · LangGraph / OpenAI Agents SDK

  7. State and Memory via Lakebase

For long-running workflows where an agent needs to maintain context across multiple interactions — a multi-day research project, an ongoing workflow that picks up where it left off, a conversation that spans multiple sessions — Agent Bricks stores state and memory in Lakebase, Databricks' Postgres-compatible database for data applications. This means conversation history, intermediate results, workflow progress, and agent state are stored in a governed, queryable, auditable database — not in ephemeral in-memory state that disappears when the session ends. An agent can resume work from a previous session with full context. A supervisor agent can hand off a task to a new instance of a sub-agent with all relevant state intact. Everything stored in Lakebase is subject to the same Unity Catalog governance policies as all other data in the environment.

Lakebase · Postgres · State Persistence · Session Memory

  8. Post-Execution Evaluation via CLEARS Framework

After the agent completes execution and returns a result, the CLEARS evaluation framework runs a structured assessment of the output quality. CLEARS — which stands for Correctness, Latency, Execution, Adherence, Relevance, and Safety — is a standardized evaluation framework built into MLflow that produces scores across all six dimensions for every agent session. These scores are logged to Unity Catalog alongside the session record. They are available for real-time monitoring dashboards, for regression detection, for identifying when agent quality degrades after a model update or a data schema change, and for generating the evidence that governance and compliance teams need to demonstrate that production agents are performing within acceptable quality bounds. The full CLEARS framework is detailed in Section 5.

CLEARS Framework · MLflow · Unity Catalog · Continuous Evaluation
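The Supervisor Agent pattern from the coordination step above can be sketched as follows. The agent names, decomposition logic, and context shape are all illustrative; what matters is that one user-scoped context flows to every sub-agent, so governance is not rebuilt per agent:

```python
# Toy specialist agents. Each receives the same user-scoped context.
def data_agent(task: str, ctx: dict) -> str:
    return f"[data:{ctx['user']}] {task}"


def doc_agent(task: str, ctx: dict) -> str:
    return f"[docs:{ctx['user']}] {task}"


SPECIALISTS = {"analyze": data_agent, "summarize": doc_agent}


def supervisor(task: str, ctx: dict) -> str:
    """Decompose a task, route sub-tasks to specialists, synthesize."""
    plan = [
        ("analyze", f"metrics for {task}"),
        ("summarize", f"findings on {task}"),
    ]
    results = [SPECIALISTS[kind](sub, ctx) for kind, sub in plan]
    return " | ".join(results)  # trivial synthesis for illustration


ctx = {"user": "analyst@corp.com"}  # single identity reaches every sub-agent
report = supervisor("Q1 churn", ctx)
```

In the real platform the same identity constraint is carried by the on-behalf-of token rather than a dictionary, but the structural point is identical: the supervisor coordinates work while the governance context passes through unchanged.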

What the step-by-step process above makes visible is something important for governance teams: governance enforcement in Agent Bricks is not a layer that sits around the agent. It is woven through every step of execution. Identity resolution happens before anything starts. Guardrails fire before tool calls execute. Access controls enforce at the data layer, not the application layer. Audit logging happens at every step. Evaluation runs after completion. There is no single point where governance can be bypassed because governance is architecture.

AI Gateway

AI Gateway is the component of Agent Bricks that deserves the most detailed attention from governance and security teams, because it is the layer that makes unified governance operationally real rather than theoretically described. Everything about Agent Bricks' multi-AI flexibility, its MCP integration, its coding agent support, and its guardrail enforcement routes through AI Gateway.

Databricks describes AI Gateway as "a unified layer to manage and govern access to models, coding agents, and now MCP-connected tools. It enforces identity, permissions, and observability across every interaction, so agents operate securely across your models, tools, and APIs." The specific phrase "unified layer" is doing real architectural work here. It means that an administrator configuring governance policies in AI Gateway does not configure them separately for model access, separately for coding agent access, and separately for MCP tool access. One policy definition propagates across all three. One audit log captures all three. One cost budget applies across all three.

How AI Gateway Routes Model Calls

When a developer writes agent code that calls a model, they write a standard API call to AI Gateway's endpoint — not directly to OpenAI's API, not directly to Anthropic's API, not directly to Google's API. AI Gateway receives that call and applies several operations before it reaches any model provider:

First, identity verification. The caller's identity is confirmed and their permissions are checked against the policies for this model family and use case. A developer without access to GPT-4o cannot invoke it, regardless of what model they specify in their API call.

Second, guardrail inspection. The prompt is evaluated against the configured guardrails — PII detection, prompt injection detection, data exfiltration patterns, unsafe content. If a guardrail fires, the call is blocked before reaching the model. The block is logged.

Third, routing logic. AI Gateway applies the routing configuration — which model to use for this call, what the fallback sequence is if that model is unavailable, what the cost optimization settings are for this organizational unit. The call is dispatched to the appropriate model.

Fourth, response inspection. The model's response passes back through AI Gateway, where a second guardrail pass inspects the output for the same categories. A model can generate a problematic response even to a clean prompt — the output guardrail catches this before the response reaches the application.

Fifth, logging. The full interaction — request, routing decision, model selected, guardrails evaluated, response generated — is logged to Unity Catalog with a complete audit trail.
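Those five operations can be sketched as a single pipeline function. The guardrail pattern, names, and audit shape below are toy assumptions, not the AI Gateway API — a real deployment uses far richer detectors than one regex:

```python
import re

# Toy PII guardrail: US SSN pattern. Real gateways run many detectors.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
AUDIT = []  # stand-in for the Unity Catalog audit log


def gateway_call(user, allowed_models, model, prompt, backend):
    if model not in allowed_models.get(user, set()):       # 1. identity check
        raise PermissionError(f"{user} may not call {model}")
    if SSN.search(prompt):                                 # 2. prompt guardrail
        AUDIT.append((user, model, "blocked:pii_in_prompt"))
        raise ValueError("blocked: PII in prompt")
    response = backend(model, prompt)                      # 3. routed dispatch
    if SSN.search(response):                               # 4. response guardrail
        AUDIT.append((user, model, "blocked:pii_in_response"))
        raise ValueError("blocked: PII in response")
    AUDIT.append((user, model, "ok"))                      # 5. audit log
    return response


def backend(model, prompt):
    """Stand-in for the routed provider call."""
    return f"{model} says: summary ready"


reply = gateway_call(
    "analyst@corp.com",
    {"analyst@corp.com": {"gpt-4o"}},
    "gpt-4o",
    "Summarize the incident report.",
    backend,
)
```

Note that the response guardrail in step 4 is what catches a problematic output generated from a clean prompt: the same inspection runs on both sides of the model call.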

How AI Gateway Governs MCP Connections

The MCP governance capability in AI Gateway is particularly significant given the security risks GAIG has documented in depth. An MCP server is a protocol layer that gives an AI agent access to external tools and data sources. When an agent connects to GitHub through an MCP server, it can read repositories, write code, open pull requests, and query code history. When it connects to Atlassian through an MCP server, it can read Jira tickets, update project status, and access Confluence documentation. These are powerful capabilities and, without governance, they represent exactly the kind of privileged access accumulation that creates serious security exposure.

AI Gateway's Managed OAuth MCP Connectors address this in two ways. First, credentials for external services are stored centrally in Databricks — not on developer machines, not in code repositories, not in environment variables that persist across sessions. An agent connecting to GitHub never touches a raw API token. It receives a governed OAuth credential that is scoped to the specific permissions required for the current session, managed by AI Gateway, and revocable from a single location.

Second, all MCP tool calls route through AI Gateway and are subject to the same guardrails, identity enforcement, and audit logging as model calls. When an agent calls a GitHub MCP tool to read a repository, that call is logged with the user's identity, the specific repository accessed, and the timestamp. Governance teams can see exactly what external systems every agent touched, when, and under whose identity.

AI Security Controls Explained: What They Are, How They Work, and How to Evaluate AI Security Platforms

How AI Gateway Supports Coding Agents

One of the significant expansions in the April 2026 Agent Bricks launch is the integration of coding agent governance into AI Gateway. Cursor, Codex, and Claude Code — three of the most widely used coding agent tools in enterprise engineering teams — can now be connected to AI Gateway as governed clients. This means that when a developer uses Cursor to generate code inside their development environment, that code generation request routes through AI Gateway. The same identity enforcement, cost budget, guardrails, and audit logging that apply to agent framework calls apply to coding agent calls.

The practical implication is a direct solution to coding agent sprawl. Instead of every developer's coding agent hitting model APIs directly with individual API keys, all coding agent traffic routes through AI Gateway with organizational credentials. Administrators see all coding agent spend in one place. They apply one budget. They enforce one set of guardrails. They produce one audit trail. A developer using Cursor and another developer using Claude Code are drawing from the same cost pool, subject to the same policies, generating the same kind of audit evidence.
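The shared-cost-pool behavior can be sketched in a few lines. `CostPool` and its ledger are hypothetical constructs for illustration, not an AI Gateway API; the assumption is simply that every client's spend draws down one organizational budget and is rejected once the budget is exhausted.

```python
class CostPool:
    """Illustrative shared budget for all coding agent clients."""

    def __init__(self, monthly_budget_usd: float):
        self.budget = monthly_budget_usd
        self.spent = 0.0
        # One ledger for the whole organization: (developer, client, cost)
        self.ledger: list[tuple[str, str, float]] = []

    def charge(self, developer: str, client: str, cost_usd: float) -> bool:
        """Reject the request if it would exceed the shared budget."""
        if self.spent + cost_usd > self.budget:
            return False
        self.spent += cost_usd
        self.ledger.append((developer, client, cost_usd))
        return True

pool = CostPool(monthly_budget_usd=100.0)
pool.charge("alice", "cursor", 2.5)      # accepted
pool.charge("bob", "claude-code", 1.0)   # same pool, same policy
```

Two different tools, two different developers, one budget and one ledger — that is the sprawl-elimination property the paragraph above describes.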

Databricks Is Introducing Agent Bricks For Autonomous AI Agents — Original GAIG Coverage

The CLEARS Framework

The CLEARS framework is one of the most governance-relevant technical details in the entire Agent Bricks announcement and it has received almost no coverage compared to the identity and routing features. For compliance teams specifically, CLEARS is the mechanism that generates the evidence needed to demonstrate that production agents are operating within acceptable quality bounds — which is a specific requirement under the EU AI Act's post-market monitoring obligations.

CLEARS is an acronym for six evaluation dimensions. Each dimension measures a distinct aspect of agent quality. All six run as automated evaluations through MLflow after every agent session and produce scores that are logged to Unity Catalog. Here is what each dimension measures and why it matters:

C — Correctness

What it measures: Did the agent produce a factually accurate answer? Does the output match the ground truth for this type of query? Correctness evaluation compares the agent's output against known-correct answers for representative queries in the deployment domain.

Why it matters for governance: The most fundamental quality metric. Without measuring correctness continuously, degradation goes undetected until a user or regulator surfaces it. Post-market monitoring requirements under EU AI Act Article 72 effectively require this.

L — Latency

What it measures: How long did the agent take to complete the task? What was the time from request to final output? Latency is measured at the session level and at the individual step level to identify where execution time is being consumed.

Why it matters for governance: Latency degradation is often the first signal that something has changed in the underlying system — a model update, a data schema change, a new tool call that is slower than expected. It is a leading indicator for deeper quality issues.

E — Execution

What it measures: Did the agent follow the intended execution path? Did it select the right tools in the right order? Did it complete all steps in the plan without skipping or repeating steps? Execution evaluation inspects the action sequence against the expected workflow structure.

Why it matters for governance: Execution failures are often invisible in output-only evaluation. An agent can produce a plausible answer while following the wrong execution path — skipping a validation step, calling a tool out of sequence, or terminating early. Execution scoring catches this.

A — Adherence

What it measures: Did the agent follow its system prompt and operational instructions? Did it stay within the scope it was configured for? Did it respect the constraints that were defined for this deployment? Adherence measures compliance with the agent's own design specification.

Why it matters for governance: Adherence failures reveal prompt drift — cases where the agent's behavior has moved away from what was specified and approved. This is directly relevant to change management processes and to any governance framework that requires agents to operate within defined behavioral boundaries.

R — Relevance

What it measures: Was the agent's output relevant to what was actually asked? Did it address the user's actual intent, or did it answer a related but different question? Relevance evaluation checks whether the output is topically and contextually aligned with the request.

Why it matters for governance: Relevance failures are the production manifestation of the problem Tobias Leong at Axium Industries described when he noted that models "sometimes answer the question that you wish to ask, but not the one you actually asked." Continuous relevance scoring surfaces when this happens systematically.

S — Safety

What it measures: Did the agent output contain harmful content, PII, unsafe recommendations, or policy-violating material? Safety evaluation runs the output through the same guardrail categories as the input — checking for content that should not be delivered to users regardless of whether the request seemed benign.

Why it matters for governance: Safety scoring produces the evidence that an agent's outputs are continuously monitored for harmful content — which is a specific demonstrable requirement for high-risk AI systems under the EU AI Act and for responsible AI commitments under most enterprise governance frameworks.

The reason CLEARS matters beyond its individual dimensions is that it produces a standardized, comparable record of agent quality across sessions, over time, and across different agents in the same environment. An organization that runs CLEARS evaluations continuously can produce a chart showing that their customer service agent's correctness score has been consistently above 94% for the past three months. They can produce a chart showing that latency spiked in the two weeks after a model update and then returned to baseline after a threshold recalibration. They can show that adherence scores are higher for the agents that were fine-tuned on company-specific data than for the ones using the general-purpose model. That is the kind of documented evidence that audit-ready AI governance programs produce. Without continuous evaluation infrastructure like CLEARS, organizations are producing the documentation of what they intended agents to do — not the evidence of what agents actually did.
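How per-session scores on the six dimensions become the kind of audit evidence described above — a "correctness above 94% for three months" chart — can be sketched as a simple aggregation. The record shape below is illustrative, not the actual MLflow or Unity Catalog schema.

```python
from statistics import mean

# The six CLEARS dimensions, scored per session (illustrative record shape).
DIMENSIONS = ("correctness", "latency", "execution", "adherence", "relevance", "safety")

def monthly_correctness(sessions: list[dict]) -> float:
    """Average correctness score across a batch of session evaluations."""
    return mean(s["correctness"] for s in sessions)

sessions = [
    {"correctness": 0.96, "latency": 0.91, "execution": 0.93,
     "adherence": 0.97, "relevance": 0.95, "safety": 1.0},
    {"correctness": 0.95, "latency": 0.88, "execution": 0.94,
     "adherence": 0.96, "relevance": 0.93, "safety": 1.0},
]

# Every session record carries all six dimensions.
assert all(set(DIMENSIONS) <= set(s) for s in sessions)
print(round(monthly_correctness(sessions), 3))  # 0.955
```

The governance value is not the arithmetic; it is that the same standardized record exists for every session of every agent, so the aggregation is comparable across agents and across time.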

Your AI Monitoring Dashboard Is Full of Data Nobody Acts On — Why CLEARS Scores Need Owners

What Is Generally Available

The April 2026 Agent Bricks launch included a set of specific capabilities reaching general availability or launching for the first time. Each of these matters specifically for governance and compliance teams, and each represents a distinct gap that now has a supported solution inside the Agent Bricks platform.

Custom Agents on Apps

Build and deploy agent applications with any model or framework, with full lifecycle support and serverless compute. Native integration with Lakebase provides memory, conversation history, and persistent state for long-running workflows. This is the capability that allows organizations to ship agentic applications to end users without managing infrastructure — the compute, the scaling, and the state management are handled by the platform.

Supervisor Agent

Orchestrate multiple agents and tools into a single governed workflow. Define the task and connect your systems. The Supervisor coordinates execution across models, tools, and sub-agents without requiring the application developer to manually manage the handoff logic between agents. Unified governance applies across the entire agent chain — not per-agent governance that creates inconsistency at handoff points.

Document Intelligence

Extract and structure data from unstructured documents — contracts, invoices, reports, PDFs, scanned documents — turning them into queryable, governed knowledge without building custom extraction pipelines. This is available as a native SQL function (ai_parse_document) that integrates directly with the data lakehouse. Extracted content is subject to Unity Catalog governance policies the moment it lands.

Knowledge Assistant

Automatically ingest enterprise documents and make them accessible to any agent with retrieval that incorporates system context, metadata, and user permission constraints. Unlike a general document search, Knowledge Assistant retrieval is aware of who is asking, what they are authorized to see, and what business context metadata is available to make the retrieval more accurate.

Agent Mode in Genie Spaces

Move from single-turn question-and-answer to multi-step reasoning and analysis over structured data. Genie can now plan and execute multi-step analytical workflows — explore the data, identify what additional context is needed, formulate and execute queries, synthesize results — rather than answering one question at a time. This transforms Genie from a data Q&A tool into an autonomous analytical agent operating over business data with semantic grounding.

AI Gateway with Guardrails

The unified model, coding agent, and MCP governance layer with expanded guardrail capabilities covering PII exposure, unsafe content, prompt injection detection, data exfiltration, and hallucination risk. Guardrails are configurable per use case — a customer service agent and an internal data analysis agent can have different guardrail configurations appropriate to their specific risk profiles. All guardrail events are logged to Unity Catalog.

Managed OAuth MCP Connectors

Securely connect GitHub, Atlassian, Glean, and other external services as governed tools accessible to agents. Credentials managed centrally — no raw API keys in agent code or developer environments. All external tool calls route through AI Gateway and are auditable in Unity Catalog. Organizations can revoke a connector's access from a single location and all agents using that connector immediately lose access.

Web Search in Foundation Model API

Ground agent responses with real-time information from the web using native provider search capabilities. This allows agents to incorporate current information into their responses without requiring a separate external search integration. The search capability operates through Foundation Model API — meaning it is subject to the same routing, guardrail, and audit infrastructure as all other model calls.
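The per-use-case guardrail configurability described under AI Gateway with Guardrails can be pictured as a pair of policy profiles over the same categories. The category names follow the list above; the profile shape and the "block"/"flag" actions are hypothetical, not the AI Gateway configuration schema.

```python
# Hypothetical guardrail profiles: same categories, different risk postures
# for a customer-facing agent versus an internal analysis agent.
GUARDRAIL_PROFILES = {
    "customer_service_agent": {
        "pii_exposure": "block",       # never deliver PII to end users
        "unsafe_content": "block",
        "prompt_injection": "block",
        "data_exfiltration": "block",
        "hallucination_risk": "flag",  # log for review rather than block
    },
    "internal_analysis_agent": {
        "pii_exposure": "flag",        # internal users may see governed PII
        "unsafe_content": "block",
        "prompt_injection": "block",
        "data_exfiltration": "block",
        "hallucination_risk": "flag",
    },
}

def action_for(agent: str, category: str) -> str:
    """Look up the configured action for one agent and guardrail category."""
    return GUARDRAIL_PROFILES[agent][category]
```

The design point is that risk posture is configuration, not code: the same PII category blocks for the customer-facing agent and only flags for the internal one.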

What Enterprise Customers Are Actually Saying

The Agent Bricks launch post includes multiple quotes from enterprise customers running agents in production. These quotes are worth reading carefully because they reveal what the production experience actually looks like, which governance capabilities customers found most valuable, and what problems they were trying to solve before Agent Bricks existed. They also represent publicly quotable evidence that real organizations with real operational requirements found this architecture sufficient for production use.

"With Agent Bricks, we're not building one-off AI projects, we're building an enterprise AI fabric. Interoperability, identity-first security and governance were designed from day one, so our agents behave like any other mission-critical system, not a science experiment."

William Acosta

Head of Agentic AI Engineering, EchoStar

via Agent Bricks launch post

The phrase "enterprise AI fabric" is significant. Acosta is describing an architecture, not a metaphor. An AI fabric is an infrastructure layer that connects different AI capabilities together in a way that is observable, governed, and consistent across the organization. The contrast he draws — "any other mission-critical system, not a science experiment" — names the exact production gap that kills most enterprise agent deployments. Science experiments do not have change management processes. They do not have defined incident response. They do not have audit trails. They do not have governance accountability structures. Mission-critical systems do. Agent Bricks, as Acosta describes it, brings the infrastructure discipline of mission-critical systems to agentic AI.

"Agent Bricks gives us a structured way to coordinate multiple data intelligence endpoints in a single system. Instead of hard-coding routing logic, we can guide how the agent prioritizes Genie and governed data in Unity Catalog through clear instructions. That makes it much easier to build an internal 'ask data' experience that's flexible and reliable as it evolves."

Alvaro Martin

Sr. Data Engineer, Zapier

via Agent Bricks launch post

Martin's description of the problem that Agent Bricks solves for Zapier is specific and technically accurate: without a platform, routing an agent between different data intelligence endpoints — a structured query engine like Genie, a document retrieval system, an API connector — requires hard-coded routing logic that becomes a maintenance burden as the data environment changes. Agent Bricks provides a governed, instruction-driven routing layer that can be updated through configuration rather than code. When the organization's data environment evolves, the routing adapts through instruction changes rather than requiring engineering intervention. This is the kind of operational flexibility that makes the difference between an agent that works reliably at scale and one that requires constant maintenance to keep functioning.

The production deployments Databricks cites across the blog post span financial services, retail, healthcare, and technology. Organizations including Workday, Virgin Atlantic, Zapier, EchoStar, and AstraZeneca are running production agents on Agent Bricks. These are not pilot deployments. They are production systems handling real business workflows at scale. The specific use cases described include continuous market analysis for hundreds of analysts simultaneously, supply chain and procurement workflow orchestration, automated employee service resolution, and marketing anomaly detection and correction. Each of these represents a domain where incorrect agent outputs have real business consequences — which means each of them requires the kind of governance architecture that Agent Bricks provides.

What Agent Bricks Does Not Solve

The Agent Bricks architecture is technically impressive and the governance capabilities are genuinely differentiated, but the platform is not a universal solution; it has real limitations. Being honest about what it does not address is important for organizations making stack decisions based on this analysis.

Limitation 1: Ecosystem Dependency

Agent Bricks is strongest when your data, your models, and your agents all live in Databricks. The on-behalf-of token passing relies on Unity Catalog identity. The business context accuracy relies on Unity Catalog metadata. The unified audit trail relies on Unity Catalog logging. Organizations with significant data assets outside the Databricks ecosystem — in Snowflake, in AWS native services, in Azure Data Factory, in on-premises systems that are not connected to the lakehouse — get a partial version of the governance model. The closer your data infrastructure is to being fully on Databricks, the more complete the governance coverage. Organizations starting fresh on data infrastructure get the full picture. Organizations with established heterogeneous data stacks are making a larger architectural commitment than the product positioning makes immediately obvious.

Limitation 2: The Organizational Accountability Gap

Agent Bricks provides the technology layer for governed agentic AI. It does not provide the human accountability layer that makes any governance program actually function. CLEARS scores flowing into Unity Catalog are only valuable if someone owns those scores — if there is a named individual responsible for reviewing them, a defined response procedure for when scores degrade, and an escalation path for when an agent's quality falls below the acceptable threshold. The signal-to-incident collapse problem that GAIG has documented extensively applies here. An agent with a degrading adherence score and a dashboard that nobody checks daily is not a governed agent. It is an agent with expensive monitoring infrastructure that nobody is reading.
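The accountability layer the paragraph above says the platform does not supply can be made concrete with a minimal sketch: each monitored dimension gets a named owner and a threshold, and a degraded score produces an escalation addressed to a person rather than a dashboard entry. All names and thresholds here are illustrative assumptions.

```python
# Hypothetical ownership map: dimension -> (named owner, minimum acceptable score).
OWNERS = {
    "correctness": ("data-quality-lead@example.com", 0.94),
    "adherence":   ("agent-platform-owner@example.com", 0.90),
    "safety":      ("ai-risk-officer@example.com", 0.99),
}

def escalations(scores: dict[str, float]) -> list[str]:
    """Return who must respond when a dimension falls below its threshold."""
    out = []
    for dim, (owner, threshold) in OWNERS.items():
        if scores.get(dim, 1.0) < threshold:
            out.append(f"{dim}={scores[dim]:.2f} below {threshold} -> notify {owner}")
    return out

result = escalations({"correctness": 0.91, "adherence": 0.95, "safety": 1.0})
print(result)
```

Nothing in this sketch is technically difficult — which is the point. The missing piece in most deployments is the organizational decision to populate the ownership map, not the code to evaluate it.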

Limitation 3: Pre-Deployment Governance

Agent Bricks is optimized for governing agents in production. The pre-deployment governance work — risk classification, scope definition, stakeholder approval, third-party vendor assessment for external tools, security review of the agent's intended capabilities — is not what this platform is built to handle. Organizations need a governance program that covers the full agent lifecycle. Agent Bricks covers the production runtime end exceptionally well. The classification and policy definition work that should happen before any agent reaches production requires separate governance program infrastructure that connects to Agent Bricks at deployment time.

The Complete AI Governance Vendor Interview Guide: Every Question to Ask Before You Buy

The Full Architecture

To bring everything in this analysis together, here is a complete architectural view of how the components of Agent Bricks connect to each other and what each layer is responsible for.

USER ENTRY LAYER

  • Databricks Apps

  • Genie Spaces

  • Custom Agent Apps

  • Cursor / Codex / Claude Code

  • LangGraph / OpenAI Agents SDK

AI GATEWAY — GOVERNANCE CONTROL PLANE

  • Identity Enforcement

  • Input Guardrails

  • Output Guardrails

  • Cost Budget Enforcement

  • Model Routing + Fallback

  • MCP Tool Access Control

  • Audit Event Emission

EXECUTION LAYER

  • Supervisor Agent

  • Foundation Model API

  • Managed OAuth MCP Connectors

  • Web Search

  • Lakebase (State + Memory)

CONTEXT + DATA LAYER

  • Unity Catalog Metadata

  • Genie Semantic Layer

  • Knowledge Assistant

  • Document Intelligence

  • Data Lakehouse

EVALUATION + OBSERVABILITY LAYER

  • CLEARS Framework (MLflow)

  • Chain-of-Thought Traceability

  • Unity Catalog Audit Log

  • Arize AI Observability

  • Performance + Drift Monitoring

The layered view above reveals the most important structural property of Agent Bricks: governance is not a wrapper around the outside of the system. It is embedded at every layer. The governance control plane sits between the user entry layer and every execution resource. Context retrieval operates under the same identity constraints as data access. Every layer produces audit events that flow into Unity Catalog. CLEARS evaluation runs after execution and produces the compliance evidence that the audit layer needs. This is the difference between governance as a feature and governance as an architecture.

"Agent Bricks is the first enterprise agent platform I have seen that treats governance as load-bearing infrastructure rather than a compliance checkbox. On-behalf-of token passing, chain-of-thought traceability, CLEARS evaluation, unified AI Gateway — these are not features you can add later. They are decisions made in the architecture that determine whether the platform can actually be governed when something goes wrong. That is the right design philosophy. The organizations that build on this architecture will have the evidence base to scale agentic AI responsibly. The ones that build on platforms without this foundation will be reconstructing evidence from incomplete logs when the first incident surfaces."

Nathaniel Niyazov

CEO, GetAIGovernance.net

Our Take

AI Governance Take

The Agent Bricks launch represents the most complete public answer to a question that every enterprise deploying AI agents in production eventually has to answer: who governs the agent? Not who approved its deployment. Not who gets paged when it fails. Who is responsible, in real time, for what the agent can do, what it can access, what it can output, and what evidence exists of everything it did? Agent Bricks answers all four of those questions with specific, demonstrated architecture — and that is genuinely rare in a market full of governance claims backed by documentation rather than implementation.

The on-behalf-of token passing architecture alone is worth significant attention from any CISO evaluating agentic AI platforms. The most persistent and dangerous governance failure in agentic deployments is Permission Creep Drift — agents accumulating access over time that nobody reviewed as a cumulative profile. On-behalf-of token passing eliminates this structurally. The agent never accumulates permissions. Every session is bounded by the triggering user's current authorized scope. That is a governance property, not a monitoring alert waiting to fire.
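The structural property described above — the agent never accumulates permissions because each session is bounded by the triggering user's current authorized scope — can be shown in a short sketch. The grant store and class names are illustrative, not Unity Catalog APIs.

```python
# Hypothetical per-user grants, as a stand-in for a catalog's permission model.
USER_GRANTS = {
    "analyst@example.com": {"sales.read", "finance.read"},
    "intern@example.com": {"sales.read"},
}

class AgentSession:
    """A session whose scope is captured from the user's grants at start.

    Nothing persists on the agent between sessions, so there is no
    cumulative permission profile to drift.
    """

    def __init__(self, user: str):
        self.scope = frozenset(USER_GRANTS.get(user, set()))

    def can_access(self, permission: str) -> bool:
        return permission in self.scope

assert AgentSession("analyst@example.com").can_access("finance.read")
assert not AgentSession("intern@example.com").can_access("finance.read")
```

The `frozenset` captures the governance claim in miniature: the session's scope is fixed at creation from the human's current grants, and there is no code path by which the agent widens it.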

The CLEARS framework is the governance-relevant detail that the broader market is underreporting. For organizations subject to EU AI Act Article 72 post-market monitoring requirements, CLEARS provides the automated, continuous, standardized quality evidence that the regulation requires organizations to collect, document, and analyze. Organizations that deploy production agents without continuous evaluation infrastructure will discover, at regulatory examination time, that their audit documentation shows what they intended agents to do — not what agents actually did. Those are not the same thing and regulators are learning to ask for both.

The ecosystem dependency is real and should not be minimized. Agent Bricks is the right architecture for organizations building on Databricks. For organizations with significant data assets outside the Databricks ecosystem, the governance completeness degrades proportionally to how much data lives elsewhere. That is not a reason to avoid Agent Bricks — it is a reason to evaluate it clearly against your actual data infrastructure before committing to the architecture.

The broader signal from this launch is strategic. Databricks is making a clear platform bet: that the future of enterprise AI is agents grounded in business data, governed at the infrastructure layer, observable at the session level, and evaluated continuously against standardized quality frameworks. That bet is correct. The organizations that build on that architecture will govern agents as first-class enterprise systems. The ones that build on frameworks without it will govern agents as science experiments — until the experiments have consequences significant enough to demand something better.

Related Articles

  • ServiceNow Launches Autonomous Workforce and Integrates Moveworks Into Its AI Platform (Feb 27, 2026)

  • OneTrust's New CEO Foresees Accelerating Demand for AI Governance Platforms (Mar 7, 2026)

  • OneTrust Expands AI Governance Platform as Enterprise AI Adoption Accelerates (Mar 9, 2026)
