SentinelOne Built an AI System That Makes Its Own Models Argue With Each Other Before Trusting Any of Them

SentinelOne Labs introduced an adversarial consensus engine that runs multiple AI agents against the same malware sample and forces them to challenge each other’s findings before anything reaches a final report. The architecture directly addresses a problem most AI security teams are quietly dealing with: models reasoning confidently over broken data. The implications extend far beyond malware analysis.

Updated on March 20, 2026

You paste raw decompiler output into a model, ask it what the malware does, and almost instantly you get something back that feels finished, structured, and oddly convincing, like it has already been reviewed and signed off by someone who knows exactly what they are doing.

What sits underneath that answer is where things start to drift. Virtual addresses look clean but are slightly off. Capabilities appear that never actually execute, pulled quietly from standard library code that should have been ignored. A command string comes back with one character wrong because a parsing step misread a delimiter. None of this looks chaotic, which is exactly the problem, because the system is reasoning carefully over inputs that were already compromised before the model even touched them.

SentinelOne Labs looked at that pattern and made a decision most teams hesitate to make: they stopped trying to improve the response and instead forced the system itself to question what it produces.


What They Actually Built

The system runs as a sequence of AI agents, each paired with a different reverse engineering tool, including radare2, Ghidra, Binary Ninja, and IDA Pro, and each one is required to engage with what came before it rather than simply extending it.

Every agent receives the full output from the previous stage and has to either validate it or push back against it before adding anything new. That interaction changes the flow entirely: incorrect findings do not quietly pass through. They get surfaced, questioned, and in many cases removed before they can compound into something larger.

The pipeline runs twice, which is where it becomes more deliberate. The first pass gathers findings in sequence. The second pass, referred to internally as the Gauntlet, reshuffles the order and pushes each agent to review the same material again with a much more critical posture, actively looking for what does not hold up rather than what might still fit.

Only the findings that make it through both passes are written into the final report, and every claim is tied back to a specific virtual address with supporting decompilation context, so the output carries its own trace of how it was formed.

That sequencing matters more than it first appears, because each agent is not only seeing what was found, it is also seeing what was rejected, and that builds a chain of evidence that accumulates over time instead of a set of disconnected opinions that never meet each other.
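The two-pass flow described above can be sketched roughly as follows. This is our illustrative reconstruction, not SentinelOne's implementation: the `Finding` fields, the review logic, and the agent names are assumptions. What the sketch shows is the structural point the article makes, that rejections are recorded rather than silently dropped, and that only claims surviving both the sequential pass and the reshuffled critical "Gauntlet" pass reach the report.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    address: int                      # virtual address the claim is tied to
    evidence: str                     # supporting decompilation context
    rejected_by: list = field(default_factory=list)

def agent_review(agent, finding, critical):
    # Stand-in review logic. A real agent would query its paired tool
    # (radare2, Ghidra, Binary Ninja, IDA Pro) here; we simply require
    # evidence to exist, and in the critical pass require the evidence
    # to actually reference the claimed virtual address.
    if not finding.evidence:
        return False
    if critical and hex(finding.address) not in finding.evidence:
        return False
    return True

def run_pipeline(agents, findings, seed=0):
    # Pass 1: sequential. Each agent sees the full prior output and
    # must validate or reject every finding before moving on.
    for agent in agents:
        for f in findings:
            if not agent_review(agent, f, critical=False):
                f.rejected_by.append(agent)   # rejection is recorded, not discarded
    # Pass 2 ("Gauntlet"): reshuffled order, purely critical posture.
    rng = random.Random(seed)
    for agent in rng.sample(agents, len(agents)):
        for f in findings:
            if not agent_review(agent, f, critical=True):
                f.rejected_by.append(agent)
    # Only claims no agent rejected in either pass reach the report.
    return [f for f in findings if not f.rejected_by]

agents = ["radare2", "ghidra", "binary_ninja", "ida_pro"]
findings = [
    Finding("C2 beacon routine", 0x401000, "call sub_401000 at 0x401000"),
    Finding("download capability", 0x0, ""),   # decompiler artifact, no evidence
]
report = run_pipeline(agents, findings)
```

Because every finding carries its `rejected_by` history, a downstream agent sees not only what was found but what was challenged, which is the chain of evidence the article describes.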


What the Mechanism Caught

One of the clearest examples comes from something that would usually pass unnoticed. A command-and-control endpoint was first extracted as /api/req_res, where an underscore quietly replaced a forward slash. A later pass corrected it to /api/req/res, which is what actually existed in the binary.

Without a system that forces comparison and rejection, that incorrect endpoint would have carried through as a confirmed finding, and from there it becomes an operational issue. A team begins investigating infrastructure that does not exist, time moves, attention shifts, and the real endpoint continues to operate untouched.
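A one-character discrepancy like `/api/req_res` versus `/api/req/res` is trivial to detect once two passes are forced to compare their extractions. The sketch below is ours, not SentinelOne's mechanism; it just shows that a character-level diff between two extractions of the same string is enough to flag the claim for re-verification instead of promoting it to a confirmed finding.

```python
import difflib

def extraction_disagreements(a: str, b: str):
    """Character-level differences between two extractions of what
    should be the same string from the binary. Any non-empty result
    means the passes disagree and the claim cannot be confirmed yet."""
    matcher = difflib.SequenceMatcher(None, a, b)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

first_pass  = "/api/req_res"   # delimiter misread as an underscore
second_pass = "/api/req/res"   # what actually exists in the binary

diffs = extraction_disagreements(first_pass, second_pass)
# A single one-character "replace" opcode surfaces here, so the
# endpoint is sent back for re-verification against the raw bytes
# rather than written into the report as infrastructure to chase.
```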

In another case, a decompiler artifact introduced a claim about a download instruction that never existed in the binary. It looked legitimate at first glance, detailed enough to pass a quick review, but when the agents were forced to revisit it in sequence, the inconsistency surfaced and the claim was removed before it reached the report.

What starts to become clear, slowly and then all at once, is that the system is not failing because it cannot reason. It is failing because it is reasoning carefully over inputs that were never fully reliable to begin with.


The Decision That Makes This Work

SentinelOne made a choice that sits quietly in the background but drives most of this behavior. Instead of relying on the Model Context Protocol, they used deterministic bridge scripts to extract data in full.

MCP works well when a human is guiding the interaction, because the model can ask for what it needs step by step, but in an automated pipeline that same flexibility becomes a liability. The model decides what to request, which means it can miss something simply by not asking for it.

A deterministic bridge script removes that uncertainty. It pulls everything in one pass, regardless of what the model might prioritize, and ensures that every agent is working from the same complete dataset each time the system runs.
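The shape of such a bridge script can be sketched as below. The extraction commands follow radare2's JSON command conventions (`aflj` for functions, `izzj` for strings, `iij` for imports) as an example; the tool session is stubbed here so the sketch stays self-contained, and none of this is SentinelOne's actual script. The point is structural: the command list is fixed, so every run pulls the same complete dataset regardless of what a model might have chosen to ask for.

```python
# A fixed extraction plan: the model never decides what to request,
# so nothing is missed by not asking. Commands mirror radare2's
# JSON output commands as an illustrative example.
EXTRACTION_PLAN = [
    ("functions", "aflj"),   # every analyzed function
    ("strings",   "izzj"),   # every string in the whole binary
    ("imports",   "iij"),    # every import
]

def bridge_extract(tool):
    """Pull the complete dataset unconditionally, in one pass."""
    return {key: tool.run(command) for key, command in EXTRACTION_PLAN}

class FakeToolSession:
    # Stand-in for a real disassembler session (e.g. one opened via
    # r2pipe); canned data keeps the sketch runnable without radare2.
    def run(self, command):
        canned = {
            "aflj": [{"name": "main", "addr": 0x401000}],
            "izzj": [{"string": "/api/req/res"}],
            "iij":  [{"name": "InternetOpenUrlA"}],
        }
        return canned[command]

dataset = bridge_extract(FakeToolSession())
# Every downstream agent receives this same complete dataset,
# which is what makes the consensus passes comparable at all.
```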

That difference changes more than performance. It changes how much trust you can place in the process, because the system is no longer dependent on what the model chooses to look at in a given moment.

Our Take

What SentinelOne built starts to look different when you step back from the tooling and focus on how the system behaves under pressure. The requirement for agents to challenge each other is not just a design choice; it functions as a control that prevents weak claims from quietly becoming accepted ones. The shared context between stages creates a record of how decisions were formed, and the second pass introduces a structured way to revisit and question those decisions before anything is finalized.

These are the same capabilities that organizations are trying to introduce into broader AI deployments, only here they are embedded directly into the workflow rather than layered on afterward. That shift matters, because the system is operating in an environment where outputs influence real security decisions, and those decisions cannot rely on assumptions that go untested.

What this signals more broadly is already visible if you look closely enough. Autonomous systems are operating with less direct human oversight, and the ones that hold up are the ones where challenge, traceability, and review are built into the system itself from the start.

The agent layer is already in production, and the teams that account for how these systems behave before something goes wrong are the ones producing results that still make sense when someone takes a second look.

