Frontier AI labs first deploy their most advanced models internally, often for weeks or months of testing, evaluation, and iteration before considering public release. Anthropic’s Claude Mythos Preview, for example, was available internally from late February 2026 — roughly six weeks before its April announcement. These internal systems frequently have privileged access to sensitive training infrastructure, proprietary codebases, safety evaluation pipelines, and model weights.
Because internal deployments often involve enhanced capabilities, reduced safety filters, or helpful-only variants, they can create risks that standard external deployment frameworks fail to address. The report identifies two primary threat vectors: autonomous AI misbehavior, where models take harmful actions on their own initiative, and insider threats, where employees or contractors with privileged access misuse or exfiltrate models.
The IAPS guide provides a harmonized risk reporting standard designed to meet requirements in California’s SB 53, New York’s RAISE Act, and the EU’s General-Purpose AI Code of Practice. It is aimed primarily at evaluation and safety teams at frontier developers, while also helping regulators and auditors understand what high-quality internal use risk reporting should look like.
Key Terms
Internal Use: Deployment of unreleased or specially configured frontier models inside the developer organization, including helpful-only variants with reduced safeguards.
Safety Case: A structured, evidence-based argument that a model’s internal deployment does not pose unacceptable marginal risk.
Autonomous AI Misbehavior: Models taking harmful actions without explicit human direction, such as sabotage or self-exfiltration.
Insider Threats: Misuse, modification, or exfiltration of models by employees or contractors, potentially in collaboration with external actors.
Marginal Risk: The additional risk created by internal deployment compared to publicly available models.
Means-Motive-Opportunity Framework: Risk assessment structured around capability (means), inclination (motive), and ability to act undetected (opportunity).
Helpful-Only Variants: Internal model versions with reduced safety filters used for testing raw capabilities.
Key Findings
Internal AI models often pose greater risks than public versions due to privileged access to sensitive systems, limited external oversight, and significantly higher capabilities.
Two primary threat vectors dominate internal use risks: autonomous AI misbehavior and insider threats, both of which can lead to R&D sabotage, catastrophic misuse, or model weight exfiltration.
A credible safety case must address all three risk factors — means, motive, and opportunity — for both threat vectors rather than relying on any single argument.
Opportunity-based controls (access restrictions and monitoring) become increasingly fragile as model capabilities surpass human-level performance in relevant domains.
Motive-based arguments will grow in importance over time, especially as models gain the means and opportunity to act against developer interests.
Reporting should be risk-proportional: comprehensive updates for major capability jumps, targeted updates for partial changes, and quarterly reports as a baseline.
Insider threat mitigations should be submitted in a confidential annex to prevent providing a roadmap for malicious actors.
Strong reporting standards include presenting counterarguments, explicitly acknowledging uncertainty, explaining omissions, and including technical annexes for reproducibility.
Internal use risk reporting can serve as a leading indicator of accelerating AI capabilities, particularly as labs increase automation of their own R&D pipelines.
Without structured internal risk reporting, the growing gap between internal and public AI systems may remain invisible to regulators and the public.
What the Report Covers
The IAPS report, written by Oscar Delaney, Sambhav Maheshwari, Joe O'Brien, Theo Bearman, and Oliver Guest, explains why internal AI deployments pose distinctive risks and lays out a practical framework for addressing them through structured risk reporting. It introduces a clear risk taxonomy built on two threat vectors (autonomous AI misbehavior and insider threats) and three risk factors (means, motive, and opportunity).
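As a rough illustration of how that taxonomy could be operationalized, the sketch below treats a safety case as a coverage check over the two threat vectors and three risk factors. The structure, names, and example evidence items are our own assumptions for illustration, not taken from the report itself.

# Illustrative sketch only: the report's 2x3 taxonomy rendered as a simple
# coverage check. Names and structure are hypothetical, not from the IAPS guide.
from itertools import product

THREAT_VECTORS = ["autonomous_ai_misbehavior", "insider_threat"]
RISK_FACTORS = ["means", "motive", "opportunity"]

def coverage_gaps(safety_case: dict) -> list[tuple[str, str]]:
    """Return every (threat vector, risk factor) pair lacking evidence.

    safety_case maps (vector, factor) pairs to lists of evidence items,
    e.g. capability benchmarks, honeypot evaluations, or access-control audits.
    """
    return [
        (vector, factor)
        for vector, factor in product(THREAT_VECTORS, RISK_FACTORS)
        if not safety_case.get((vector, factor))
    ]

# A credible safety case addresses all six combinations; any gap is a flag.
example_case = {("insider_threat", "opportunity"): ["access-control audit"]}
print(coverage_gaps(example_case))  # five uncovered pairs remain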
The guide details specific evidence categories developers should include in their reports, ranging from capability benchmarks and real-world R&D metrics to honeypot evaluations, access controls, monitoring effectiveness, and adversarial control testing. It also provides a decision tree to determine when comprehensive, targeted, or quarterly reports are required.
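To make that cadence concrete, here is a minimal sketch of how the decision logic might look; the trigger labels and the 90-day quarterly threshold are illustrative assumptions rather than the report's exact criteria.

# Hypothetical sketch of risk-proportional reporting logic; thresholds
# and names are our assumptions, not the report's decision tree verbatim.
def required_report(capability_jump: str, days_since_last_report: int) -> str:
    """Map a change in internal capabilities to a reporting obligation.

    capability_jump: "major" (e.g. a new frontier training run), "partial"
    (e.g. a fine-tune or expanded tool access), or "none".
    """
    if capability_jump == "major":
        return "comprehensive update"
    if capability_jump == "partial":
        return "targeted update"
    if days_since_last_report >= 90:
        return "quarterly baseline report"
    return "no new report required"

print(required_report("partial", 30))  # targeted update
print(required_report("none", 95))     # quarterly baseline report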
The framework emphasizes strong reporting standards, including presenting counterarguments, acknowledging uncertainty, and separating sensitive insider threat information into confidential annexes. Overall, it aims to move beyond minimal legal compliance toward consistent, comparable, and actionable transparency for frontier AI labs.
Our Take
AI Governance Take
This IAPS report makes a compelling case that internal use of frontier AI models is not a governance afterthought — it is one of the most critical areas requiring structured oversight. As labs push capabilities forward in private, the gap between internal and public systems continues to widen, making robust internal risk reporting essential for both safety and accountability.
The emphasis on building credible safety cases with evidence across means, motive, and opportunity aligns closely with the direction GAIG has been tracking: moving from documentation-heavy compliance toward real behavioral visibility and runtime controls. For organizations developing or evaluating frontier models, strong internal risk reporting will increasingly depend on platforms that deliver continuous observability, audit trails, and verifiable controls over agentic behavior.
If your team is building frontier AI systems or responsible for evaluating internal model risk, visit the GAIG marketplace. There you can compare the monitoring, governance, and security platforms that provide the runtime visibility and evidence generation needed to support credible safety cases and responsible internal deployment.