F1.5: We have released F1.5, an incremental update building on F1.0.
Frontier AI Risk Management Framework
A structured approach to identifying, assessing, and mitigating risks at the frontier of AI systems.
What We Do
We develop frameworks and evaluations to understand front-risk (risks that appear in the primary, user-facing behavior of AI systems) and to guide safer deployment and continuous monitoring.
Approach
Taxonomy
We define four risk domains and construct evaluations along the following seven risk respects.
Four risk domains
- Misuse: Risks arising from intentional exploitation of AI model capabilities by malicious actors to cause harm to individuals, organisations, or society.
- Loss of control: Risks associated with scenarios in which one or more general-purpose AI systems come to operate outside of anyone's control, with no clear path to regaining control, including both passive loss of control (gradual reduction in human oversight) and active loss of control (AI systems actively undermining human control).
- Accidents: Risks arising from operational failures, model misjudgments, or improper human operation of AI systems deployed in safety-critical infrastructure, where single points of failure can trigger cascading catastrophic consequences.
- Systemic risks: Risks emerging from widespread deployment of general-purpose AI beyond the risks directly posed by individual model capabilities, arising from mismatches between AI technology and existing social, economic, and institutional frameworks.
Seven risk respects (evaluation dimensions)
- Cyber offense — Capture-the-flag (CTF) and autonomous cyber attack
- Biological and chemical — Hazardous knowledge and reasoning; protocol diagnosis and troubleshooting
- Persuasion and manipulation — Inducing shifts in human or model opinions through dialogue
- Scheming — Dishonesty under pressure and sandbagging
- Uncontrolled AI R&D — AI research and development outside intended control
- Self-replication — Capability and propensity for self-replication
- Multi-agent fraud — Collusion and fraud in social systems
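These seven respects function as fixed axes along which each evaluated model is scored, which is how the evaluation tables and radar charts mentioned under Updates are organized. Below is a minimal sketch of such a per-model score record, assuming a normalized 0–1 scale; the key names and the validation check are illustrative assumptions, not the framework's published schema:

```python
from dataclasses import dataclass

# The seven risk respects treated as fixed axes of a per-model score vector.
# Key names are illustrative stand-ins for the dimensions listed above.
RISK_RESPECTS = (
    "cyber_offense",
    "biological_and_chemical",
    "persuasion_and_manipulation",
    "scheming",
    "uncontrolled_ai_rnd",
    "self_replication",
    "multi_agent_fraud",
)

@dataclass(frozen=True)
class RespectScores:
    """Scores for one model, one per risk respect (illustrative 0-1 scale)."""
    model: str
    scores: dict[str, float]

    def __post_init__(self) -> None:
        # Require exactly the seven axes so records stay comparable.
        if set(self.scores) != set(RISK_RESPECTS):
            raise ValueError(f"expected axes {RISK_RESPECTS}, got {tuple(self.scores)}")

# Example record for a hypothetical model; values are placeholders.
record = RespectScores(model="example-llm",
                       scores={r: 0.5 for r in RISK_RESPECTS})
print(max(record.scores, key=record.scores.get))  # highest-scoring respect
```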
Key pillars
- Structured risk framework: Severity and likelihood scales, clear categories, and front-risk definitions so teams can assess and prioritize consistently (see the sketch after this list).
- Evidence-based evaluation: Red-teaming, benchmarks, and quantitative metrics to measure safety and alignment under adversarial and edge-case conditions.
- Ongoing monitoring: Recommendations for periodic re-evaluation, versioned assessments, and tracking of high-leverage risk dimensions over time.
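As a minimal sketch of how the first pillar's severity and likelihood scales might compose into a single risk level: the five-point scales, the band thresholds, and the `risk_level` helper below are all illustrative assumptions, not the framework's published definitions.

```python
from enum import IntEnum

class Severity(IntEnum):
    # Hypothetical five-point severity scale; the framework's actual
    # scale definitions are not reproduced here.
    NEGLIGIBLE = 1
    MINOR = 2
    MODERATE = 3
    MAJOR = 4
    CATASTROPHIC = 5

class Likelihood(IntEnum):
    # Hypothetical five-point likelihood scale.
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    ALMOST_CERTAIN = 5

def risk_level(severity: Severity, likelihood: Likelihood) -> str:
    """Map a (severity, likelihood) pair to a qualitative risk band.

    A conventional risk-matrix lookup: the product of the two scales is
    bucketed into bands. The thresholds are illustrative only.
    """
    score = severity * likelihood
    if score >= 15:
        return "critical"
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

# Example: a major-severity, possible-likelihood front-risk lands in "high".
print(risk_level(Severity.MAJOR, Likelihood.POSSIBLE))  # -> "high"
```

The same lookup can be run per risk respect, giving one band per evaluation dimension rather than a single aggregate.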
Publications
We originally published the Frontier AI Risk Management Framework and the F1.0 technical report on its practice, defining our risk taxonomy, evaluation methodology, and front-risk assessment approach. We continue to publish safety reports (such as the F1.5 report) and supporting materials as we monitor the latest frontier models and emerging risks.
Updates
Version history. New releases will be listed here.
- F1.5 (incremental update): Safety report evaluating more recent models and benchmarks, building on F1.0. Includes methodology, front-risk assessment, an LLM evaluation table and charts (radar, scatter), and recommendations. View report.
- DeepSafe + DeepScan: A unified evaluation and diagnosis pipeline combining DeepSafe (an all-in-one safety evaluation toolkit for LLMs and MLLMs with 25+ datasets and the ProGuard model; the benchmarks used in SafeWork-F) and DeepScan (a diagnostic framework with a Register → Configure → Execute → Summarize workflow). Use the two together for full evaluation and diagnosis; a workflow sketch follows this list. DeepSafe · DeepScan
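To make the Register → Configure → Execute → Summarize workflow concrete, here is a minimal sketch of a four-stage diagnostic run. This is not DeepScan's actual API: the `DiagnosticPipeline` class and its methods are hypothetical stand-ins named after the four stages.

```python
from typing import Any, Callable

class DiagnosticPipeline:
    """Hypothetical stand-in for a four-stage diagnostic workflow;
    not DeepScan's real API."""

    def __init__(self) -> None:
        self._probes: dict[str, Callable[[Any], float]] = {}
        self._config: dict[str, Any] = {}
        self._results: dict[str, float] = {}

    def register(self, name: str, probe: Callable[[Any], float]) -> None:
        """Stage 1 (Register): declare a named diagnostic probe."""
        self._probes[name] = probe

    def configure(self, **options: Any) -> None:
        """Stage 2 (Configure): set run-wide options (seeds, limits, ...)."""
        self._config.update(options)

    def execute(self, model: Any) -> None:
        """Stage 3 (Execute): run every registered probe against the model."""
        for name, probe in self._probes.items():
            self._results[name] = probe(model)

    def summarize(self) -> dict[str, float]:
        """Stage 4 (Summarize): collect probe scores into one report."""
        return dict(self._results)

# Usage with a trivial stand-in "model" (a callable returning text).
pipeline = DiagnosticPipeline()
pipeline.register("refusal_rate",
                  lambda m: 0.0 if "refuse" in m("probe") else 1.0)
pipeline.configure(seed=0)
pipeline.execute(lambda prompt: "I must refuse.")
print(pipeline.summarize())  # {'refusal_rate': 0.0}
```

In this pairing, a DeepSafe-style toolkit would supply the datasets and scoring behind each probe, while the DeepScan-style pipeline sequences them and summarizes the diagnosis.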