We released F1.5, an incremental update building on F1.0.
Frontier AI Risk Management Framework
A structured approach to identifying, assessing, and mitigating risks at the frontier of AI systems.
What We Do
We build structured tools to understand and manage front-risk — risks that emerge directly from the primary, user-facing behavior of frontier AI systems — and to guide safer deployment across the full model lifecycle.
Frontier AI Risk Management Framework
The foundational document defining our risk taxonomy, severity scales, governance process, and front-risk assessment methodology.
Read document
Frontier AI Risk Management Framework in Practice
A risk analysis report applying the framework to real frontier models — covering evaluation methodology, results, and safety recommendations.
Read document
How we work
- Structured risk framework
  Severity and likelihood scales, clear categories, and front-risk definitions so teams can assess and prioritize consistently (see the sketch after this list).
- Evidence-based evaluation
  Red-teaming, benchmarks, and quantitative metrics to measure safety and alignment under adversarial and edge-case conditions.
- Ongoing monitoring
  Recommendations for periodic re-evaluation, versioned assessments, and tracking of high-leverage risk dimensions over time.
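To make the first and third pillars concrete, here is a minimal sketch of a versioned risk-assessment record with severity and likelihood scales and a derived priority score. The five-point scale labels, the multiplicative score, and the adversarial_pass_rate helper are illustrative assumptions, not definitions taken from the framework.

```python
from dataclasses import dataclass, field
from datetime import date

# Assumed five-point ordinal scales; the framework's actual scales may differ.
SEVERITY = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "critical": 5}
LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "frequent": 5}

@dataclass
class RiskAssessment:
    """One versioned, dated assessment of a single front-risk."""
    risk_id: str
    domain: str              # one of the four risk domains
    severity: str            # key into SEVERITY
    likelihood: str          # key into LIKELIHOOD
    version: str             # framework version, e.g. "F1.5"
    assessed_on: date = field(default_factory=date.today)

    @property
    def priority(self) -> int:
        # Simple severity x likelihood product (1..25) for consistent ranking.
        return SEVERITY[self.severity] * LIKELIHOOD[self.likelihood]

def adversarial_pass_rate(outcomes: list[bool]) -> float:
    """Fraction of red-team or edge-case probes the model handled safely."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Re-assessing the same risk across framework versions supports the
# periodic, versioned re-evaluation recommended above.
v10 = RiskAssessment("R-001", "malicious use", "major", "possible", "F1.0")
v15 = RiskAssessment("R-001", "malicious use", "major", "unlikely", "F1.5")
assert v15.priority < v10.priority  # mitigation reduced the likelihood
```

A product of two ordinal scales is the simplest consistent ranking; the framework's own scales may combine severity and likelihood differently.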
Approach
Risk Management Process
The six-stage process that structures every risk assessment, from identifying potential threats to continuously monitoring residual risks.
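The six stage names are not enumerated on this page. Assuming the six items listed in the framework entry under Publications (risk identification, thresholds, analysis, evaluation, mitigation, governance), and treating the process as cyclic so that monitoring residual risks feeds back into identification, a minimal sketch might look like this:

```python
from enum import Enum

class Stage(Enum):
    # Stage names borrowed from the framework document's summary below;
    # an assumption, not the canonical list from the framework itself.
    IDENTIFICATION = 1
    THRESHOLDS = 2
    ANALYSIS = 3
    EVALUATION = 4
    MITIGATION = 5
    GOVERNANCE = 6

def next_stage(stage: Stage) -> Stage:
    """Advance through the cycle; after the last stage, monitoring of
    residual risks feeds back into a fresh round of identification."""
    return Stage(stage.value % len(Stage) + 1)

assert next_stage(Stage.GOVERNANCE) is Stage.IDENTIFICATION
```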
Taxonomy
We define four risk domains and structure evaluation along seven risk dimensions.
Four risk domains
- Malicious use: risks arising from intentional exploitation of AI model capabilities by malicious actors to cause harm to individuals, organisations, or society.
- Loss of control: risks associated with scenarios in which one or more general-purpose AI systems come to operate outside of anyone's control, with no clear path to regaining control, including both passive loss of control (gradual reduction in human oversight) and active loss of control (AI systems actively undermining human control).
- Safety-critical failures: risks arising from operational failures, model misjudgments, or improper human operation of AI systems deployed in safety-critical infrastructure, where single points of failure can trigger cascading catastrophic consequences.
- Systemic risks: risks emerging from widespread deployment of general-purpose AI beyond the risks directly posed by individual model capabilities, arising from mismatches between AI technology and existing social, economic, and institutional frameworks.
Seven risk dimensions (evaluation)
Updates
New releases, tools, and related work from the SafeWork-F project.
- F1.5 incremental update: a safety report evaluating more recent models and benchmarks, building on F1.0. Includes methodology, a front-risk assessment, an LLM evaluation table with charts (radar, scatter), and recommendations.
  View blog
- Unified evaluation and diagnosis pipeline combining DeepSafe (an all-in-one safety evaluation toolkit for LLMs and MLLMs: 25+ datasets, the ProGuard model, and the benchmarks used in SafeWork-F) and DeepScan (a diagnostic framework with a Register → Configure → Execute → Summarize workflow; a generic sketch of that workflow follows below). Used together, they cover evaluation and diagnosis end to end.
  DeepSafe
  DeepScan
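As noted above, here is a generic sketch of a Register → Configure → Execute → Summarize workflow. Every class and method name is invented for illustration; this is not DeepScan's actual API.

```python
# Hypothetical sketch of a Register -> Configure -> Execute -> Summarize
# workflow; class and method names are invented, not DeepScan's real API.
from typing import Callable

class DiagnosticPipeline:
    def __init__(self) -> None:
        self._probes: dict[str, Callable[[dict], float]] = {}
        self._config: dict = {}
        self._results: dict[str, float] = {}

    def register(self, name: str, probe: Callable[[dict], float]) -> None:
        """Register: make a diagnostic probe available to the pipeline."""
        self._probes[name] = probe

    def configure(self, **options) -> None:
        """Configure: set run options (model, datasets, thresholds, ...)."""
        self._config.update(options)

    def execute(self) -> None:
        """Execute: run every registered probe against the configuration."""
        self._results = {name: probe(self._config)
                         for name, probe in self._probes.items()}

    def summarize(self) -> str:
        """Summarize: report one score per probe."""
        return "\n".join(f"{k}: {v:.3f}" for k, v in sorted(self._results.items()))

# Usage with a placeholder probe; a real probe would presumably wrap a
# DeepSafe benchmark and call the model under test rather than return a constant.
pipe = DiagnosticPipeline()
pipe.register("refusal_rate", lambda cfg: 0.97)
pipe.configure(model="example-model", dataset="example-benchmark")
pipe.execute()
print(pipe.summarize())
```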
Publications
- Frontier AI Risk Management Framework (F1.5)
  Safety evaluation and front-risk assessment: methodology, findings with interactive charts and an LLM evaluation table, and recommendations for deployment.
- Frontier AI Risk Management Framework
  The framework document: risk identification, thresholds, analysis, evaluation, mitigation, and governance for frontier AI.
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
  A technical report applying the framework: risk analysis methodology and evaluation in practice.