F1.5 We released F1.5, an incremental update to F1.0

Frontier AI Risk Management Framework

A structured approach to identifying, assessing, and mitigating risks at the frontier of AI systems.

See what's new

What We Do

We develop frameworks and evaluations to understand front-risk—risks that appear in the primary, user-facing behavior of AI systems—and to guide safer deployment and continuous monitoring.

Approach

Taxonomy

We define four risk domains and construct evaluations along the following seven risk dimensions.

Four risk domains

Misuse risks (threat source: external malicious actors)

Risks arising from intentional exploitation of AI model capabilities by malicious actors to cause harm to individuals, organizations, or society.

Loss of control risks (threat source: model control-undermining propensity)

Risks associated with scenarios in which one or more general-purpose AI systems come to operate outside of anyone's control, with no clear path to regaining control—including both passive loss of control (gradual reduction in human oversight) and active loss of control (AI systems actively undermining human control).

Accident risks (threat source: human operational error or model misjudgment)

Risks arising from operational failures, model misjudgments, or improper human operation of AI systems deployed in safety-critical infrastructure, where single points of failure can trigger cascading catastrophic consequences.

Systemic risks (threat source: tech–institutional misalignment)

Risks emerging from widespread deployment of general-purpose AI beyond the risks directly posed by individual model capabilities, arising from mismatches between AI technology and existing social, economic, and institutional frameworks.

Seven risk dimensions (evaluation dimensions)

  1. Cyber offense — Capture-the-flag (CTF) and autonomous cyber attack
  2. Biological and chemical — Hazardous knowledge and reasoning; protocol diagnosis and troubleshooting
  3. Persuasion and manipulation — Inducing shifts in human or model opinions through dialogue
  4. Scheming — Dishonesty under pressure and sandbagging
  5. Uncontrolled AI R&D — AI research and development outside intended control
  6. Self-replication — Capability and propensity for self-replication
  7. Multi-agent fraud — Collusion and fraud in social systems
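
For teams encoding the taxonomy in tooling, the two lists above map naturally onto plain data. The Python sketch below is illustrative only; the identifiers and snake_case keys are shorthand, not names defined by the framework.

    # Illustrative encoding of the taxonomy; names are shorthand,
    # not identifiers defined by the framework itself.
    RISK_DOMAINS = {
        "misuse": "external malicious actors",
        "loss_of_control": "model control-undermining propensity",
        "accident": "human operational error or model misjudgment",
        "systemic": "tech-institutional misalignment",
    }

    RISK_DIMENSIONS = [
        "cyber_offense",
        "biological_and_chemical",
        "persuasion_and_manipulation",
        "scheming",
        "uncontrolled_ai_rnd",
        "self_replication",
        "multi_agent_fraud",
    ]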

Key pillars

  • Structured risk framework

Severity and likelihood scales, clear categories, and front-risk definitions so teams can assess and prioritize consistently (see the first sketch after this list).

  • Evidence-based evaluation

Red-teaming, benchmarks, and quantitative metrics to measure safety and alignment under adversarial and edge-case conditions (see the second sketch after this list).

  • Ongoing monitoring

Recommendations for periodic re-evaluation, versioned assessments, and tracking of high-leverage risk dimensions over time (see the third sketch after this list).
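
To make the first pillar concrete, here is a minimal prioritization sketch. The 1-5 scales, the multiplicative score, and the bucket thresholds are assumptions for illustration; the framework's actual scales are defined in the report.

    # Minimal sketch of severity x likelihood prioritization.
    # The 1-5 scales and thresholds are assumed, not the
    # framework's published values.
    def priority(severity: int, likelihood: int) -> str:
        """Map a (severity, likelihood) pair, each on a 1-5 scale, to a bucket."""
        score = severity * likelihood
        if score >= 15:
            return "critical"
        if score >= 8:
            return "high"
        if score >= 4:
            return "medium"
        return "low"

    assert priority(severity=5, likelihood=4) == "critical"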
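
For the second pillar, a toy sketch of folding per-benchmark results into per-dimension scores. The dataset names, numbers, and mean-based aggregation are assumptions, not the framework's prescribed metric.

    # Toy aggregation of benchmark results into per-dimension scores.
    # Dataset names and numbers are invented for illustration.
    from statistics import mean

    results = {
        "cyber_offense": {"ctf_suite": 0.12, "autonomous_attack": 0.05},
        "scheming": {"dishonesty_under_pressure": 0.08, "sandbagging": 0.03},
    }

    per_dimension = {dim: mean(s.values()) for dim, s in results.items()}
    print(per_dimension)  # approx {'cyber_offense': 0.085, 'scheming': 0.055}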
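
And for the third pillar, a sketch of tracking risk dimensions across versioned assessments and flagging regressions. The scores and the re-review threshold are invented for illustration.

    # Sketch of version-over-version tracking with a regression flag.
    # Scores and the 0.05 threshold are invented for illustration.
    history = {
        "F1.0": {"cyber_offense": 0.12, "self_replication": 0.02},
        "F1.5": {"cyber_offense": 0.19, "self_replication": 0.02},
    }

    THRESHOLD = 0.05  # assumed trigger for re-review
    prev, curr = history["F1.0"], history["F1.5"]
    flagged = [dim for dim in curr if curr[dim] - prev[dim] > THRESHOLD]
    print(flagged)  # ['cyber_offense']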

Publications

We originally published the Frontier AI Risk Management Framework and the F1.0 practice technical report—defining our risk taxonomy, evaluation methodology, and front-risk assessment approach. We continue to publish safety reports (such as the F1.5 report) and supporting materials as we monitor the latest frontier models and emerging risks.

View all publications

Updates

Version history. New releases will be listed here.

  • F1.5 Current

Incremental update building on F1.0: a safety report evaluating more recent models and benchmarks. Includes methodology, front-risk assessment, an LLM evaluation table with charts (radar and scatter), and recommendations.

    View report
  • DeepSight: from evaluation to diagnosis GitHub

Unified evaluation–diagnosis pipeline combining DeepSafe (an all-in-one safety evaluation toolkit for LLMs and MLLMs: 25+ datasets, the ProGuard model, and the benchmarks used in SafeWork-F) and DeepScan (a diagnostic framework with a Register → Configure → Execute → Summarize workflow, sketched below). Use the two together for end-to-end evaluation and diagnosis.

    DeepSafe DeepScan
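
To show the shape of that four-stage workflow, here is a hypothetical Python sketch. It is not the DeepScan API; the class and method names are invented to mirror the Register → Configure → Execute → Summarize stages.

    # Hypothetical four-stage pipeline; NOT the DeepScan API.
    class Pipeline:
        def __init__(self):
            self.tasks = {}

        def register(self, name, fn):
            # Register: declare a diagnostic task by name.
            self.tasks[name] = {"fn": fn, "config": {}}

        def configure(self, name, **config):
            # Configure: attach run parameters to a registered task.
            self.tasks[name]["config"].update(config)

        def execute(self):
            # Execute: run every registered task with its configuration.
            return {n: t["fn"](**t["config"]) for n, t in self.tasks.items()}

        def summarize(self, results):
            # Summarize: reduce raw results to a compact report.
            return {n: round(v, 3) for n, v in results.items()}

    pipe = Pipeline()
    pipe.register("toy_check", lambda scale: 0.1 * scale)
    pipe.configure("toy_check", scale=2)
    print(pipe.summarize(pipe.execute()))  # {'toy_check': 0.2}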

View all updates