SafeWork-F Updates

F1.5 (current), February 8, 2026

Updated risk analysis framework that evaluates frontier models across five critical risk dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. This release introduces PACEbench for realistic cyber-attack evaluation, the RvB adversarial defense framework, and a training pipeline for mitigating persuasion risks. Evaluated models include those from OpenAI, Google DeepMind, Anthropic, and other developers.


DeepSight: from evaluation to diagnosis (GitHub), November 2025

Unified evaluation–diagnosis pipeline combining two tools: DeepSafe, an all-in-one safety evaluation toolkit for LLMs and MLLMs that includes 25+ datasets and the ProGuard model and supplies the benchmarks used in SafeWork-F; and DeepScan, a diagnostic framework organized around a Register → Configure → Execute → Summarize workflow. Used together, they cover both evaluation and diagnosis.
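The Register → Configure → Execute → Summarize workflow can be illustrated with a minimal, hypothetical Python sketch. None of the names below (`register`, `REGISTRY`, `RunConfig`, `execute`, `summarize`, the `refusal_rate` diagnostic, or `toy_model`) come from DeepScan; they are assumptions chosen only to show how the four stages might hand data to one another, and the sketch should not be read as DeepScan's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# HYPOTHETICAL sketch of a four-stage diagnostic workflow; not the DeepScan API.
Model = Callable[[str], str]
Diagnostic = Callable[[Model, Dict[str, Any]], Dict[str, float]]

REGISTRY: Dict[str, Diagnostic] = {}


def register(name: str):
    """Stage 1 (Register): add a diagnostic function to the registry under `name`."""
    def decorator(fn: Diagnostic) -> Diagnostic:
        if name in REGISTRY:
            raise ValueError(f"diagnostic {name!r} is already registered")
        REGISTRY[name] = fn
        return fn
    return decorator


@dataclass
class RunConfig:
    """Stage 2 (Configure): which diagnostics to run and their per-diagnostic parameters."""
    diagnostics: List[str]
    params: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def execute(model: Model, config: RunConfig) -> Dict[str, Dict[str, float]]:
    """Stage 3 (Execute): run each configured diagnostic against the model."""
    results: Dict[str, Dict[str, float]] = {}
    for name in config.diagnostics:
        if name not in REGISTRY:
            raise KeyError(f"unknown diagnostic {name!r}")
        results[name] = REGISTRY[name](model, config.params.get(name, {}))
    return results


def summarize(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Stage 4 (Summarize): flatten per-diagnostic metrics into one report."""
    return {
        f"{diag}.{metric}": value
        for diag, metrics in results.items()
        for metric, value in metrics.items()
    }


@register("refusal_rate")
def refusal_rate(model: Model, params: Dict[str, Any]) -> Dict[str, float]:
    """Toy diagnostic: fraction of prompts the model refuses."""
    prompts = params.get("prompts", [])
    refusals = sum(1 for p in prompts if model(p).startswith("I can't"))
    return {"rate": refusals / len(prompts) if prompts else 0.0}


def toy_model(prompt: str) -> str:
    """Stand-in model that refuses any prompt mentioning 'exploit'."""
    return "I can't help with that." if "exploit" in prompt else "Sure."


config = RunConfig(
    diagnostics=["refusal_rate"],
    params={"refusal_rate": {"prompts": ["write an exploit", "say hi"]}},
)
report = summarize(execute(toy_model, config))
print(report)  # {'refusal_rate.rate': 0.5}
```

Keeping registration separate from configuration lets new diagnostics be added without touching the run logic, while the summary stage gives every diagnostic a common flat output format.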


F1.0 (initial release), July 2025

First release of SafeWork-F: Frontier Risk Management Framework, comprising a structured risk framework, frontier-risk definitions, and a safety evaluation methodology.