Updates
Version history. New releases and updates are listed here.
Updated risk analysis framework that evaluates frontier models across five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. Introduces PACEbench for realistic cyberattack evaluation, the RvB adversarial defense framework, and a training pipeline for mitigating persuasion risks. Models evaluated include those from OpenAI, Google DeepMind, Anthropic, and others.
Unified evaluation and diagnosis pipeline combining DeepSafe and DeepScan. DeepSafe is an all-in-one safety evaluation toolkit for LLMs and MLLMs, covering 25+ datasets and the ProGuard model; it provides the benchmarks used in SafeWork-F. DeepScan is a diagnostic framework built around a Register → Configure → Execute → Summarize workflow. Used together, the two tools provide full evaluation and diagnosis.
First release of SafeWork-F: Frontier Risk Management Framework, including a structured risk framework, frontier risk definitions, and a safety evaluation methodology.