Updates
Version history. New releases and updates are listed here.
-
Incremental update: a safety report that evaluates more recent models and benchmarks. Builds on F1.0. Includes the methodology, a front-risk assessment, an LLM evaluation table with charts (radar and scatter), and recommendations.
View report
-
Unified evaluation and diagnosis pipeline that combines two tools. DeepSafe is an all-in-one safety evaluation toolkit for LLMs and MLLMs, with 25+ datasets and the ProGuard model; it includes the benchmarks used in SafeWork-F. DeepScan is a diagnostic framework built around a Register → Configure → Execute → Summarize workflow. Use the two together for full evaluation and diagnosis.
DeepSafe | DeepScan
-
First release of SafeWork-F: Frontier Risk Management Framework. Covers a structured risk framework, front-risk definitions, and the safety evaluation methodology.