OpenRT - AI45 Laboratory

OpenRT offers a modular parallel runtime that decouples components and supports diverse attack strategies to systematically evaluate MLLM security.

Abstract

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

🧩 Framework Innovation

A Modular, Extensible, and Composable Framework for Red Teaming. OpenRT decoupled red-teaming into five orthogonal dimensions: models, datasets, attacks, judges, and metrics. Powered by a unified Component Registry, it enables seamless plug-and-play composition, spanning from simple prompts to complex multi-turn and multi-agent strategies. This standardized design decouples complex search and optimization strategies into infrastructure, enabling developers to reuse underlying search capabilities without redundant implementation.

🚀 Extreme Efficiency

30x Acceleration with One-Line Command. Designed for large-scale evaluation, OpenRT leverages a high-concurrency architecture based on AsyncIO and ThreadPool. By achieving dual parallelism in inference and scheduling, it boosts throughput by 30x compared to serial baselines. With just one line of command, users can execute a high-throughput scan covering the entire pipeline—from attack generation and auto-judgment to safety reporting.

🛡️ Most complete and Up-to-date

37+ SOTA Attack Paradigms (Continuously Updated). OpenRT features the industry's most extensive algorithm library, integrating 37+ state-of-the-art attack methods and continuously tracking the latest research advancements. Our arsenal covers the full threat spectrum, including multimodal attacks, multi-agent coordination, logic obfuscation, and iterative optimization. More than just a toolbox, OpenRT serves as a standardized infrastructure for the safety acceptance of next-generation frontier models.

Attack Methods

Method	Year	Multi-Modal	Multi-Turn	Multi-Agent	Strategy Paradigm
White-Box
GCG	2023	Text	Single	No	Gradient Optimization
Visual Jailbreak	2023	Image	Single	No	Gradient Optimization
Black-Box: Optimization & Fuzzing
AutoDAN	2023	Text	Single	No	Genetic Algorithm
GPTFuzzer	2023	Text	Single	No	Fuzzing / Mutation
TreeAttack	2023	Text	Single	No	Tree-Search Optimization
SeqAR	2024	Text	Single	No	Genetic Algorithm
RACE	2025	Text	Multi	No	Gradient/Genetic Optimization
AutoDAN-R	2025	Text	Single	No	Test-Time Scaling
Black-Box: LLM-driven Refinement
PAIR	2023	Text	Single	No	Iterative LLM Optimization
ReNeLLM	2023	Text	Single	No	Rewrite & Nesting
DrAttack	2024	Text	Single	No	Prompt Decomposition
AutoDAN-Turbo	2024	Text	Single	No	Genetic + Gradient Guide
Black-Box: Linguistic & Encoding
CipherChat	2023	Text	Single	No	Cipher/Encryption
CodeAttack	2022	Text	Single	No	Code Encapsulation
Multilingual	2023	Text	Single	No	Low-Resource Language
Jailbroken	2023	Text	Single	No	Template Combination
ICA	2023	Text	Single	No	In-Context Demonstration
FlipAttack	2024	Text	Single	No	Token Flipping / Masking
Mousetrap	2025	Text	Single	No	Logic Nesting / Obfuscation
Prefill	2025	Text	Single	No	Prefix Injection
Black-Box: Contextual Deception
DeepInception	2023	Text	Single	No	Hypnosis or Nested Scene
Crescendo	2024	Text	Multi	No	Multi-turn Steering
RedQueen	2024	Text	Multi	No	Concealed Knowledge
CoA	2024	Text	Multi	No	Chain of Attack
Black-Box: Multimodal Specific
FigStep	2023	Image	Single	No	Typography / OCR
QueryRelevant	2024	Image	Single	No	Visual Prompt Injection
IDEATOR	2024	Image	Single	No	Visual Semantics
MML	2024	Image	Single	No	Cross‑Modal Encryption
HADES	2024	Image	Single	No	Visual Vulnerability Amplification
HIMRD	2024	Image	Single	No	Multi-Modal Risk Distribution
JOOD	2025	Image	Single	No	OOD Transformation
SI	2025	Image	Single	No	Shuffle Inconsistency Optimization
CS-DJ	2025	Image	Single	No	Multi‑Level Visual Distraction
Black-Box: Multi-Agent & Cooperative
ActorAttack	2024	Text	Multi	Yes	Actor-Based Steering
Rainbow Teaming	2024	Text	Multi	Yes	Diversity-Driven Search
X-Teaming	2025	Text	Multi	Yes	Cooperative Exploration
EvoSynth	2025	Text	Multi	Yes	Code-Level Evolutionary Synthesis

Experiments

Attack Performance across Different MLLMs on HarmfulBench

Attack	GPT-5.2	GPT-5.1	Claude Haiku 4.5	Gemini 3 Pro Preview	Gemini 2.5 Flash	Mistral Large 3	Llama-4 Maverick	Llama-4 Scout	Grok 4.1 Fast	Doubao Seed-1.6
AutoDAN	2.0	8.0	1.5	22.5	37.5	28.5	23.5	64.5	38.5	13.0
GPTFuzzer	11.0	1.5	0.0	51.0	93.0	97.5	64.0	97.5	31.0	57.0
TreeAttack	11.0	23.5	8.0	49.5	79.0	74.5	69.5	80.5	81.0	68.0
SeqAR	25.0	29.5	0.0	8.5	97.5	99.0	73.0	88.0	55.5	64.0
RACE	24.5	38.0	24.5	47.0	47.5	53.0	30.5	59.5	49.5	48.0
AutoDAN-R	70.5	69.0	28.5	83.0	96.5	97.0	96.5	80.0	90.0	86.5
PAIR	38.5	72.5	13.0	74.5	84.5	78.0	66.0	89.5	80.0	75.5
ReNeLLM	8.0	33.5	0.5	13.5	51.5	22.0	39.0	57.0	42.5	43.0
DrAttack	32.0	54.0	5.5	56.0	56.0	89.5	60.5	83.0	31.5	68.0
AutoDAN-Turbo	21.5	15.5	1.0	0.0	0.5	83.5	0.5	0.0	3.0	1.0
CipherChat	14.5	64.0	32.5	0.0	89.5	64.0	21.0	68.0	26.0	38.5
CodeAttack	22.0	20.5	29.5	10.5	51.0	8.5	71.0	86.5	22.0	89.0
Multilingual	16.5	25.0	0.0	2.0	34.0	55.5	14.0	0.0	1.5	6.5
Jailbroken	7.0	29.5	0.0	11.0	92.5	98.5	39.5	33.5	31.5	28.0
ICA	14.0	33.5	0.0	9.0	98.5	99.0	8.0	37.0	41.0	65.5
FlipAttack	13.5	68.5	0.0	19.5	95.5	95.5	65.5	54.5	23.0	87.0
Mousetrap	97.5	71.0	0.0	49.0	95.5	100.0	95.5	87.5	100.0	100.0
Prefill	1.0	14.0	0.0	3.5	97.5	97.0	34.5	43.5	25.5	30.5
DeepInception	15.5	19.0	0.0	3.5	84.0	100.0	82.5	94.5	37.5	82.0
Crescendo	32.5	51.0	9.0	47.0	48.0	61.0	17.0	30.5	41.0	58.0
RedQueen	0.0	1.0	0.0	2.5	3.0	4.5	3.0	5.5	1.5	21.5
CoA	15.5	0.0	0.5	2.0	4.5	16.5	3.0	19.0	7.0	4.5
FigStep	2.0	1.5	1.5	7.5	12.0	18.5	42.5	25.5	5.5	13.5
QueryRelevant	1.5	4.0	2.0	5.0	16.0	24.0	26.0	16.0	10.0	8.5
IDEATOR	31.5	73.0	17.0	80.0	95.0	94.5	90.0	94.0	94.5	96.0
MML	4.5	68.0	75.0	40.5	98.0	98.0	90.5	90.5	58.0	97.5
HADES	0.0	1.0	2.0	7.0	29.5	33.0	25.0	29.0	22.5	17.5
HIMRD	11.5	35.0	0.0	9.0	70.0	61.5	3.5	29.5	1.5	49.5
JOOD	65.0	62.5	38.0	56.0	61.5	63.0	38.5	39.5	69.5	72.0
SI	3.0	45.0	14.0	37.0	82.5	47.5	81.0	71.5	27.0	44.0
CS-DJ	15.0	21.5	23.5	35.0	39.5	38.0	35.0	39.5	28.5	51.0
ActorAttack	0.5	31.0	10.0	65.0	76.0	0.5	65.5	79.0	50.0	56.0
Rainbow Teaming	0.5	3.5	12.0	73.5	61.0	5.5	3.5	35.0	13.5	67.0
X-Teaming	75.5	95.5	47.5	86.5	89.0	91.0	86.0	98.0	90.5	87.0
EvoSynth	99.0	100.0	74.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0

Attack Performance across Different LLMs for HarmfulBench Evaluation

Attack	Qwen3-Max	Qwen3-235B A22B	Qwen3-Next 80B-A3B	DeepSeek R1	DeepSeek V3.2	Kimi K2-Instruct	MiniMax-M2	GLM-4.6	Hunyuan A13B-Instruct	ERNIE-4.5 300B-A47B
AutoDAN	3.0	80.0	7.5	40.0	44.0	33.0	61.0	53.5	17.5	20.5
GPTFuzzer	9.5	92.0	78.0	97.0	96.5	87.5	19.0	97.0	42.5	98.0
TreeAttack	52.5	47.0	28.5	80.5	80.5	54.5	48.5	58.0	77.5	67.5
SeqAR	92.0	25.5	30.5	96.5	100.0	96.0	1.0	24.5	61.0	99.5
RACE	44.0	81.0	28.0	49.0	65.0	61.5	83.5	69.0	66.0	74.0
AutoDAN-R	96.5	95.5	88.5	100.0	98.0	96.0	89.5	94.0	94.5	96.0
PAIR	50.0	98.5	64.5	82.5	93.0	83.0	90.0	93.5	94.0	89.5
ReNeLLM	1.0	5.0	5.5	68.5	70.5	69.0	7.5	20.5	19.5	42.0
DrAttack	24.5	58.0	66.5	66.5	63.5	83.5	67.5	61.0	56.0	72.5
AutoDAN-Turbo	18.0	4.5	0.0	0.5	14.0	0.0	4.5	11.0	0.0	0.0
CipherChat	9.5	2.5	3.0	97.5	77.5	86.5	75.0	6.5	23.5	59.0
CodeAttack	41.5	92.5	44.5	83.5	83.5	79.0	73.5	86.5	89.5	87.0
Multilingual	3.5	0.5	3.0	62.5	11.5	27.5	0.0	1.0	33.5	7.0
Jailbroken	21.0	58.5	64.5	99.0	95.5	78.0	0.0	20.0	3.5	25.5
ICA	53.5	99.0	97.0	99.0	98.0	83.5	1.0	63.0	1.5	95.5
FlipAttack	90.5	17.5	97.5	99.0	91.5	91.5	31.0	53.5	12.5	97.0
Mousetrap	93.0	96.0	97.5	100.0	97.0	91.5	3.5	98.5	12.5	97.5
Pre-fill	6.0	1.0	0.5	99.5	96.0	50.5	1.5	4.0	3.5	36.0
DeepInception	2.0	29.0	44.0	99.0	99.5	97.0	0.0	22.0	1.5	97.0
Crescendo	12.0	49.0	21.5	56.0	59.0	57.5	50.5	94.5	47.5	46.5
RedQueen	0.5	3.0	1.5	24.0	47.0	36.5	3.0	24.0	2.5	2.0
CoA	10.0	7.0	1.0	9.5	9.0	8.5	53.5	31.0	11.5	37.5
ActorAttack	42.5	35.5	19.5	70.0	76.5	54.0	42.0	76.5	64.5	53.0
Rainbow Teaming	7.0	3.5	16.0	2.0	18.5	25.5	14.5	0.5	96.5	31.0
X-Teaming	94.0	98.5	80.5	94.0	99.0	89.5	93.0	98.5	97.0	95.0
EvoSynth	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs