OpenRT offers a modular parallel runtime that decouples components and supports diverse attack strategies to systematically evaluate MLLM security.
The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT unifies isolated attack scripts into a modular orchestration system. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. This transforms experimental jailbreak techniques into a reproducible, production-ready safety validation pipeline. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.
Framework Innovation: Formalizing Redteaming as General State-Space Search. OpenRT rearchitects red teaming from a collection of scripts into a General Solver. By abstracting heterogeneous strategies into a unified state-space search problem, it utilizes a standardized Propose-Evaluate-Update loop to natively support MCTS probabilistic sampling, elitism, and multi-stage pruning. This design decouples complex exploration-exploitation logic into infrastructure, enabling developers to reuse underlying search capabilities without reinventing the wheel.
30x Acceleration with One-Line Command. Designed for large-scale evaluation, OpenRT leverages a high-concurrency architecture based on AsyncIO and ThreadPool. By achieving dual parallelism in inference and scheduling, it boosts throughput by 30x compared to serial baselines. With just one line of command, users can execute a high-throughput scan covering the entire pipeline—from attack generation and auto-judgment to safety reporting.
37+ SOTA Attack Paradigms (Continuously Updated). OpenRT features the industry's most extensive algorithm library, integrating 37+ state-of-the-art attack methods and continuously tracking the latest research advancements. Our arsenal covers the full threat spectrum, including multimodal attacks, multi-agent coordination, logic obfuscation, and iterative optimization. More than just a toolbox, OpenRT serves as a standardized infrastructure for the safety acceptance of next-generation frontier models.
| Method | Year | Multi-Modal | Multi-Turn | Multi-Agent | Strategy Paradigm |
|---|---|---|---|---|---|
| White-Box | |||||
| GCG | 2023 | Text | Single | No | Gradient Optimization |
| Visual Jailbreak | 2023 | Image | Single | No | Gradient Optimization |
| Black-Box: Optimization & Fuzzing | |||||
| AutoDAN | 2023 | Text | Single | No | Genetic Algorithm |
| GPTFuzzer | 2023 | Text | Single | No | Fuzzing / Mutation |
| TreeAttack | 2023 | Text | Single | No | Tree-Search Optimization |
| SeqAR | 2024 | Text | Single | No | Genetic Algorithm |
| RACE | 2025 | Text | Single | No | Gradient/Genetic Optimization |
| AutoDAN-R | 2025 | Text | Single | No | Test-Time Scaling |
| Black-Box: LLM-driven Refinement | |||||
| PAIR | 2023 | Text | Single | No | Iterative LLM Optimization |
| ReNeLLM | 2023 | Text | Single | No | Rewrite & Nesting |
| DrAttack | 2024 | Text | Single | No | Prompt Decomposition |
| AutoDAN-Turbo | 2024 | Text | Single | No | Genetic + Gradient Guide |
| Black-Box: Linguistic & Encoding | |||||
| CipherChat | 2023 | Text | Single | No | Cipher/Encryption |
| CodeAttack | 2022 | Text | Single | No | Code Encapsulation |
| Multilingual | 2023 | Text | Single | No | Low-Resource Language |
| Jailbroken | 2023 | Text | Single | No | Template Combination |
| ICA | 2023 | Text | Single | No | In-Context Demonstration |
| FlipAttack | 2024 | Text | Single | No | Token Flipping / Masking |
| Mousetrap | 2025 | Text | Single | No | Logic Nesting / Obfuscation |
| Prefill | 2025 | Text | Single | No | Prefix Injection |
| Black-Box: Contextual Deception | |||||
| DeepInception | 2023 | Text | Single | No | Hypnosis or Nested Scene |
| Crescendo | 2024 | Text | Multi | No | Multi-turn Steering |
| RedQueen | 2024 | Text | Multi | No | Concealed Knowledge |
| CoA | 2024 | Text | Multi | No | Chain of Attack |
| Black-Box: Multimodal Specific | |||||
| FigStep | 2023 | Image | Single | No | Typography / OCR |
| QueryRelevant | 2024 | Image | Single | No | Visual Prompt Injection |
| IDEATOR | 2024 | Image | Single | No | Visual Semantics |
| MML | 2024 | Image | Single | No | Cross‑Modal Encryption |
| HADES | 2024 | Image | Single | No | Visual Vulnerability Amplification |
| HIMRD | 2024 | Image | Single | No | Multi-Modal Risk Distribution |
| JOOD | 2025 | Image | Single | No | OOD Transformation |
| SI | 2025 | Image | Single | No | Shuffle Inconsistency Optimization |
| CS-DJ | 2025 | Image | Single | No | Multi‑Level Visual Distraction |
| Black-Box: Multi-Agent & Cooperative | |||||
| ActorAttack | 2024 | Text | Multi | Yes | Actor-Based Steering |
| Rainbow Teaming | 2024 | Text | Multi | Yes | Diversity-Driven Search |
| X-Teaming | 2025 | Text | Multi | Yes | Cooperative Exploration |
| EvoSynth | 2025 | Text | Multi | Yes | Code-Level Evolutionary Synthesis |
Demo Video (fast playback)
python eval.py \
--attacker-model deepseek-v3.2 \
--judge-model gpt-4o-mini \
--target-models gpt-5.2 \
--attacks all \
--dataset harmbench \
--max-workers 50 \
--results-dir results/demo
| Attack | GPT-5.2 | GPT-5.1 | Claude Haiku 4.5 | Gemini 3 Pro Preview | Gemini 2.5 Flash | Mistral Large 3 | Llama-4 Maverick | Llama-4 Scout | Grok 4.1 Fast | Doubao Seed-1.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| AutoDAN | 2.0 | 8.0 | 1.5 | 22.5 | 37.5 | 28.5 | 23.5 | 64.5 | 38.5 | 13.0 |
| GPTFuzzer | 11.0 | 1.5 | 0.0 | 51.0 | 93.0 | 97.5 | 64.0 | 97.5 | 31.0 | 57.0 |
| TreeAttack | 11.0 | 23.5 | 8.0 | 49.5 | 79.0 | 74.5 | 69.5 | 80.5 | 81.0 | 68.0 |
| SeqAR | 25.0 | 29.5 | 0.0 | 8.5 | 97.5 | 99.0 | 73.0 | 88.0 | 55.5 | 64.0 |
| RACE | 24.5 | 38.0 | 24.5 | 47.0 | 47.5 | 53.0 | 30.5 | 59.5 | 49.5 | 48.0 |
| AutoDAN-R | 70.5 | 69.0 | 28.5 | 83.0 | 96.5 | 97.0 | 96.5 | 80.0 | 90.0 | 86.5 |
| PAIR | 38.5 | 72.5 | 13.0 | 74.5 | 84.5 | 78.0 | 66.0 | 89.5 | 80.0 | 75.5 |
| ReNeLLM | 8.0 | 33.5 | 0.5 | 13.5 | 51.5 | 22.0 | 39.0 | 57.0 | 42.5 | 43.0 |
| DrAttack | 32.0 | 54.0 | 5.5 | 56.0 | 56.0 | 89.5 | 60.5 | 83.0 | 31.5 | 68.0 |
| AutoDAN-Turbo | 21.5 | 15.5 | 1.0 | 0.0 | 0.5 | 83.5 | 0.5 | 0.0 | 3.0 | 1.0 |
| CipherChat | 14.5 | 64.0 | 32.5 | 0.0 | 89.5 | 64.0 | 21.0 | 68.0 | 26.0 | 38.5 |
| CodeAttack | 22.0 | 20.5 | 29.5 | 10.5 | 51.0 | 8.5 | 71.0 | 86.5 | 22.0 | 89.0 |
| Multilingual | 16.5 | 25.0 | 0.0 | 2.0 | 34.0 | 55.5 | 14.0 | 0.0 | 1.5 | 6.5 |
| Jailbroken | 7.0 | 29.5 | 0.0 | 11.0 | 92.5 | 98.5 | 39.5 | 33.5 | 31.5 | 28.0 |
| ICA | 14.0 | 33.5 | 0.0 | 9.0 | 98.5 | 99.0 | 8.0 | 37.0 | 41.0 | 65.5 |
| FlipAttack | 13.5 | 68.5 | 0.0 | 19.5 | 95.5 | 95.5 | 65.5 | 54.5 | 23.0 | 87.0 |
| Mousetrap | 97.5 | 71.0 | 0.0 | 49.0 | 95.5 | 100.0 | 95.5 | 87.5 | 100.0 | 100.0 |
| Prefill | 1.0 | 14.0 | 0.0 | 3.5 | 97.5 | 97.0 | 34.5 | 43.5 | 25.5 | 30.5 |
| DeepInception | 15.5 | 19.0 | 0.0 | 3.5 | 84.0 | 100.0 | 82.5 | 94.5 | 37.5 | 82.0 |
| Crescendo | 32.5 | 51.0 | 9.0 | 47.0 | 48.0 | 61.0 | 17.0 | 30.5 | 41.0 | 58.0 |
| RedQueen | 0.0 | 1.0 | 0.0 | 2.5 | 3.0 | 4.5 | 3.0 | 5.5 | 1.5 | 21.5 |
| CoA | 15.5 | 0.0 | 0.5 | 2.0 | 4.5 | 16.5 | 3.0 | 19.0 | 7.0 | 4.5 |
| FigStep | 2.0 | 1.5 | 1.5 | 7.5 | 12.0 | 18.5 | 42.5 | 25.5 | 5.5 | 13.5 |
| QueryRelevant | 1.5 | 4.0 | 2.0 | 5.0 | 16.0 | 24.0 | 26.0 | 16.0 | 10.0 | 8.5 |
| IDEATOR | 31.5 | 73.0 | 17.0 | 80.0 | 95.0 | 94.5 | 90.0 | 94.0 | 94.5 | 96.0 |
| MML | 4.5 | 68.0 | 75.0 | 40.5 | 98.0 | 98.0 | 90.5 | 90.5 | 58.0 | 97.5 |
| HADES | 0.0 | 1.0 | 2.0 | 7.0 | 29.5 | 33.0 | 25.0 | 29.0 | 22.5 | 17.5 |
| HIMRD | 11.5 | 35.0 | 0.0 | 9.0 | 70.0 | 61.5 | 3.5 | 29.5 | 1.5 | 49.5 |
| JOOD | 65.0 | 62.5 | 38.0 | 56.0 | 61.5 | 63.0 | 38.5 | 39.5 | 69.5 | 72.0 |
| SI | 3.0 | 45.0 | 14.0 | 37.0 | 82.5 | 47.5 | 81.0 | 71.5 | 27.0 | 44.0 |
| CS-DJ | 15.0 | 21.5 | 23.5 | 35.0 | 39.5 | 38.0 | 35.0 | 39.5 | 28.5 | 51.0 |
| ActorAttack | 0.5 | 31.0 | 10.0 | 65.0 | 76.0 | 0.5 | 65.5 | 79.0 | 50.0 | 56.0 |
| Rainbow Teaming | 0.5 | 3.5 | 12.0 | 73.5 | 61.0 | 5.5 | 3.5 | 35.0 | 13.5 | 67.0 |
| X-Teaming | 75.5 | 95.5 | 47.5 | 86.5 | 89.0 | 91.0 | 86.0 | 98.0 | 90.5 | 87.0 |
| EvoSynth | 99.0 | 100.0 | 74.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Attack | Qwen3-Max | Qwen3-235B A22B | Qwen3-Next 80B-A3B | DeepSeek R1 | DeepSeek V3.2 | Kimi K2-Instruct | MiniMax-M2 | GLM-4.6 | Hunyuan A13B-Instruct | ERNIE-4.5 300B-A47B |
|---|---|---|---|---|---|---|---|---|---|---|
| AutoDAN | 3.0 | 80.0 | 7.5 | 40.0 | 44.0 | 33.0 | 61.0 | 53.5 | 17.5 | 20.5 |
| GPTFuzzer | 9.5 | 92.0 | 78.0 | 97.0 | 96.5 | 87.5 | 19.0 | 97.0 | 42.5 | 98.0 |
| TreeAttack | 52.5 | 47.0 | 28.5 | 80.5 | 80.5 | 54.5 | 48.5 | 58.0 | 77.5 | 67.5 |
| SeqAR | 92.0 | 25.5 | 30.5 | 96.5 | 100.0 | 96.0 | 1.0 | 24.5 | 61.0 | 99.5 |
| RACE | 44.0 | 81.0 | 28.0 | 49.0 | 65.0 | 61.5 | 83.5 | 69.0 | 66.0 | 74.0 |
| AutoDAN-R | 96.5 | 95.5 | 88.5 | 100.0 | 98.0 | 96.0 | 89.5 | 94.0 | 94.5 | 96.0 |
| PAIR | 50.0 | 98.5 | 64.5 | 82.5 | 93.0 | 83.0 | 90.0 | 93.5 | 94.0 | 89.5 |
| ReNeLLM | 1.0 | 5.0 | 5.5 | 68.5 | 70.5 | 69.0 | 7.5 | 20.5 | 19.5 | 42.0 |
| DrAttack | 24.5 | 58.0 | 66.5 | 66.5 | 63.5 | 83.5 | 67.5 | 61.0 | 56.0 | 72.5 |
| AutoDAN-Turbo | 18.0 | 4.5 | 0.0 | 0.5 | 14.0 | 0.0 | 4.5 | 11.0 | 0.0 | 0.0 |
| CipherChat | 9.5 | 2.5 | 3.0 | 97.5 | 77.5 | 86.5 | 75.0 | 6.5 | 23.5 | 59.0 |
| CodeAttack | 41.5 | 92.5 | 44.5 | 83.5 | 83.5 | 79.0 | 73.5 | 86.5 | 89.5 | 87.0 |
| Multilingual | 3.5 | 0.5 | 3.0 | 62.5 | 11.5 | 27.5 | 0.0 | 1.0 | 33.5 | 7.0 |
| Jailbroken | 21.0 | 58.5 | 64.5 | 99.0 | 95.5 | 78.0 | 0.0 | 20.0 | 3.5 | 25.5 |
| ICA | 53.5 | 99.0 | 97.0 | 99.0 | 98.0 | 83.5 | 1.0 | 63.0 | 1.5 | 95.5 |
| FlipAttack | 90.5 | 17.5 | 97.5 | 99.0 | 91.5 | 91.5 | 31.0 | 53.5 | 12.5 | 97.0 |
| Mousetrap | 93.0 | 96.0 | 97.5 | 100.0 | 97.0 | 91.5 | 3.5 | 98.5 | 12.5 | 97.5 |
| Pre-fill | 6.0 | 1.0 | 0.5 | 99.5 | 96.0 | 50.5 | 1.5 | 4.0 | 3.5 | 36.0 |
| DeepInception | 2.0 | 29.0 | 44.0 | 99.0 | 99.5 | 97.0 | 0.0 | 22.0 | 1.5 | 97.0 |
| Crescendo | 12.0 | 49.0 | 21.5 | 56.0 | 59.0 | 57.5 | 50.5 | 94.5 | 47.5 | 46.5 |
| RedQueen | 0.5 | 3.0 | 1.5 | 24.0 | 47.0 | 36.5 | 3.0 | 24.0 | 2.5 | 2.0 |
| CoA | 10.0 | 7.0 | 1.0 | 9.5 | 9.0 | 8.5 | 53.5 | 31.0 | 11.5 | 37.5 |
| ActorAttack | 42.5 | 35.5 | 19.5 | 70.0 | 76.5 | 54.0 | 42.0 | 76.5 | 64.5 | 53.0 |
| Rainbow Teaming | 7.0 | 3.5 | 16.0 | 2.0 | 18.5 | 25.5 | 14.5 | 0.5 | 96.5 | 31.0 |
| X-Teaming | 94.0 | 98.5 | 80.5 | 94.0 | 99.0 | 89.5 | 93.0 | 98.5 | 97.0 | 95.0 |
| EvoSynth | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
@article{openrt2025,
title = {OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs},
author = {Xin Wang and Yunhao Chen and Juncheng Li and Yixu Wang and Yang Yao and Jie Li and Yan Teng and Yingchun Wang and Xia Hu},
journal = {Shanghai Artificial Intelligence Laboratory},
year = {2025},
url = {https://github.com/AI45Lab/OpenRT}
}
Xin Wang* , Yunhao Chen*, Juncheng Li*, Yixu Wang, Yang Yao , Jie Li, Yan Teng†, Yingchun Wang, Xia Hu
Affiliation: Shanghai Artificial Intelligence Laboratory
Corresponding email: tengyan@pjlab.org.cn
Project homepage: https://github.com/AI45Lab/OpenRT