Versatile  ·  Modular  ·  Scalable

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Xin Wang*, Yunhao Chen*, Juncheng Li*, Yixu Wang, Yang Yao, Jie Li, Yan Teng, Yingchun Wang, Xia Hu
Shanghai Artificial Intelligence Laboratory

OpenRT offers a modular parallel runtime that decouples components and supports diverse attack strategies to systematically evaluate MLLM security.

Abstract

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT unifies isolated attack scripts into a modular orchestration system. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. This transforms experimental jailbreak techniques into a reproducible, production-ready safety validation pipeline. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

🧩 Framework Innovation

Formalizing Red Teaming as General State-Space Search. OpenRT rearchitects red teaming from a collection of scripts into a general solver. By abstracting heterogeneous strategies into a unified state-space search problem, it uses a standardized Propose-Evaluate-Update loop that natively supports MCTS probabilistic sampling, elitism, and multi-stage pruning. This design moves complex exploration-exploitation logic into the infrastructure, so developers can reuse the underlying search capabilities instead of reimplementing them for each attack.
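To make the Propose-Evaluate-Update idea concrete, here is a minimal sketch of such a loop on a harmless toy problem (recovering a fixed string by random mutation). It is not OpenRT's implementation: `Candidate`, `solve`, `toy_propose`, `toy_evaluate`, and every parameter name are hypothetical, and only elitism and a single threshold-based pruning stage are shown, not MCTS sampling.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    state: str
    score: float


def solve(seed: str,
          propose: Callable[[str, random.Random], List[str]],
          evaluate: Callable[[str], float],
          iterations: int = 200,
          elite_size: int = 4,
          prune_below: float = 0.0,
          target_score: float = 1.0,
          rng_seed: int = 0) -> Candidate:
    """Generic Propose-Evaluate-Update search (hypothetical sketch)."""
    rng = random.Random(rng_seed)
    frontier = [Candidate(seed, evaluate(seed))]
    for _ in range(iterations):
        # Propose: expand every state currently in the frontier.
        proposals = [s for c in frontier for s in propose(c.state, rng)]
        # Evaluate: score each proposed state.
        scored = [Candidate(s, evaluate(s)) for s in proposals]
        # Update, step 1 (pruning): drop proposals below the threshold.
        survivors = [c for c in scored if c.score >= prune_below]
        # Update, step 2 (elitism): previous elites stay in the pool,
        # so the best score can never decrease between iterations.
        pool = {c.state: c for c in frontier + survivors}  # de-duplicate
        frontier = sorted(pool.values(), key=lambda c: c.score,
                          reverse=True)[:elite_size]
        if frontier[0].score >= target_score:
            break
    return frontier[0]


# Toy problem: the "state" is a string, the score is the fraction of
# positions that match TARGET. Strategy-specific logic lives only here.
TARGET = "openrt"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_evaluate(state: str) -> float:
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)


def toy_propose(state: str, rng: random.Random) -> List[str]:
    out = []
    for _ in range(3):
        i = rng.randrange(len(state))
        out.append(state[:i] + rng.choice(ALPHABET) + state[i + 1:])
    return out


best = solve("aaaaaa", toy_propose, toy_evaluate, prune_below=0.1)
print(best)
```

The point of the sketch is the separation of concerns: `solve` owns the search mechanics, while a strategy only supplies `propose` and `evaluate`.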

🚀 Extreme Efficiency

30x Acceleration with a One-Line Command. Designed for large-scale evaluation, OpenRT uses a high-concurrency architecture built on AsyncIO and a thread pool. By parallelizing both inference and scheduling, it raises throughput by 30x compared to serial baselines. With a single command, users can run a high-throughput scan that covers the entire pipeline, from attack generation and automatic judging to safety reporting.
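The combination of AsyncIO scheduling with a thread pool for blocking calls can be sketched as follows. This illustrates the general pattern only, under the assumption that model queries are blocking I/O calls; `query_model` and `run_all` are hypothetical names and the `time.sleep` stands in for a network request. It makes no claim about OpenRT's internals or its measured speedup.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a blocking model API call."""
    time.sleep(0.05)  # simulated network latency
    return f"response to {prompt}"


async def run_all(prompts: List[str], max_workers: int = 50) -> List[str]:
    loop = asyncio.get_running_loop()
    # The semaphore bounds how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_workers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:

        async def one(prompt: str) -> str:
            async with semaphore:
                # Run the blocking call in a worker thread so the event
                # loop stays free to schedule the other tasks.
                return await loop.run_in_executor(pool, query_model, prompt)

        # gather returns results in the same order as the inputs.
        return await asyncio.gather(*(one(p) for p in prompts))


prompts = [f"p{i}" for i in range(20)]
results = asyncio.run(run_all(prompts, max_workers=10))
print(len(results), results[0])
```

Here `max_workers` plays a role similar to the `--max-workers` flag in the demo command, although how that flag is wired up internally is not stated in the source.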

🛡️ Most Complete and Up-to-Date

37+ SOTA Attack Paradigms (Continuously Updated). OpenRT provides an extensive algorithm library that integrates 37+ state-of-the-art attack methods and continuously tracks the latest research. The library covers the full threat spectrum, including multimodal attacks, multi-agent coordination, logic obfuscation, and iterative optimization. More than a toolbox, OpenRT serves as standardized infrastructure for the safety acceptance testing of next-generation frontier models.

Attack Methods

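| Category | Method | Year | Modality | Turns | Multi-Agent | Strategy Paradigm |
|---|---|---|---|---|---|---|
| White-Box | GCG | 2023 | Text | Single | No | Gradient Optimization |
| White-Box | Visual Jailbreak | 2023 | Image | Single | No | Gradient Optimization |
| Black-Box: Optimization & Fuzzing | AutoDAN | 2023 | Text | Single | No | Genetic Algorithm |
| Black-Box: Optimization & Fuzzing | GPTFuzzer | 2023 | Text | Single | No | Fuzzing / Mutation |
| Black-Box: Optimization & Fuzzing | TreeAttack | 2023 | Text | Single | No | Tree-Search Optimization |
| Black-Box: Optimization & Fuzzing | SeqAR | 2024 | Text | Single | No | Genetic Algorithm |
| Black-Box: Optimization & Fuzzing | RACE | 2025 | Text | Single | No | Gradient/Genetic Optimization |
| Black-Box: Optimization & Fuzzing | AutoDAN-R | 2025 | Text | Single | No | Test-Time Scaling |
| Black-Box: LLM-driven Refinement | PAIR | 2023 | Text | Single | No | Iterative LLM Optimization |
| Black-Box: LLM-driven Refinement | ReNeLLM | 2023 | Text | Single | No | Rewrite & Nesting |
| Black-Box: LLM-driven Refinement | DrAttack | 2024 | Text | Single | No | Prompt Decomposition |
| Black-Box: LLM-driven Refinement | AutoDAN-Turbo | 2024 | Text | Single | No | Genetic + Gradient Guide |
| Black-Box: Linguistic & Encoding | CipherChat | 2023 | Text | Single | No | Cipher/Encryption |
| Black-Box: Linguistic & Encoding | CodeAttack | 2022 | Text | Single | No | Code Encapsulation |
| Black-Box: Linguistic & Encoding | Multilingual | 2023 | Text | Single | No | Low-Resource Language |
| Black-Box: Linguistic & Encoding | Jailbroken | 2023 | Text | Single | No | Template Combination |
| Black-Box: Linguistic & Encoding | ICA | 2023 | Text | Single | No | In-Context Demonstration |
| Black-Box: Linguistic & Encoding | FlipAttack | 2024 | Text | Single | No | Token Flipping / Masking |
| Black-Box: Linguistic & Encoding | Mousetrap | 2025 | Text | Single | No | Logic Nesting / Obfuscation |
| Black-Box: Linguistic & Encoding | Prefill | 2025 | Text | Single | No | Prefix Injection |
| Black-Box: Contextual Deception | DeepInception | 2023 | Text | Single | No | Hypnosis or Nested Scene |
| Black-Box: Contextual Deception | Crescendo | 2024 | Text | Multi | No | Multi-turn Steering |
| Black-Box: Contextual Deception | RedQueen | 2024 | Text | Multi | No | Concealed Knowledge |
| Black-Box: Contextual Deception | CoA | 2024 | Text | Multi | No | Chain of Attack |
| Black-Box: Multimodal Specific | FigStep | 2023 | Image | Single | No | Typography / OCR |
| Black-Box: Multimodal Specific | QueryRelevant | 2024 | Image | Single | No | Visual Prompt Injection |
| Black-Box: Multimodal Specific | IDEATOR | 2024 | Image | Single | No | Visual Semantics |
| Black-Box: Multimodal Specific | MML | 2024 | Image | Single | No | Cross-Modal Encryption |
| Black-Box: Multimodal Specific | HADES | 2024 | Image | Single | No | Visual Vulnerability Amplification |
| Black-Box: Multimodal Specific | HIMRD | 2024 | Image | Single | No | Multi-Modal Risk Distribution |
| Black-Box: Multimodal Specific | JOOD | 2025 | Image | Single | No | OOD Transformation |
| Black-Box: Multimodal Specific | SI | 2025 | Image | Single | No | Shuffle Inconsistency Optimization |
| Black-Box: Multimodal Specific | CS-DJ | 2025 | Image | Single | No | Multi-Level Visual Distraction |
| Black-Box: Multi-Agent & Cooperative | ActorAttack | 2024 | Text | Multi | Yes | Actor-Based Steering |
| Black-Box: Multi-Agent & Cooperative | Rainbow Teaming | 2024 | Text | Multi | Yes | Diversity-Driven Search |
| Black-Box: Multi-Agent & Cooperative | X-Teaming | 2025 | Text | Multi | Yes | Cooperative Exploration |
| Black-Box: Multi-Agent & Cooperative | EvoSynth | 2025 | Text | Multi | Yes | Code-Level Evolutionary Synthesis |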

Demo

Demo Video (fast playback)

python eval.py \
    --attacker-model deepseek-v3.2 \
    --judge-model gpt-4o-mini \
    --target-models gpt-5.2 \
    --attacks all \
    --dataset harmbench \
    --max-workers 50 \
    --results-dir results/demo

Experiments

Main experiment figure

Attack Performance across Different MLLMs on HarmfulBench (Attack Success Rate, %)

| Attack | GPT-5.2 | GPT-5.1 | Claude Haiku 4.5 | Gemini 3 Pro Preview | Gemini 2.5 Flash | Mistral Large 3 | Llama-4 Maverick | Llama-4 Scout | Grok 4.1 Fast | Doubao Seed-1.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| AutoDAN | 2.0 | 8.0 | 1.5 | 22.5 | 37.5 | 28.5 | 23.5 | 64.5 | 38.5 | 13.0 |
| GPTFuzzer | 11.0 | 1.5 | 0.0 | 51.0 | 93.0 | 97.5 | 64.0 | 97.5 | 31.0 | 57.0 |
| TreeAttack | 11.0 | 23.5 | 8.0 | 49.5 | 79.0 | 74.5 | 69.5 | 80.5 | 81.0 | 68.0 |
| SeqAR | 25.0 | 29.5 | 0.0 | 8.5 | 97.5 | 99.0 | 73.0 | 88.0 | 55.5 | 64.0 |
| RACE | 24.5 | 38.0 | 24.5 | 47.0 | 47.5 | 53.0 | 30.5 | 59.5 | 49.5 | 48.0 |
| AutoDAN-R | 70.5 | 69.0 | 28.5 | 83.0 | 96.5 | 97.0 | 96.5 | 80.0 | 90.0 | 86.5 |
| PAIR | 38.5 | 72.5 | 13.0 | 74.5 | 84.5 | 78.0 | 66.0 | 89.5 | 80.0 | 75.5 |
| ReNeLLM | 8.0 | 33.5 | 0.5 | 13.5 | 51.5 | 22.0 | 39.0 | 57.0 | 42.5 | 43.0 |
| DrAttack | 32.0 | 54.0 | 5.5 | 56.0 | 56.0 | 89.5 | 60.5 | 83.0 | 31.5 | 68.0 |
| AutoDAN-Turbo | 21.5 | 15.5 | 1.0 | 0.0 | 0.5 | 83.5 | 0.5 | 0.0 | 3.0 | 1.0 |
| CipherChat | 14.5 | 64.0 | 32.5 | 0.0 | 89.5 | 64.0 | 21.0 | 68.0 | 26.0 | 38.5 |
| CodeAttack | 22.0 | 20.5 | 29.5 | 10.5 | 51.0 | 8.5 | 71.0 | 86.5 | 22.0 | 89.0 |
| Multilingual | 16.5 | 25.0 | 0.0 | 2.0 | 34.0 | 55.5 | 14.0 | 0.0 | 1.5 | 6.5 |
| Jailbroken | 7.0 | 29.5 | 0.0 | 11.0 | 92.5 | 98.5 | 39.5 | 33.5 | 31.5 | 28.0 |
| ICA | 14.0 | 33.5 | 0.0 | 9.0 | 98.5 | 99.0 | 8.0 | 37.0 | 41.0 | 65.5 |
| FlipAttack | 13.5 | 68.5 | 0.0 | 19.5 | 95.5 | 95.5 | 65.5 | 54.5 | 23.0 | 87.0 |
| Mousetrap | 97.5 | 71.0 | 0.0 | 49.0 | 95.5 | 100.0 | 95.5 | 87.5 | 100.0 | 100.0 |
| Prefill | 1.0 | 14.0 | 0.0 | 3.5 | 97.5 | 97.0 | 34.5 | 43.5 | 25.5 | 30.5 |
| DeepInception | 15.5 | 19.0 | 0.0 | 3.5 | 84.0 | 100.0 | 82.5 | 94.5 | 37.5 | 82.0 |
| Crescendo | 32.5 | 51.0 | 9.0 | 47.0 | 48.0 | 61.0 | 17.0 | 30.5 | 41.0 | 58.0 |
| RedQueen | 0.0 | 1.0 | 0.0 | 2.5 | 3.0 | 4.5 | 3.0 | 5.5 | 1.5 | 21.5 |
| CoA | 15.5 | 0.0 | 0.5 | 2.0 | 4.5 | 16.5 | 3.0 | 19.0 | 7.0 | 4.5 |
| FigStep | 2.0 | 1.5 | 1.5 | 7.5 | 12.0 | 18.5 | 42.5 | 25.5 | 5.5 | 13.5 |
| QueryRelevant | 1.5 | 4.0 | 2.0 | 5.0 | 16.0 | 24.0 | 26.0 | 16.0 | 10.0 | 8.5 |
| IDEATOR | 31.5 | 73.0 | 17.0 | 80.0 | 95.0 | 94.5 | 90.0 | 94.0 | 94.5 | 96.0 |
| MML | 4.5 | 68.0 | 75.0 | 40.5 | 98.0 | 98.0 | 90.5 | 90.5 | 58.0 | 97.5 |
| HADES | 0.0 | 1.0 | 2.0 | 7.0 | 29.5 | 33.0 | 25.0 | 29.0 | 22.5 | 17.5 |
| HIMRD | 11.5 | 35.0 | 0.0 | 9.0 | 70.0 | 61.5 | 3.5 | 29.5 | 1.5 | 49.5 |
| JOOD | 65.0 | 62.5 | 38.0 | 56.0 | 61.5 | 63.0 | 38.5 | 39.5 | 69.5 | 72.0 |
| SI | 3.0 | 45.0 | 14.0 | 37.0 | 82.5 | 47.5 | 81.0 | 71.5 | 27.0 | 44.0 |
| CS-DJ | 15.0 | 21.5 | 23.5 | 35.0 | 39.5 | 38.0 | 35.0 | 39.5 | 28.5 | 51.0 |
| ActorAttack | 0.5 | 31.0 | 10.0 | 65.0 | 76.0 | 0.5 | 65.5 | 79.0 | 50.0 | 56.0 |
| Rainbow Teaming | 0.5 | 3.5 | 12.0 | 73.5 | 61.0 | 5.5 | 3.5 | 35.0 | 13.5 | 67.0 |
| X-Teaming | 75.5 | 95.5 | 47.5 | 86.5 | 89.0 | 91.0 | 86.0 | 98.0 | 90.5 | 87.0 |
| EvoSynth | 99.0 | 100.0 | 74.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
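
For readers who want to aggregate these numbers, the sketch below shows how an Attack Success Rate and a per-attack average across models might be computed. The per-model values are copied from the X-Teaming and EvoSynth rows of the table above; the function names and the 200-prompt count in the first call are illustrative assumptions, not details taken from the source.

```python
from typing import Dict, List


def attack_success_rate(successes: int, total: int) -> float:
    """ASR in percent: share of prompts judged as successful attacks."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successes / total


def mean_asr(per_model_asr: List[float]) -> float:
    """Unweighted average of per-model ASR values (percent)."""
    return sum(per_model_asr) / len(per_model_asr)


# Hypothetical counts, for illustration only.
print(attack_success_rate(successes=197, total=200))

# Rows copied from the table above (ten target models each).
rows: Dict[str, List[float]] = {
    "X-Teaming": [75.5, 95.5, 47.5, 86.5, 89.0, 91.0, 86.0, 98.0, 90.5, 87.0],
    "EvoSynth": [99.0, 100.0, 74.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
}
averages = {attack: mean_asr(values) for attack, values in rows.items()}
print(averages)
```

The same `mean_asr` helper can be applied column-wise to obtain a per-model average across attacks.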

Attack Performance across Different LLMs on HarmfulBench (Attack Success Rate, %)

| Attack | Qwen3-Max | Qwen3-235B A22B | Qwen3-Next 80B-A3B | DeepSeek R1 | DeepSeek V3.2 | Kimi K2-Instruct | MiniMax-M2 | GLM-4.6 | Hunyuan A13B-Instruct | ERNIE-4.5 300B-A47B |
|---|---|---|---|---|---|---|---|---|---|---|
| AutoDAN | 3.0 | 80.0 | 7.5 | 40.0 | 44.0 | 33.0 | 61.0 | 53.5 | 17.5 | 20.5 |
| GPTFuzzer | 9.5 | 92.0 | 78.0 | 97.0 | 96.5 | 87.5 | 19.0 | 97.0 | 42.5 | 98.0 |
| TreeAttack | 52.5 | 47.0 | 28.5 | 80.5 | 80.5 | 54.5 | 48.5 | 58.0 | 77.5 | 67.5 |
| SeqAR | 92.0 | 25.5 | 30.5 | 96.5 | 100.0 | 96.0 | 1.0 | 24.5 | 61.0 | 99.5 |
| RACE | 44.0 | 81.0 | 28.0 | 49.0 | 65.0 | 61.5 | 83.5 | 69.0 | 66.0 | 74.0 |
| AutoDAN-R | 96.5 | 95.5 | 88.5 | 100.0 | 98.0 | 96.0 | 89.5 | 94.0 | 94.5 | 96.0 |
| PAIR | 50.0 | 98.5 | 64.5 | 82.5 | 93.0 | 83.0 | 90.0 | 93.5 | 94.0 | 89.5 |
| ReNeLLM | 1.0 | 5.0 | 5.5 | 68.5 | 70.5 | 69.0 | 7.5 | 20.5 | 19.5 | 42.0 |
| DrAttack | 24.5 | 58.0 | 66.5 | 66.5 | 63.5 | 83.5 | 67.5 | 61.0 | 56.0 | 72.5 |
| AutoDAN-Turbo | 18.0 | 4.5 | 0.0 | 0.5 | 14.0 | 0.0 | 4.5 | 11.0 | 0.0 | 0.0 |
| CipherChat | 9.5 | 2.5 | 3.0 | 97.5 | 77.5 | 86.5 | 75.0 | 6.5 | 23.5 | 59.0 |
| CodeAttack | 41.5 | 92.5 | 44.5 | 83.5 | 83.5 | 79.0 | 73.5 | 86.5 | 89.5 | 87.0 |
| Multilingual | 3.5 | 0.5 | 3.0 | 62.5 | 11.5 | 27.5 | 0.0 | 1.0 | 33.5 | 7.0 |
| Jailbroken | 21.0 | 58.5 | 64.5 | 99.0 | 95.5 | 78.0 | 0.0 | 20.0 | 3.5 | 25.5 |
| ICA | 53.5 | 99.0 | 97.0 | 99.0 | 98.0 | 83.5 | 1.0 | 63.0 | 1.5 | 95.5 |
| FlipAttack | 90.5 | 17.5 | 97.5 | 99.0 | 91.5 | 91.5 | 31.0 | 53.5 | 12.5 | 97.0 |
| Mousetrap | 93.0 | 96.0 | 97.5 | 100.0 | 97.0 | 91.5 | 3.5 | 98.5 | 12.5 | 97.5 |
| Prefill | 6.0 | 1.0 | 0.5 | 99.5 | 96.0 | 50.5 | 1.5 | 4.0 | 3.5 | 36.0 |
| DeepInception | 2.0 | 29.0 | 44.0 | 99.0 | 99.5 | 97.0 | 0.0 | 22.0 | 1.5 | 97.0 |
| Crescendo | 12.0 | 49.0 | 21.5 | 56.0 | 59.0 | 57.5 | 50.5 | 94.5 | 47.5 | 46.5 |
| RedQueen | 0.5 | 3.0 | 1.5 | 24.0 | 47.0 | 36.5 | 3.0 | 24.0 | 2.5 | 2.0 |
| CoA | 10.0 | 7.0 | 1.0 | 9.5 | 9.0 | 8.5 | 53.5 | 31.0 | 11.5 | 37.5 |
| ActorAttack | 42.5 | 35.5 | 19.5 | 70.0 | 76.5 | 54.0 | 42.0 | 76.5 | 64.5 | 53.0 |
| Rainbow Teaming | 7.0 | 3.5 | 16.0 | 2.0 | 18.5 | 25.5 | 14.5 | 0.5 | 96.5 | 31.0 |
| X-Teaming | 94.0 | 98.5 | 80.5 | 94.0 | 99.0 | 89.5 | 93.0 | 98.5 | 97.0 | 95.0 |
| EvoSynth | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |

BibTeX

@article{openrt2025,
  title   = {OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs},
  author  = {Xin Wang and Yunhao Chen and Juncheng Li and Yixu Wang and Yang Yao and Jie Li and Yan Teng and Yingchun Wang and Xia Hu},
  journal = {Shanghai Artificial Intelligence Laboratory},
  year    = {2025},
  url     = {https://github.com/AI45Lab/OpenRT}
}

Contact

Authors

Xin Wang*, Yunhao Chen*, Juncheng Li*, Yixu Wang, Yang Yao, Jie Li, Yan Teng†, Yingchun Wang, Xia Hu

Affiliation: Shanghai Artificial Intelligence Laboratory

Corresponding email: tengyan@pjlab.org.cn

Project homepage: https://github.com/AI45Lab/OpenRT