Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
The paper introduces DialTree, a tree-based dialogue reinforcement learning framework that autonomously discovers diverse and effective multi-turn attack strategies against large language models, significantly outperforming existing single-turn or template-based red-teaming methods.