Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song
Accepted to EMNLP 2025 Findings
AgentVigil is a framework for end-to-end red-teaming of black-box AI agents. It uses a fuzzing methodology guided by Monte Carlo Tree Search to systematically find and exploit indirect prompt injection vulnerabilities in black-box agent systems. The results are intended to help improve agent robustness and safety in real-world deployments.
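To make the idea of tree-search-guided fuzzing concrete, here is a minimal generic sketch. It is not the paper's algorithm: the `Node` structure, the `p_stop` early-stop rule, the toy `mutate` function, and the `toy_score` function (a stand-in for querying a real black-box agent) are all assumptions made for illustration. It only shows how a UCB1-style selection rule can steer a mutation loop toward seed templates that have scored well so far.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy vocabulary; a real fuzzer would use much richer mutators.
PHRASES = ["alpha", "beta", "gamma", "delta"]
TARGET_WORDS = ("alpha", "gamma")


@dataclass
class Node:
    """One seed template in the search tree (illustrative structure)."""
    template: str
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb1(node: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCB1 score; unvisited nodes are tried first."""
    if node.visits == 0:
        return math.inf
    mean = node.total_reward / node.visits
    return mean + c * math.sqrt(math.log(max(parent_visits, 1)) / node.visits)


def select(root: Node, rng: random.Random, p_stop: float = 0.3) -> Node:
    """Descend by UCB1; stop early with probability p_stop (an assumption)
    so that interior seeds can be mutated again, not only leaves."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        if rng.random() < p_stop:
            break
    return node


def mutate(template: str, rng: random.Random) -> str:
    """Toy mutation: append one phrase from the vocabulary."""
    return template + " " + rng.choice(PHRASES)


def toy_score(template: str) -> float:
    """Stand-in for the black-box agent: reward in [0, 1]."""
    hits = sum(1 for w in TARGET_WORDS if w in template)
    return hits / len(TARGET_WORDS)


def fuzz(seeds, iterations=50, rng_seed=0):
    """Run the select -> mutate -> score -> backpropagate loop."""
    rng = random.Random(rng_seed)
    root = Node(template="")  # virtual root holding the initial seeds
    root.children = [Node(template=s, parent=root) for s in seeds]
    best = (0.0, "")
    for _ in range(iterations):
        parent = select(root, rng)
        child = Node(template=mutate(parent.template, rng), parent=parent)
        parent.children.append(child)
        reward = toy_score(child.template)
        if reward > best[0]:
            best = (reward, child.template)
        # Backpropagate the reward from the new node up to the root.
        node = child
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best, root
```

In a real red-teaming setup, `toy_score` would be replaced by an actual call to the target agent plus a success check, which is the expensive step that this kind of guided selection is meant to spend wisely.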
Paper Link | BibTeX
@misc{wang2025agentvigilgenericblackboxredteaming,
  title={AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents},
  author={Zhun Wang and Vincent Siu and Zhe Ye and Tianneng Shi and Yuzhou Nie and Xuandong Zhao and Chenguang Wang and Wenbo Guo and Dawn Song},
  year={2025},
  eprint={2505.05849},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2505.05849},
}