SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs

Published in ResponsibleFM, NeurIPS 2025

Recommended citation:
@misc{siu2025steeringsafetysystematicsafetyevaluation,
  title={SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs},
  author={Vincent Siu and Nicholas Crispino and David Park and Nathan W. Henry and Zhun Wang and Yang Liu and Dawn Song and Chenguang Wang},
  year={2025},
  eprint={2509.13450},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2509.13450},
}