About Me
I am a final-year Ph.D. student at CISPA Helmholtz Center for Information Security, supervised by Michael Backes. Prior to that, I obtained my bachelor’s (2018) and master’s (2021) degrees from the University of Science and Technology of China (USTC).
Research Interests
- Agentic RL
- LLM Post-Training & Alignment
- Certifiable Robustness Methods
Publications
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?
Yuan Xin, Dingfan Chen, Linyi Yang, Michael Backes, Xiao Zhang. ACL 2026 Findings.
Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing
Yuan Xin, Dingfan Chen, Michael Backes, Xiao Zhang. ICML 2025.
Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders
Yuan Xin, Zheng Li, Ning Yu, Dingfan Chen, Mario Fritz, Michael Backes, Yang Zhang. ECAI 2024.
Label Incorporated Graph Neural Networks for Text Classification
Yuan Xin, Linli Xu, Junliang Guo, Jiquan Li, Xin Sheng, Yuanyuan Zhou. ICPR 2020.
Research Experience
- Summer 2021: NLP Research Intern, Alibaba DAMO
  - Used self-training methods to improve machine translation performance.
  - Improved evaluation metrics for machine translation so that the model can assess the quality of its own output without external references.
- Summer 2020: NLP Research Intern, Baidu Talent Intelligence Center
  - Extracted hierarchical relations between skills from the JD dataset.
  - Worked on skill representation learning.
