Xiao Zhang
Feb 12, 2026
GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis
We develop a novel framework for crafting generalizable backdoors in RLHF through emotion-aware trigger synthesis.
Subrat Kishore Dutta, Yuelin Xu, Piyush Pant, Xiao Zhang
PDF
Cite
ArXiv
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
We introduce Generative Adversarial Suffix Prompter (GASP), a novel framework that combines human-readable prompt generation with Latent Bayesian Optimization (LBO) to improve adversarial suffix creation in a fully black-box setting.
Advik Raj Basani, Xiao Zhang
PDF
Cite
Code
ArXiv
OpenReview
Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs
We propose a multi-agent, multi-turn jailbreak strategy that systematically bypasses LLM safety mechanisms by decomposing harmful queries into seemingly benign sub-tasks.
Devansh Srivastav, Xiao Zhang
PDF
Cite
Code
Source Document
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
We develop a novel attack framework that generates highly invisible adversarial patches based on perceptibility-aware localization and perturbation schemes.
Subrat Kishore Dutta, Xiao Zhang
PDF
Cite
Code
ArXiv
Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing
We study how to certify and train for cost-sensitive robustness using randomized smoothing.
Yuan Xin, Dingfan Chen, Michael Backes, Xiao Zhang
PDF
Cite
Code
ArXiv
OpenReview
DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination
We propose DiffPAD, a novel framework that harnesses the power of diffusion models for adversarial patch decontamination.
Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst
PDF
Cite
ArXiv
DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy
We highlight the importance of using dynamic facial recognition (FR) strategies to evaluate anti-facial recognition (AFR) methods, and propose DivTrackee as a promising countermeasure.
Wenshu Fan, Minxing Zhang, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Xiangyu Yue, Michael Backes, Xiao Zhang
PDF
Cite
Code
ArXiv
What Distributions are Robust to Indiscriminate Poisoning Attacks for Linear Learners?
We study the inherent vulnerabilities of linear learners to indiscriminate data poisoning attacks by analyzing the optimal poisoning strategy from the perspective of data distribution.
Fnu Suya, Xiao Zhang, Yuan Tian, David Evans
PDF
Cite
ArXiv
Understanding Intrinsic Robustness using Label Uncertainty
Built upon a novel definition of label uncertainty, we develop an empirical method to estimate a more realistic intrinsic robustness limit for image classification tasks.
Xiao Zhang, David Evans
PDF
Cite
Code
ArXiv
OpenReview
Improved Estimation of Concentration Under Lp-Norm Distance Metric Using Half Spaces
We show that concentration of measure does not prohibit the existence of adversarially robust classifiers using a novel method of empirical concentration estimation.
Jack Prescott, Xiao Zhang, David Evans
PDF
Cite
Code
ArXiv
OpenReview