Feb 12, 2026

GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis

We develop a novel framework for crafting generalizable backdoors in RLHF through emotion-aware trigger synthesis.

Subrat Kishore Dutta, Yuelin Xu, Piyush Pant, Xiao Zhang

GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

We introduce Generative Adversarial Suffix Prompter (GASP), a novel framework that combines human-readable prompt generation with Latent Bayesian Optimization (LBO) to improve adversarial suffix creation in a fully black-box setting.

Advik Raj Basani, Xiao Zhang

Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs

We propose a multi-agent, multi-turn jailbreak strategy that systematically bypasses LLM safety mechanisms by decomposing harmful queries into seemingly benign sub-tasks.

Devansh Srivastav, Xiao Zhang

IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization

We develop a novel attack framework that generates highly invisible adversarial patches based on perceptibility-aware localization and perturbation schemes.

Subrat Kishore Dutta, Xiao Zhang

Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing

We study how to certify and train for cost-sensitive robustness using randomized smoothing.

Yuan Xin, Dingfan Chen, Michael Backes, Xiao Zhang

DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination

We propose DiffPAD, a novel framework that harnesses the power of diffusion models for adversarial patch decontamination.

Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst

DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

We highlight the importance of using dynamic FR strategies to evaluate AFR methods, and propose DivTrackee as a promising countermeasure.

Wenshu Fan, Minxing Zhang, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Xiangyu Yue, Michael Backes, Xiao Zhang

What Distributions are Robust to Indiscriminate Poisoning Attacks for Linear Learners?

We study the inherent vulnerability of linear learners to indiscriminate data poisoning attacks by analyzing the optimal poisoning strategy from the perspective of data distribution.

Fnu Suya, Xiao Zhang, Yuan Tian, David Evans

Understanding Intrinsic Robustness using Label Uncertainty

Built upon a novel definition of label uncertainty, we develop an empirical method to estimate a more realistic intrinsic robustness limit for image classification tasks.

Xiao Zhang, David Evans

Improved Estimation of Concentration Under Lp-Norm Distance Metric Using Half Spaces

We show that concentration of measure does not prohibit the existence of adversarially robust classifiers using a novel method of empirical concentration estimation.

Jack Prescott, Xiao Zhang, David Evans
