Adversarial attack


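θ denotes the model (its parameters), ε the perturbation, x the input, y the true label, and L the loss function.

Attack: find a perturbation ε that is as small as possible but still changes the model's prediction.

Defense: to train a robust model, we minimize ρ(θ), the expected loss under the worst-case perturbation from an allowed set S. The inner problem maximizes the loss over ε, and the outer problem minimizes it over θ:

$$\min_{\theta} \rho(\theta), \qquad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\varepsilon \in S} L(\theta, x+\varepsilon, y)\Big]$$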
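For a linear logistic classifier, the inner maximization over an L∞ ball has a closed form, so the min-max objective can be evaluated exactly on a small dataset. This is a minimal sketch under stated assumptions: labels y ∈ {-1, +1}, S = {ε : ‖ε‖∞ ≤ δ}, and hypothetical helper names (`logistic_loss`, `worst_case_perturbation`, `adversarial_risk`) that do not come from any paper.

```python
import math

# Assumptions: linear logistic model z = w . x + b, labels y in {-1, +1},
# S = L-infinity ball of radius delta. All helper names are hypothetical.

def logistic_loss(w, b, x, y):
    """L(theta, x, y) = log(1 + exp(-y * (w . x + b)))."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return math.log1p(math.exp(-y * z))

def worst_case_perturbation(w, y, delta):
    """Exact argmax of the loss over ||eps||_inf <= delta for this model."""
    # The loss grows as y * z shrinks; y * (w . eps) is smallest when
    # eps_j = -delta * y * sign(w_j).
    return [-delta * y * (1 if wj > 0 else -1 if wj < 0 else 0) for wj in w]

def adversarial_risk(w, b, data, delta):
    """Empirical rho(theta): mean worst-case loss over (x, y) pairs."""
    total = 0.0
    for x, y in data:
        eps = worst_case_perturbation(w, y, delta)
        x_adv = [xj + ej for xj, ej in zip(x, eps)]
        total += logistic_loss(w, b, x_adv, y)
    return total / len(data)

w, b = [2.0, -1.0], 0.0
data = [([1.0, 0.5], 1), ([-0.5, 1.0], -1)]
print(adversarial_risk(w, b, data, 0.0))  # clean empirical risk
print(adversarial_risk(w, b, data, 0.3))  # robust risk, never smaller
```

Training a robust model would then minimize `adversarial_risk` over `w` and `b`; for nonlinear models no closed form exists and the inner maximum is approximated by an attack such as FGSM or PGD.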



Intriguing properties of neural networks 2014

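  1. Individual units are no more semantically meaningful than random directions in the activation space: the semantic information lies in the space as a whole, not in single units.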


Explaining and Harnessing Adversarial Examples (FGSM) 2015


Adversarial Training Methods for Semi-Supervised Text Classification (FGM)


Towards Deep Learning Models Resistant to Adversarial Attacks (PGD) 2017


Adversarial Training for Free! (FreeAT) 2019


You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle (YOPO) 2019


Enhanced Adversarial Training for Natural Language Understanding (FreeLB) 2019


Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization (SMART) 2019


Boosting Adversarial Attacks with Momentum (MIM) 2017


Towards Evaluating the Robustness of Neural Networks (CW) 2016

Deeply Supervised Discriminative Learning for Adversarial Defense 2020, 6

Previous defense methods:

Methods that modify the inputs at test time:

  1. JPEG compression acts as selective blurring of the image, which helps remove additive perturbations. Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression 2017
  2. Random resizing resizes the input image to a random size (and then randomly pads it). Mitigating Adversarial Effects Through Randomization 2017, 570
  3. Deep image restoration networks learn mapping functions that bring off-the-manifold adversarial samples back onto the natural image manifold. Image Super-Resolution as a Defense Against Adversarial Attacks 2019
  4. When neural responses are linear, applying the foveation mechanism to an adversarial example tends to significantly reduce the effect of the perturbation. Foveation-based Mechanisms Alleviate Adversarial Examples 2015
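The random resizing defense (item 2) is simple enough to sketch directly: resize the input to a random size, then pad it at a random offset to a fixed output size. This is a minimal sketch under assumptions: a square grayscale image stored as a list of rows, nearest-neighbour resizing, zero padding, and hypothetical function names; the size range used in the paper is not reproduced here.

```python
import random

# Assumptions: square grayscale image as a list of rows, nearest-neighbour
# resizing, zero padding. Function names are hypothetical.

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2D list."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

def random_resize_pad(img, out_size, rng):
    """Resize to a random side in [len(img), out_size], then zero-pad to out_size."""
    side = rng.randint(len(img), out_size)  # randint is inclusive on both ends
    small = resize_nearest(img, side, side)
    top = rng.randint(0, out_size - side)
    left = rng.randint(0, out_size - side)
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(side):
        for j in range(side):
            out[top + i][left + j] = small[i][j]
    return out

rng = random.Random(0)
img = [[float(4 * i + j) for j in range(4)] for i in range(4)]
padded = random_resize_pad(img, 6, rng)
```

Because a fresh random size and offset are drawn for every query, the input actually seen by the classifier differs from the one the attacker optimized against.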

Proactive defenses, which alter the underlying model's architecture or learning procedure:

  1. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. Ensemble Adversarial Training: Attacks and Defenses 2017, 1531

  2. It finds that a large fraction of adversarial examples are still misclassified even when perceived through a camera. Adversarial examples in the physical world 2016, 3092

  3. Papernot et al. use distillation to improve the model's robustness by retraining it with soft labels. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks 2015, 2081

  4. Parseval networks constrain the Lipschitz constant of each layer of the model. Parseval networks: Improving robustness to adversarial examples 2017, 513

  5. With HGD (high-level representation guided denoiser) as a defense, the target model is more robust to both white-box and black-box adversarial attacks. Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser 2017, 418

  6. This paper presents a new method that explores the interaction among individual networks to improve robustness for ensemble models. Improving Adversarial Robustness via Promoting Ensemble Diversity 2019, 152

  7. Min-max optimization is one of the strongest defenses: it augments the training data with adversarial examples produced by a first-order attack (PGD). Towards deep learning models resistant to adversarial attacks 2017, 4165

  8. It introduces logit pairing, a technique that encourages the logits for pairs of examples to be similar. Adversarial Logit Pairing 2018, 370

  9. Defenses that rely on obfuscated gradients are successfully circumvented in the white-box setting. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples 2018, 1751

  10. It studies applying image transformations such as bit-depth reduction, JPEG compression, total variance minimization, and image quilting before feeding the image to a convolutional network classifier. Countering Adversarial Images using Input Transformations 2017, 733
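The min-max defense (item 7) approximates the inner maximization with projected gradient descent (PGD): take a signed gradient step to increase the loss, then project back onto the L∞ ball around the clean input. This is a minimal sketch under assumptions: gradients are estimated with central finite differences so that no autodiff library is needed, and `toy_loss` is a hypothetical two-feature model with label y = +1; a real implementation would use the framework's exact gradients.

```python
import math

# Assumptions: finite-difference gradients instead of autodiff; `toy_loss`
# is a hypothetical model used only for illustration.

def numerical_grad(f, x, h=1e-5):
    """Central finite-difference estimate of grad f at x."""
    grad = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0

def pgd_attack(loss, x, delta, step, iters):
    """Ascend along sign(grad), then project onto ||x' - x||_inf <= delta."""
    x_adv = list(x)
    for _ in range(iters):
        g = numerical_grad(loss, x_adv)
        x_adv = [xa + step * sign(gj) for xa, gj in zip(x_adv, g)]
        # Projection onto the L-infinity ball is coordinate-wise clipping.
        x_adv = [min(max(xa, xj - delta), xj + delta) for xa, xj in zip(x_adv, x)]
    return x_adv

def toy_loss(x):
    """Logistic loss of a quadratic logit, label y = +1."""
    z = 1.5 * x[0] - 0.5 * x[1] + 0.3 * x[0] * x[1]
    return math.log1p(math.exp(-z))

x0 = [1.0, 0.5]
x_adv = pgd_attack(toy_loss, x0, delta=0.2, step=0.05, iters=10)
```

In adversarial training, `x_adv` would replace (or augment) `x0` in the outer minimization step over the model parameters.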

Preliminary work: Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks 2019, 57

DeepFool efficiently computes small perturbations that fool deep networks. DeepFool: a simple and accurate method to fool deep neural networks 2015, 2870
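DeepFool's core step is easiest to see for an affine binary classifier f(x) = w·x + b: the smallest L2 perturbation that reaches the decision boundary f(x) = 0 is r = -f(x) w / ‖w‖². On a deep network, DeepFool repeats this step on a local linearization. This is a minimal sketch of the affine case only; the helper names and the overshoot value are illustrative assumptions.

```python
# Assumptions: affine binary classifier f(x) = w . x + b; the overshoot value
# and helper names are illustrative.

def affine(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def deepfool_affine(w, b, x, overshoot=0.02):
    """Return (x_adv, r), where r is the minimal L2 step onto the boundary."""
    fx = affine(w, b, x)
    norm_sq = sum(wj * wj for wj in w)
    r = [-fx * wj / norm_sq for wj in w]
    # Step slightly past the boundary so the predicted sign actually flips.
    x_adv = [xj + (1 + overshoot) * rj for xj, rj in zip(x, r)]
    return x_adv, r

w, b = [3.0, 4.0], -1.0
x = [1.0, 1.0]  # f(x) = 6 > 0
x_adv, r = deepfool_affine(w, b, x)
```

Here ‖r‖ = |f(x)| / ‖w‖ = 6 / 5, the distance from x to the hyperplane.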

The training objective is inspired by center loss, which pulls penultimate-layer features toward their class centers. A Discriminative Feature Learning Approach for Deep Face Recognition 2016, 2542
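Center loss penalizes the squared distance between each penultimate-layer feature x_i and the center c_{y_i} of its class: L_C = ½ Σ_i ‖x_i - c_{y_i}‖². This is a minimal sketch of the loss value only; the centers are passed in as a fixed dict here (an assumption), whereas in training they are learned and updated together with the network.

```python
# Assumptions: features as lists of floats, integer labels, and class centers
# given as a dict {label: center}. The function name is hypothetical.

def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xj - cj) ** 2 for xj, cj in zip(x, c))
    return 0.5 * total

features = [[1.0, 2.0], [1.5, 2.5], [-1.0, 0.0]]
labels = [0, 0, 1]
centers = {0: [1.0, 2.0], 1: [-1.0, 1.0]}
print(center_loss(features, labels, centers))
```

In the center-loss paper this term is added to the softmax loss with a weight λ, so features stay separable across classes while becoming compact within each class.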


Limiting the affine transformation

First, consider an affine function y = Ax + b. If A contains a very large entry, say a_ij, then a small perturbation of x_j changes y_i by a_ij times that perturbation, so y changes a lot.


If we bound the largest entry of A, can we defend against adversarial attacks?
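The argument can be checked numerically. Under an L∞ perturbation ‖ε‖∞ ≤ δ, output y_i changes by Σ_j a_ij ε_j, which is at most δ Σ_j |a_ij| and reaches that value at ε_j = δ·sign(a_ij). Clipping every entry of A to [-c, c] therefore caps the change at δ·n·c for n inputs. This is a minimal sketch; the matrix, δ, the bound c, and the function names are hypothetical.

```python
# Assumptions: A is a list of rows, perturbations are L-infinity bounded by
# delta, and the clipping bound c is a hypothetical choice.

def worst_case_change(A, delta):
    """Largest possible |change| of each output y_i under ||eps||_inf <= delta."""
    return [delta * sum(abs(a) for a in row) for row in A]

def clip_entries(A, c):
    """Clip every entry of A to the interval [-c, c]."""
    return [[min(max(a, -c), c) for a in row] for row in A]

A = [[0.5, -0.2, 100.0],   # a_13 plays the role of the "very large" entry
     [0.1, 0.3, -0.4]]
delta = 0.01
before = worst_case_change(A, delta)
after = worst_case_change(clip_entries(A, 1.0), delta)
print(before, after)
```

Bounding entries only sketches the idea: it also limits what the layer can represent, and in a deep network the per-layer sensitivity bounds multiply, which is why approaches such as Parseval networks constrain every layer.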


Random perturbation: disrupting the perturbation distribution


According to some studies, random perturbations barely affect the model even when ε is very large. So I suspect the damage done by an adversarial perturbation depends on its distribution rather than only its size. An experiment supports this: after the distribution of the adversarial perturbation was disrupted, the model's results were no longer bad.

However, this method is easy to break because it is reversible. Is there a method that disrupts the perturbation distribution and is also irreversible?

New idea:

We can verify each input three times, using the original image, a left-shifted copy, and a right-shifted copy. This gives three predictions.

There are four scenarios:

  1. All three predictions agree: we treat the image as normal.
  2. Two predictions agree on a class a2 and the third differs: we treat the image as adversarial, and the correct class is a2.
  3. Two predictions agree on a class a2 and the third differs: the image is adversarial, but the correct class is not a2 (at test time this looks identical to scenario 2; the two differ only in whether a2 is actually correct).
  4. No two predictions agree: we treat the image as meaningless.


Advantages:

  1. It detects adversarial examples.
  2. It can recover the correct class even when an adversarial perturbation is present.
  3. It is irreversible and simple.
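The three-view check can be sketched as a majority vote over the predictions for the original, left-shifted, and right-shifted images. This is a minimal sketch under assumptions: images are lists of rows, shifts zero-fill the vacated columns, `classify` is any function returning a label, and the shift size `k` and all function names are hypothetical. Because scenarios 2 and 3 cannot be told apart from the three predictions alone, the sketch returns the majority label as its best guess in both.

```python
from collections import Counter

# Assumptions: images as lists of rows, zero-filled shifts, a user-supplied
# `classify` function, and a hypothetical shift size k.

def shift(img, k):
    """Shift right by k columns (k < 0 shifts left), zero-filling the gap."""
    w = len(img[0])
    out = []
    for row in img:
        if k >= 0:
            out.append([0.0] * k + row[:w - k])
        else:
            out.append(row[-k:] + [0.0] * (-k))
    return out

def three_view_check(img, classify, k=1):
    """Return (verdict, label) from predictions on three shifted views."""
    preds = [classify(img), classify(shift(img, -k)), classify(shift(img, k))]
    label, votes = Counter(preds).most_common(1)[0]
    if votes == 3:
        return "normal", label
    if votes == 2:
        # Scenarios 2 and 3 look identical here; the majority label is a guess.
        return "adversarial", label
    return "meaningless", None

img = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print(three_view_check(img, lambda im: "cat"))
```

The shift `k` needs to be small enough that a clean image keeps its class; the scheme also assumes the adversarial perturbation is sensitive to shifting.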

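We need a transformation T that satisfies the following, where ε is the adversarial perturbation and ε_r is a random perturbation:

$$T^{-1} \text{ does not exist}, \qquad T(a+b) = T(a) + T(b)$$

$$\varepsilon \neq 0, \qquad \varepsilon = \arg\max_{\varepsilon \in S} L(\theta, x+\varepsilon, y)$$

$$\tfrac{1}{2}\big(L(\theta, x, y) - L(\theta, T(x), y)\big)^2 \approx 0$$

$$\tfrac{1}{2}\big(L(\theta, x, y) - L(\theta, x+\varepsilon_r, y)\big)^2 \approx 0$$

$$\tfrac{1}{2}\big(L(\theta, x+\varepsilon_r, y) - L(\theta, T(x)+T(\varepsilon), y)\big)^2 \approx 0$$

In words: T must be irreversible and additive, must not change the loss on clean inputs, and must make the transformed adversarial input T(x + ε) = T(x) + T(ε) behave like an input with only random noise.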
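One candidate that satisfies the first two conditions is 2×2 average pooling. This choice is an assumption for illustration, not the method from these notes. Average pooling is linear, so T(x + ε) = T(x) + T(ε), and it maps many inputs to the same output, so it has no inverse. The sketch below checks additivity and shows a perturbation that T removes entirely. The function names are hypothetical.

```python
# Assumption: T = 2x2 average pooling on images stored as lists of rows
# (even height and width). Function names are hypothetical.

def avg_pool_2x2(img):
    """Average each non-overlapping 2x2 block."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def add(a, b):
    """Element-wise sum of two same-shaped 2D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
eps = [[0.5, -0.5, 0.25, 0.25],
       [-0.5, 0.5, 0.0, -0.5]]
print(avg_pool_2x2(x), avg_pool_2x2(add(x, eps)))
```

This particular ε averages to zero inside every block, so T(x + ε) = T(x). An attacker who knows T could instead put the perturbation into components that survive pooling, so the remaining loss conditions still have to be checked against adaptive attacks.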

Perturbation proof

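$$\varepsilon = \arg\max_{\varepsilon \in S} L(\theta, x+\varepsilon, y) \;\Rightarrow\; \frac{\partial L(\theta, x+\varepsilon, y)}{\partial \varepsilon} = 0$$

This first-order condition holds when the maximizer lies in the interior of S; on the boundary of S the gradient need not vanish.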

Expand L

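$$y = f(x), \qquad L(\theta, x+\varepsilon, y) = \frac{1}{2}\sum_{i=1}^{m}\big(h(x^{(i)}+\varepsilon^{(i)}, \theta) - f(x^{(i)})\big)^2 \;\Rightarrow\; \frac{\partial}{\partial \varepsilon}\sum_{i=1}^{m}\big(h(x^{(i)}+\varepsilon^{(i)}, \theta) - f(x^{(i)})\big)^2 = 0$$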

For convenience, we set m=1

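$$m = 1:\quad \big(h(x+\varepsilon, \theta) - f(x)\big)\frac{\partial h(x+\varepsilon, \theta)}{\partial \varepsilon} = 0 \;\Rightarrow\; \frac{\partial h(x+\varepsilon, \theta)}{\partial \varepsilon} = 0$$

The last step assumes h(x+ε, θ) ≠ f(x), i.e., the prediction on the perturbed input still has a nonzero error.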

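Conclusion: the condition ∂h(x+ε, θ)/∂ε = 0 does not involve y. Why? For m = 1, y = f(x) enters only through the scalar residual h(x+ε, θ) − f(x), which is nonzero and therefore cancels.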

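For general m:

$$\sum_{i=1}^{m}\frac{\partial}{\partial \varepsilon}\big(h(x^{(i)}+\varepsilon^{(i)}, \theta) - f(x^{(i)})\big)^2 = 0 \;\Rightarrow\; \sum_{i=1}^{m}\big(h(x^{(i)}+\varepsilon^{(i)}, \theta) - f(x^{(i)})\big)\frac{\partial h(x^{(i)}+\varepsilon^{(i)}, \theta)}{\partial \varepsilon} = 0$$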

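Consider a simple situation in which every term of the sum vanishes separately:

$$\frac{\partial h(x^{(i)}+\varepsilon^{(i)}, \theta)}{\partial \varepsilon^{(i)}} = 0, \qquad i = 1, \ldots, m$$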
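The m = 1 derivation can be checked numerically on a toy model. This is a minimal sketch under illustrative assumptions: h(z, θ) = sin(θz) with θ = 1, target f(x) = sin(x), x = 0.3, and S = [-3, 3]. A grid search finds the loss-maximizing ε. Because that maximizer lies in the interior of S and the residual h − f is nonzero there, ∂h/∂ε should be close to 0, as the derivation above predicts.

```python
import math

# Illustrative assumptions: m = 1, h(z, theta) = sin(theta * z), theta = 1,
# target f(x) = sin(x), x = 0.3, S = [-radius, radius] with radius = 3.

theta, x, radius = 1.0, 0.3, 3.0

def h(z):
    return math.sin(theta * z)

def loss(eps):
    """L = 1/2 * (h(x + eps) - f(x))^2 with f(x) = sin(x)."""
    return 0.5 * (h(x + eps) - math.sin(x)) ** 2

n = 6000
grid = [-radius + 2 * radius * k / n for k in range(n + 1)]
eps_star = max(grid, key=loss)                       # approximate argmax over S
dh_deps = theta * math.cos(theta * (x + eps_star))   # d h(x + eps) / d eps
residual = h(x + eps_star) - math.sin(x)
print(eps_star, dh_deps, residual)
```

The maximizer sits where sin(x + ε) = −1, i.e. ε ≈ −π/2 − x; the residual there is about −1.3, so the condition reduces to ∂h/∂ε = 0 exactly as in the m = 1 case.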


Models trained with adversarial defenses are more robust.

Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset. [Explaining and Harnessing Adversarial Examples, Goodfellow, 2015]
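The FGSM approach builds the adversarial example x + ε·sign(∇ₓJ) and trains on the mixed objective J̃ = αJ(θ, x, y) + (1 − α)J(θ, x + ε·sign(∇ₓJ(θ, x, y)), y). This is a minimal sketch under assumptions: a linear logistic model with labels y ∈ {-1, +1}, where ∇ₓJ = −y·σ(−y z)·w has a closed form, α = 0.5, and hypothetical function names. It is not the maxout network from the paper.

```python
import math

# Assumptions: linear logistic model z = w . x + b, labels y in {-1, +1},
# alpha = 0.5. Function names are hypothetical.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sign(v):
    return (v > 0) - (v < 0)

def loss(w, b, x, y):
    """J(theta, x, y) = log(1 + exp(-y * (w . x + b)))."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return math.log1p(math.exp(-y * z))

def fgsm(w, b, x, y, eps):
    """x + eps * sign(grad_x J), with grad_x J = -y * sigmoid(-y * z) * w."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    grad_x = [-y * sigmoid(-y * z) * wj for wj in w]
    return [xj + eps * sign(gj) for xj, gj in zip(x, grad_x)]

def mixed_objective(w, b, x, y, eps, alpha=0.5):
    """alpha * clean loss + (1 - alpha) * loss on the FGSM example."""
    x_adv = fgsm(w, b, x, y, eps)
    return alpha * loss(w, b, x, y) + (1 - alpha) * loss(w, b, x_adv, y)

w, b = [1.0, -2.0, 0.5], 0.1
x, y = [0.2, -0.4, 1.0], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
print(loss(w, b, x, y), loss(w, b, x_adv, y), mixed_objective(w, b, x, y, 0.1))
```

Adversarial training then minimizes `mixed_objective` over `w` and `b`, regenerating the FGSM example from the current parameters at every step.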