Papers
f_θ is the model, δ is the disturbance, x is the input, y is the true label, and L is the loss function.
Attack: find the disturbance δ that maximizes the loss function as far as possible.
Defense: normally we just minimize the loss; under attack we must minimize the loss even after the disturbance has maximized it.
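In the common notation of the PGD paper below (an assumption on my part, since the original symbols here were lost), attack and defense together form one saddle-point problem:

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\Bigl[ \max_{\lVert \delta \rVert \le \epsilon}
L\bigl(f_\theta(x + \delta),\, y\bigr) \Bigr]
```

The inner maximization is the attack: find the worst disturbance δ inside the ε-ball. The outer minimization is the defense: train the parameters θ against that worst case.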
Intriguing properties of neural networks
Intriguing properties of neural networks 2014
- Individual units do not carry the semantic information by themselves; it is the feature space as a whole that does.
FGSM
Explaining and Harnessing Adversarial Examples (FGSM) 2015
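The core FGSM update can be sketched on a toy logistic-regression model (a made-up example, not the paper's maxout network): the adversarial example is x + ε·sign(∇ₓL).

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One-step FGSM on a logistic-regression model p = sigmoid(w.x + b).

    The gradient of the cross-entropy loss w.r.t. the input x is (p - y) * w;
    FGSM perturbs x by eps in the sign direction of that gradient.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability
    grad_x = (p - y) * w                      # dL/dx for cross-entropy
    return x + eps * np.sign(grad_x)

# Usage: a point correctly classified as class 1 is pushed toward the boundary.
w = np.array([2.0, -1.0]); b = 0.0
x = np.array([1.0, 0.5]); y = 1.0
x_adv = fgsm(x, y, w, b, eps=0.1)
```

A single signed-gradient step is enough to move the point toward the decision boundary, which is why FGSM is so cheap.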
FGM
Adversarial Training Methods for Semi-Supervised Text Classification (FGM)
PGD
Towards Deep Learning Models Resistant to Adversarial attacks (PGD) 2017
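PGD iterates small signed-gradient steps and projects back into the ε-ball after each one. A minimal numpy sketch on an assumed toy logistic-regression model (the real attack also starts from a random point inside the ball):

```python
import numpy as np

def pgd(x, y, w, b, eps, alpha, steps):
    """PGD attack on logistic regression p = sigmoid(w.x + b):
    repeat small signed-gradient ascent steps on the loss, projecting
    back onto the L-infinity ball of radius eps around x each time.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad_x = (p - y) * w                      # dL/dx, as in FGSM
        x_adv = x_adv + alpha * np.sign(grad_x)   # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the ball
    return x_adv

# Usage: many small steps, but the result stays within eps of x.
w = np.array([2.0, -1.0]); b = 0.0
x = np.array([1.0, 0.5]); y = 1.0
x_adv = pgd(x, y, w, b, eps=0.1, alpha=0.03, steps=10)
```

The projection (the `np.clip` line) is what distinguishes PGD from simply repeating FGSM with a larger budget.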
FreeAT
Adversarial Training for Free! (FreeAT) 2019
YOPO
You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle (YOPO) 2019
FreeLB
Enhanced Adversarial Training for Language Understanding (FreeLB) 2019
SMART
Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization (SMART) 2019
MIM
Boosting Adversarial Attacks with Momentum (MIM) 2017
CW
Towards Evaluating the Robustness of Neural Networks (CW) 2016
Deeply Supervised Discriminative Learning for Adversarial Defense 2020, 6
Previous defense method:
modify the inputs at test time:
- JPEG compression operation is equivalent to selective blurring of the image, helping remove additive perturbations. Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression 2017
- Random resizing, which resizes the input images to a random size Mitigating Adversarial Effects Through Randomization 2017, 570
- deep image restoration networks learn mapping functions that can bring off-the-manifold adversarial samples onto the natural image manifold Image Super-Resolution as a Defense Against Adversarial Attacks 2019
- When the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation Foveation-based Mechanisms Alleviate Adversarial Examples 2015
proactive defense, which alters the underlying model’s architecture or learning procedure.
- On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. Ensemble Adversarial Training: Attacks and Defenses 2017, 1531
- It finds that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera. Adversarial examples in the physical world 2016, 3092
- Papernot used distillation to improve the model's robustness by retraining it with soft labels. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks 2015, 2081
- Parseval Networks restrict the Lipschitz constant of each layer of the model. Parseval networks: Improving robustness to adversarial examples 2017, 513
- With HGD as a defense, the target model is more robust to either white-box or black-box adversarial attacks. Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser 2017, 418
- This paper presents a new method that explores the interaction among individual networks to improve robustness for ensemble models. Improving Adversarial Robustness via Promoting Ensemble Diversity 2019, 152
- Min-max optimization is one of the strongest defense methods; it augments the training data with first-order attacked samples. Towards deep learning models resistant to adversarial attacks 2017, 4165
- It introduces enhanced defenses using a technique called logit pairing, which encourages the logits for pairs of examples to be similar. Adversarial Logit Pairing 2018, 370
- The current defenses are successfully circumvented under white-box settings. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples 2018, 1751
- Specifically, this paper studies applying image transformations such as bit-depth reduction, JPEG compression, total variance minimization, and image quilting before feeding the image to a convolutional network classifier. Countering Adversarial Images using Input Transformations 2017, 733
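Of these transformations, bit-depth reduction is the simplest to sketch (numpy only; the number of bits is an assumed parameter):

```python
import numpy as np

def reduce_bit_depth(img, bits=3):
    """Quantize pixel values in [0, 1] to 2**bits - 1 + 1 levels.

    Small additive perturbations that stay within one quantization
    bucket are erased, which is the intuition behind this defense.
    """
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

# Usage: a clean pixel and two slightly perturbed copies collapse
# to the same quantized value.
img = np.array([0.50, 0.51, 0.52])
out = reduce_bit_depth(img, bits=3)
```

Pixels that differ only by a small additive perturbation land in the same bucket; of course, a perturbation larger than the bucket width survives the transform.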
Preliminary work: Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks 2019, 57
Efficiently compute perturbations that fool deep networks. DeepFool: a simple and accurate method to fool deep neural networks 2015, 2870
The training objective is inspired by center loss, which clusters penultimate-layer features. A Discriminative Feature Learning Approach for Deep Face Recognition 2016, 2542
Think
Limit affine
Firstly, consider an affine function f(x) = Wx + b. When the matrix W contains a very large value, a little perturbation δ of the input produces a big change in the output, since the output change is exactly Wδ.
If we limit the maximum magnitude of W, can we defend against adversarial attacks?
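The amplification is easy to check numerically (a toy numpy sketch with made-up matrices):

```python
import numpy as np

# Two affine maps f(x) = W @ x: one with a very large entry, one well-behaved.
W_big = np.array([[100.0, 0.0], [0.0, 1.0]])   # contains a very large value
W_small = np.array([[1.0, 0.0], [0.0, 1.0]])   # identity: no amplification

delta = np.array([0.01, 0.01])                  # a little perturbation

# For an affine map, the output change caused by delta is exactly W @ delta.
change_big = np.linalg.norm(W_big @ delta)      # blown up by the large entry
change_small = np.linalg.norm(W_small @ delta)  # stays small
```

Bounding how much W can stretch a vector (its Lipschitz constant) caps this amplification, which is exactly the idea behind the Parseval networks cited above.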
Random disturbance - disrupting disturbance distribution
Some studies show that random disturbance doesn't work well even when epsilon is very large, so I suspect the weakness is related to the disturbance distribution. An experiment verified this: disrupting the disturbance distribution gives a result that is not bad.
But the method is too weak because it is reversible, so it can be cracked. Is there a method that disrupts the disturbance distribution while being irreversible?
New idea:
We can make three predictions: one on the original image, one on a left-shifted copy, and one on a right-shifted copy. Then we get three answers.
There are several scenarios:
- All three answers are the same: we consider it a normal image.
- Two answers agree on a value a2 and the third differs: we consider it an adversarial image, and the right class is a2.
- Two answers agree on a value a2 and the third differs, but the right class isn't a2.
- No two answers agree: we consider it an image without meaning.
Pros:
- It detects adversarial examples.
- It can recover the right class even under adversarial disturbance.
- It is irreversible and simple.
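The voting scheme above can be sketched as follows (the np.roll shifts and the stand-in classify function are assumptions; the real transformation is still an open choice):

```python
import numpy as np

def shift_vote(img, classify, shift=1):
    """Classify the original image plus left- and right-shifted copies,
    then take a majority vote over the three answers.

    Returns (label, verdict) where verdict is one of:
      'normal'      - all three answers agree
      'adversarial' - exactly two agree (label is the majority answer)
      'meaningless' - all three disagree (label is None)
    """
    views = [img,
             np.roll(img, -shift, axis=-1),   # left-shifted copy
             np.roll(img, shift, axis=-1)]    # right-shifted copy
    answers = [classify(v) for v in views]
    counts = {a: answers.count(a) for a in answers}
    best, n = max(counts.items(), key=lambda kv: kv[1])
    if n == 3:
        return best, 'normal'
    if n == 2:
        return best, 'adversarial'
    return None, 'meaningless'

# Usage with a stand-in classifier that just thresholds the first pixel:
toy = lambda v: int(v[0] > 0.5)
label, verdict = shift_vote(np.array([0.9, 0.9, 0.9]), toy)  # all three agree
```

Note the vote cannot distinguish the two 2-vs-1 sub-cases above: when two answers agree it always trusts the majority, so the case where the right class is not the majority remains an open problem.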
We need a transformation:
Perturbation proof
Expand
For convenience, we set
Conclusion: it shows that the result is not related to . Why?
Consider a simple situation.
Conclusion
Defending models against adversarial attacks improves their robustness.
Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset. [Explaining and Harnessing Adversarial Examples, Goodfellow, 2015]