1. Introduction
Deep neural networks are powerful learning models that achieve state-of-the-art pattern recognition performance in many research areas such as bioinformatics [1, 16], speech [12, 6], and computer vision [10, 8]. Though deep networks have exhibited very good performance in classification tasks, they have recently been shown to be particularly unstable to adversarial perturbations of the data [18]. In fact, very small and often imperceptible perturbations of the data samples are sufficient to fool state-of-the-art classifiers and result in incorrect classification (e.g., Figure 1). Formally, for a given classifier, we define an adversarial perturbation as the minimal perturbation r that is sufficient to change the estimated label k̂(x):

\[ \Delta(x; \hat{k}) := \min_{r} \|r\|_2 \ \text{ subject to } \ \hat{k}(x + r) \neq \hat{k}(x), \tag{1} \]

where x is an image and k̂(x) is the estimated label. We call ∆(x; k̂) the robustness of k̂ at point x. The robustness of classifier k̂ is then defined as

\[ \rho_{\mathrm{adv}}(\hat{k}) = \mathbb{E}_x \, \frac{\Delta(x; \hat{k})}{\|x\|_2}, \tag{2} \]

Figure 1: An example of adversarial perturbations. First row: the original image x, classified as k̂(x) = "whale". Second row: the image x + r, classified as k̂(x + r) = "turtle", and the corresponding perturbation r computed by DeepFool. Third row: the image classified as "turtle" and the corresponding perturbation computed by the fast gradient sign method [4]. DeepFool leads to a smaller perturbation.

¹ To encourage reproducible research, the code of DeepFool is made available at http://github.com/lts4/deepfool
where E_x is the expectation over the distribution of data.
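To make this definition concrete, here is a minimal Python sketch (an illustration with hypothetical names, not the released DeepFool code) that estimates the empirical robustness by averaging ‖r̂(x)‖₂/‖x‖₂ over a dataset; `minimal_perturbation` stands for any routine, such as DeepFool, that returns an approximately minimal adversarial perturbation:

```python
import numpy as np

def empirical_robustness(images, minimal_perturbation):
    """Estimate rho_adv: the mean of ||r(x)||_2 / ||x||_2 over a dataset.

    `images` is an iterable of numpy arrays; `minimal_perturbation` maps an
    image x to an (approximately) minimal perturbation r with k(x+r) != k(x).
    """
    ratios = []
    for x in images:
        r = minimal_perturbation(x)
        ratios.append(np.linalg.norm(r) / np.linalg.norm(x))
    return float(np.mean(ratios))
```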
The study of adversarial perturbations helps us understand what features are used by a classifier. The existence of such examples is seemingly in contradiction with the generalization ability of learning algorithms. While deep networks achieve state-of-the-art performance in image classification tasks, they are not robust at all to small adversarial perturbations, and tend to misclassify minimally perturbed data that looks visually similar to clean samples. Though adversarial attacks are specific to the classifier, it seems that adversarial perturbations generalize across different models [18]. This can actually become a real concern from a security point of view.

An accurate method for finding adversarial perturbations is thus necessary to study and compare the robustness of different classifiers to adversarial perturbations. It might be the key to a better understanding of the limits of current architectures and to designing methods to increase robustness. Despite the importance of the vulnerability of state-of-the-art classifiers to adversarial instability, no well-founded method has been proposed to compute adversarial perturbations; we fill this gap in this paper.

Our main contributions are the following:

• We propose a simple yet accurate method for computing and comparing the robustness of different classifiers to adversarial perturbations.

• We perform an extensive experimental comparison, and show that 1) our method computes adversarial perturbations more reliably and efficiently than existing methods, and 2) augmenting training data with adversarial examples significantly increases the robustness to adversarial perturbations.

• We show that using imprecise approaches for the computation of adversarial perturbations can lead to different and sometimes misleading conclusions about robustness. Hence, our method provides a better understanding of this intriguing phenomenon and of its influencing factors.

We now review some of the relevant work. The phenomenon of adversarial instability was first introduced and studied in [18]. The authors estimated adversarial examples by solving penalized optimization problems, and presented an analysis showing that the high complexity of neural networks might be a reason explaining the presence of adversarial examples. Unfortunately, the optimization method employed in [18] is time-consuming and therefore does not scale to large datasets. In [14], the authors showed that convolutional networks are not invariant to some sorts of transformations, based on experiments done on Pascal3D+ annotations. Recently, Tsai et al. [19] provided software to misclassify a given image into a specified class, without necessarily finding the smallest perturbation. Nguyen et al. [13] generated synthetic unrecognizable images that are classified with high confidence. The authors of [3] also studied a related problem of finding the minimal geometric transformation that fools image classifiers, and provided a quantitative measure of the robustness of classifiers to geometric transformations. Closer to our work, the authors of [4] introduced the "fast gradient sign" method, which computes adversarial perturbations for a given classifier very efficiently. Despite its efficiency, this method provides only a coarse approximation of the optimal perturbation vectors; in fact, it performs a single gradient step, which often leads to sub-optimal solutions. In an attempt to build classifiers that are more robust to adversarial perturbations, [5] introduced a smoothness penalty in the training procedure that boosts the robustness of the classifier; notably, the method of [18] was applied to generate the adversarial perturbations. We should finally mention that the phenomenon of adversarial instability also led to theoretical work in [2], which studied the problem of adversarial perturbations on some families of classifiers and provided upper bounds on the robustness of these classifiers. A deeper understanding of the phenomenon of adversarial instability for more complex classifiers is, however, needed; the method proposed in this work can be seen as a baseline to efficiently and accurately generate adversarial perturbations in order to better understand this phenomenon.

The rest of the paper is organized as follows. In Section 2, we introduce an efficient algorithm to find adversarial perturbations for a binary classifier. The extension to the multiclass problem is provided in Section 3. In Section 4, we propose extensive experiments that confirm the accuracy of our method and outline its benefits in building more robust classifiers.

2. DeepFool for binary classifiers

As a multiclass classifier can be viewed as an aggregation of binary classifiers, we first propose the algorithm for binary classifiers. That is, we assume here that k̂(x) = sign(f(x)), where f is an arbitrary scalar-valued image classification function f : Rⁿ → R. We also denote by F ≜ {x : f(x) = 0} the zero level set of f. We begin by analyzing the case where f is an affine classifier f(x) = wᵀx + b, and then derive the general algorithm, which can be applied to any differentiable binary classifier f.

In the case where the classifier f is affine, it can easily be seen that the robustness of f at a point x₀ equals the distance from x₀ to the separating hyperplane F (illustrated in Figure 2), and that the minimal perturbation is the orthogonal projection of x₀ onto F.
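For completeness, the closed form behind this projection can be reconstructed from the definitions above (this derivation is supplied here for clarity; it is a standard computation, not quoted from the source). Minimizing ‖r‖₂ subject to reaching the boundary, wᵀ(x₀ + r) + b = 0, forces wᵀr = −f(x₀), and the shortest such r is collinear with w:

\[ r_*(x_0) = -\frac{f(x_0)}{\|w\|_2^2}\, w, \qquad \Delta(x_0; f) = \frac{|f(x_0)|}{\|w\|_2}. \]

This is exactly the update in line 5 of Algorithm 1 below, with ∇f(xᵢ) in place of w.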
Algorithm 1 DeepFool for binary classifiers
1: input: Image x, classifier f.
2: output: Perturbation r̂.
3: Initialize x₀ ← x, i ← 0.
4: while sign(f(xᵢ)) = sign(f(x₀)) do
5:    rᵢ ← − (f(xᵢ) / ‖∇f(xᵢ)‖₂²) ∇f(xᵢ),
6:    xᵢ₊₁ ← xᵢ + rᵢ,
7:    i ← i + 1.
8: end while
9: return r̂ = Σᵢ rᵢ.

Figure 2: Adversarial examples for a linear binary classifier. [Diagram: the regions f(x) > 0 and f(x) < 0 separated by the boundary F, with the minimal perturbation r*(x₀) carrying x₀ onto F, at distance ∆(x₀; f).]
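For illustration, a minimal NumPy sketch of Algorithm 1 follows. The callables `f` and `grad_f` (evaluating the classifier and its gradient), the iteration cap, and the small overshoot are assumptions added here for practicality; they are not part of the algorithm as stated above.

```python
import numpy as np

def deepfool_binary(x, f, grad_f, max_iter=50, overshoot=0.02):
    """Sketch of Algorithm 1: DeepFool for a binary classifier k(x) = sign(f(x)).

    f: callable returning the scalar value f(x).
    grad_f: callable returning the gradient of f at x (same shape as x).
    """
    x_i = x.astype(float).copy()
    r_hat = np.zeros_like(x_i)
    original_sign = np.sign(f(x))
    for _ in range(max_iter):
        if np.sign(f(x_i)) != original_sign:
            break  # the accumulated perturbation already flips the label
        g = grad_f(x_i)
        # Line 5: project x_i onto the locally linearized boundary f = 0.
        r_i = -(f(x_i) / (np.linalg.norm(g) ** 2 + 1e-12)) * g
        r_i *= 1 + overshoot  # small overshoot so the iterate crosses F
        x_i = x_i + r_i
        r_hat += r_i
    return r_hat
```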
Table 1: The adversarial robustness of different classifiers on different datasets. The time required to compute one sample for each method is given in the time columns. The times are computed on a Mid-2015 MacBook Pro without CUDA support. Asterisks mark values computed using a GTX 750 Ti GPU. [The body of this table was not recovered from the source.]

Regarding complexity, the proposed approach is substantially faster than the standard method proposed in [18]. In fact, while the approach of [18] involves a costly minimization of a series of objective functions, we observed empirically that DeepFool converges in a few iterations (i.e., less than 3) to a perturbation vector that fools the classifier. Hence, the proposed approach reaches a more accurate perturbation vector compared to state-of-the-art methods, while being computationally efficient. This makes it readily suitable as a baseline method for estimating the robustness of very deep neural networks on large-scale datasets. In that context, we provide the first quantitative evaluation of the robustness of state-of-the-art classifiers on the large-scale ImageNet dataset. It can be seen that, despite their very good test accuracy, these methods are extremely unstable to adversarial perturbations: a perturbation 1000× smaller in magnitude than the original image is sufficient to fool state-of-the-art deep neural networks.

We illustrate in Figure 1 perturbed images generated by the fast gradient sign method and DeepFool. It can be observed that the proposed method generates adversarial perturbations which are hardly perceptible, while the fast gradient sign method outputs a perturbation image with a higher norm.
It should be noted that, when perturbations are measured using the ℓ∞ norm, the above conclusions remain unchanged: DeepFool yields adversarial perturbations that are smaller (hence closer to the optimum) compared to other methods for computing adversarial examples. Table 2 reports the ℓ∞ robustness to adversarial perturbations, measured by

\[ \hat{\rho}^{\infty}_{\mathrm{adv}}(f) = \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \frac{\|\hat{r}(x)\|_{\infty}}{\|x\|_{\infty}}, \]

where r̂(x) is computed respectively using DeepFool (with p = ∞, see Section 3.3) and the fast gradient sign method, for the MNIST and CIFAR-10 tasks.

Table 2: Values of ρ̂∞_adv for four different networks, based on DeepFool (smallest ℓ∞ perturbation) and the fast gradient sign method with 90% misclassification.

Classifier              DeepFool   Fast gradient sign
LeNet (MNIST)           0.10       0.26
FC500-150-10 (MNIST)    0.04       0.11
NIN (CIFAR-10)          0.008      0.024
LeNet (CIFAR-10)        0.015      0.028
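As a small, self-contained illustration (a sketch with hypothetical names, not the authors' evaluation code), this metric can be computed from stored perturbations as follows:

```python
import numpy as np

def linf_robustness(images, perturbations):
    """Estimate the l_inf robustness: the mean of ||r(x)||_inf / ||x||_inf.

    `images` and `perturbations` are matching sequences of numpy arrays,
    where perturbations[i] is an adversarial perturbation for images[i].
    """
    ratios = [np.abs(r).max() / np.abs(x).max()
              for x, r in zip(images, perturbations)]
    return float(np.mean(ratios))
```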
Fine-tuning using adversarial examples. In this section, we fine-tune the networks of Table 1 on adversarial examples to build more robust classifiers for the MNIST and CIFAR-10 tasks. Specifically, for each network, we performed two experiments: (i) fine-tuning the network on DeepFool's adversarial examples, and (ii) fine-tuning the network on the fast gradient sign adversarial examples. We fine-tune the networks by performing 5 additional epochs, with a 50% decreased learning rate, only on the perturbed training set (a sketch of this procedure is given below). For each experiment, the same training data was used through all 5 extra epochs. For the sake of completeness, we also performed 5 extra epochs on the original data. The evolution of ρ̂adv for the different fine-tuning strategies is shown in Figures 6a to 6d, where the robustness ρ̂adv is estimated using DeepFool, since this is the most accurate method, as shown in Table 1. Observe that fine-tuning with DeepFool adversarial examples significantly increases the robustness of the networks to adversarial perturbations, even after one extra epoch. For example, the robustness of the networks on MNIST is improved by 50%, and NIN's robustness is increased by about 40%.
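The following PyTorch-style sketch illustrates this fine-tuning protocol; `model`, `perturbed_loader`, and `base_lr` are hypothetical placeholders, and the choice of framework and optimizer is an assumption (the text does not specify them), so treat this as an outline rather than the authors' implementation:

```python
import torch
import torch.nn as nn

def finetune_on_adversarial(model, perturbed_loader, base_lr, epochs=5):
    """Fine-tune for 5 extra epochs on a fixed perturbed training set,
    with the learning rate decreased by 50% as described above."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5 * base_lr)
    model.train()
    for _ in range(epochs):
        # The same perturbed examples are reused in every extra epoch.
        for inputs, labels in perturbed_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
    return model
```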
Figure 6: Effect of fine-tuning on adversarial examples computed by two different methods. [Four plots of ρ̂adv versus the number of extra epochs (0 to 5), each with curves for DeepFool, fast gradient sign, and clean fine-tuning: (a) LeNet on MNIST; (b) a fully-connected network on MNIST; (c) NIN on CIFAR-10; (d) LeNet on CIFAR-10.]
On the other hand, quite surprisingly, the method in [4] can lead to a decreased robustness of the network to adversarial perturbations. We hypothesize that this behavior is due to the fact that perturbations estimated using the fast gradient sign method are much larger than minimal adversarial perturbations. Fine-tuning the network with overly perturbed images decreases the robustness of the networks to adversarial perturbations. To verify this hypothesis, we compare in Figure 7 the adversarial robustness of a network that is fine-tuned with the adversarial examples obtained using [sentence truncated in the source].

Figure 7: [Plot of ρ̂adv versus the number of extra epochs; the caption was not recovered from the source.]
Figure 8: From "1" to "7": original image classified as "1" [remainder of caption truncated in the source].

Figure 9: How the adversarial robustness is judged by different methods. The values are normalized by the corresponding ρ̂adv of the original network. [Plot of normalized robustness versus the number of extra epochs, with curves for DeepFool and the fast gradient sign method.]

[Table; its caption was not recovered from the source. Columns correspond to the examples used for fine-tuning.]

Classifier              DeepFool   Fast gradient sign   Clean
LeNet (MNIST)           0.8%       4.4%                 1%
FC500-150-10 (MNIST)    1.5%       4.9%                 1.7%
NIN (CIFAR-10)          11.2%      21.2%                11.5%
LeNet (CIFAR-10)        20.0%      28.6%                22.6%