1 Introduction

The robustness of machine learning models has recently attracted substantial research interest, particularly in the context of deep learning. On the one hand, this is due to the success and widespread deployment of deep learning. On the other hand, it is due to an intriguing property that deep learning exhibits (although it is not unique to this setting): deep learning can produce models that match or surpass human-level performance on a wide variety of tasks, yet completely disagree with human judgment after the application of imperceptible perturbations to their inputs [13]. The ability of a classifier to maintain its performance under such changes to the input data is commonly referred to as robustness to adversarial perturbations.

In order to better understand adversarial robustness, recent years have seen the development of a host of methods that produce adversarial examples, in white-box and black-box settings, with specific or arbitrary target labels, and under varying additional constraints [3, 7, 8, 11, 12]. There has also been a push towards training regimes that produce adversarially robust networks, such as data augmentation with adversarial examples or distillation [1, 4, 6, 10]. The difficulty faced by such approaches is that robustness is hard to measure and quantify: even if a model is shown to be robust against current state-of-the-art attacks, this does not exclude the possibility that newly devised attacks may be successful [2]. The complexity of deep learning models and the counter-intuitive nature of some phenomena surrounding adversarial examples further make it challenging to understand the impact of robust training or the properties that determine whether a model is robust or non-robust. Recent work has highlighted settings where no model can be simultaneously accurate and robust [14], or where finding a model that is simultaneously robust and accurate requires optimizing over a different hypothesis class than finding one that is merely accurate [9]. These examples rely on linear models, as they are easy for humans to understand. They analyze robustness properties for a fixed choice of norm and, typically, a fixed, model-dependent, disadvantageous perturbation size. This raises the question: how do the presented results depend on the choice of norm, the choice of perturbation size, and the choice of linear classifiers as the hypothesis class?

In this contribution, we:

  • propose robustness curves as a way of better representing adversarial robustness in place of “point-wise” measures,

  • show that linear classifiers are not sufficient to illustrate all interesting robustness phenomena, and

  • investigate how robustness curves may depend on the choice of norm.

2 Definitions

In the following, we assume data \((x,y) \in X \times Y\), \(X \subseteq \mathbb {R}^d\), are generated i.i.d. according to distribution P with marginal \(P_X\). Let \(f:X \rightarrow Y\) denote some classifier and let \(x \in X\). The standard loss of f on P is

$$\begin{aligned} L(f) := P(\{(x,y) : f(x) \ne y\}). \end{aligned}$$
(1)

Let \(n:\mathbb {R}^d \rightarrow \mathbb {R}^+\) be some norm, let \(\varepsilon \ge 0\) and let

$$\begin{aligned} B_n(x, \varepsilon ) := \{x' : n(x - x') \le \varepsilon \}. \end{aligned}$$
(2)

Following [14], we define the \(\varepsilon \)-adversarial loss of f regarding P and n as

$$\begin{aligned} L_{n,\varepsilon }(f) := P(\underbrace{\{(x,y) : \exists x' \in B_n(x, \varepsilon ): f(x') \ne y \}}_{=:A_\varepsilon ^n}). \end{aligned}$$
(3)

We have \(L_{n,0}(f) = L(f)\). Alternatively, we can exclude from this definition any points that are initially misclassified by the model, and instead consider as adversarial examples all points where the model changes its behavior under small perturbations. Then the \(\varepsilon \)-margin loss is defined as

$$\begin{aligned} L'_{n,\varepsilon }(f) := P_X(\{x: \exists x' \in B_n(x, \varepsilon ): f(x') \ne f(x) \}). \end{aligned}$$
(4)

\(L'_{n,\varepsilon }\) is the weight of all points within an \(\varepsilon \)-margin of a decision boundary. We have \(L'_{n,0}(f) = 0\).

There are two somewhat arbitrary choices in the definitions in Eqs. (3) and (4): the choice of \(\varepsilon \) and the choice of the norm n. The aim of this contribution is to investigate how \(\varepsilon \) and n impact adversarial robustness.
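
To make these quantities concrete, the following minimal sketch estimates both losses on a finite sample. It assumes that some external attack or exact certification procedure has already produced, for every sample point, the smallest perturbation size (in the chosen norm n) at which the prediction changes; the names radii and misclassified are hypothetical and serve only this illustration.

```python
import numpy as np

def empirical_losses(radii, misclassified, eps):
    """Empirical estimates of the eps-adversarial loss (Eq. 3) and the
    eps-margin loss (Eq. 4) on a finite sample.

    radii         -- minimal perturbation size, in the chosen norm n, at which
                     the prediction at each sample point changes (assumed to be
                     provided by some attack or certification procedure).
    misclassified -- boolean array, True where f(x) != y.
    eps           -- perturbation budget, eps >= 0.
    """
    radii = np.asarray(radii, dtype=float)
    misclassified = np.asarray(misclassified, dtype=bool)
    # Eq. (3): x itself lies in B_n(x, eps), so initially misclassified points
    # always count; otherwise the prediction must flip within the eps-ball.
    adversarial_loss = np.mean(misclassified | (radii <= eps))
    # Eq. (4): only proximity to the decision boundary matters, not correctness.
    margin_loss = np.mean(radii <= eps)
    return adversarial_loss, margin_loss
```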

3 Robustness Curves

As a first step towards understanding robustness globally, instead of for an isolated perturbation size \(\varepsilon \), we propose to view robustness as a function of \(\varepsilon \). This yields an easy-to-understand visual representation of adversarial robustness in the form of a robustness curve.

Definition 1

The robustness curve of a classifier f, given a norm n and underlying distribution P, is the curve defined by

$$\begin{aligned} r_{f,n,P} : [0, \infty )&\rightarrow [0,1] \end{aligned}$$
(5)
$$\begin{aligned} \varepsilon&\mapsto L_{n, \varepsilon }(f). \end{aligned}$$
(6)

The margin curve of f given n and P is the curve defined by

$$\begin{aligned} r'_{f,n,P} : [0, \infty )&\rightarrow [0,1] \end{aligned}$$
(7)
$$\begin{aligned} \varepsilon&\mapsto L'_{n, \varepsilon }(f). \end{aligned}$$
(8)

Commonly chosen norms for the investigation of adversarial robustness are the \(\ell _1\) norm (denoted by \(\Vert \cdot \Vert _1\)), the \(\ell _2\) norm (denoted by \(\Vert \cdot \Vert _2\)), and the \(\ell _\infty \) norm (denoted by \(\Vert \cdot \Vert _\infty \)). In the following, we will investigate robustness curves for these three choices of n.
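
Given the same per-point minimal perturbation radii as in the previous sketch, an entire empirical robustness curve can be assembled as a step function. The sketch below is again hypothetical in its inputs: it presupposes finite radii computed by some attack or certification routine and simply turns them into the empirical analogue of Eq. (6).

```python
import numpy as np

def empirical_robustness_curve(radii, misclassified):
    """Empirical robustness curve: returns (eps_grid, loss) such that loss[i]
    estimates L_{n, eps_grid[i]}(f); plot with a post-step style, e.g.
    plt.step(eps_grid, loss, where="post"). Inputs are as in the previous
    sketch; all radii are assumed finite."""
    radii = np.asarray(radii, dtype=float)
    misclassified = np.asarray(misclassified, dtype=bool)
    # Initially misclassified points already count at eps = 0.
    effective = np.sort(np.where(misclassified, 0.0, radii))
    eps_grid = np.concatenate([[0.0], effective])
    loss = np.searchsorted(effective, eps_grid, side="right") / len(effective)
    return eps_grid, loss
```

The empirical margin curve of Eq. (8) is obtained in the same way by treating every point as correctly classified, i.e. passing an all-False misclassified array.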

[14] propose a distribution \(P_1\) where \(y\,{\mathop {\sim }\limits ^{\text {u. a. r.}}}\,\{-1, +1\}\) and

$$\begin{aligned} x_1 = {\left\{ \begin{array}{ll} y &{} \text {w. p.} \, p \\ -y &{} \text {w. p.} \, (1-p) \end{array}\right. } \quad x_2, \dots , x_{d+1} {\mathop {\sim }\limits ^{\text {i. i. d.}}} \mathcal {N}(\eta y, 1). \end{aligned}$$
(9)

For this distribution, they show that the linear classifier \(f_{\mathrm {avg}}(x) = \text {sign}(w^T x)\) with \(w = (0, 1/d, \dots , 1/d)\) has high accuracy, but low \(\varepsilon \)-robustness in the \(\ell _\infty \) norm for \(\varepsilon \ge 2 \eta \), while the classifier \(f_{\mathrm {rob}}(x) = \text {sign}(w^Tx)\) with \(w = (1, 0, \dots , 0)\) has high \(\varepsilon \)-robustness for \(\varepsilon < 1\), but lower accuracy. [9] proposes a distribution \(P_2\) where \(y\,{\mathop {\sim }\limits ^{\text {u. a. r.}}}\,\{-1, +1\}\) and

$$\begin{aligned} x_i = {\left\{ \begin{array}{ll} y &{} \text {w. p.} \, 0.51 \\ -y &{} \text {w. p.} \, 0.49 \end{array}\right. } \end{aligned}$$
(10)

where the linear classifier \(f_{s}(x) = \text {sign}(w^T x)\) with \(w = \varvec{1}_d\) has high accuracy, but low \(\varepsilon \)-robustness in the \(\ell _\infty \) norm for \(\varepsilon \ge \frac{1}{2}\). Figure 1 shows margin curves and robustness curves for \(P_1\) and \(f_{\mathrm {avg}}\), for \(P_1\) and \(f_{\mathrm {rob}}\), and for \(P_2\) and \(f_s\).
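
The accuracy–robustness trade-off for \(P_1\) can be reproduced numerically. The sketch below samples from \(P_1\), applies the worst-case \(\ell _\infty \) perturbation for a linear classifier (shifting each coordinate by \(\varepsilon \) against the label along \(\text {sign}(w)\)), and compares \(f_{\mathrm {avg}}\) and \(f_{\mathrm {rob}}\). The values chosen for d, \(\eta \), p, and the sample size are illustrative assumptions, not the constants used in [14].

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative values only -- not the constants used in [14].
d, eta, p, m = 100, 0.3, 0.95, 100_000

# Sample from P_1: y uniform on {-1, +1}, x_1 = y w.p. p and -y otherwise,
# the remaining d coordinates are drawn i.i.d. from N(eta * y, 1).
y = rng.choice([-1, 1], size=m)
x1 = np.where(rng.random(m) < p, y, -y)
rest = rng.normal(loc=eta * y[:, None], scale=1.0, size=(m, d))
X = np.column_stack([x1, rest])

w_avg = np.concatenate([[0.0], np.full(d, 1.0 / d)])  # f_avg: averages x_2 .. x_{d+1}
w_rob = np.concatenate([[1.0], np.zeros(d)])          # f_rob: looks only at x_1

def linf_adversarial_loss(w, eps):
    """eps-adversarial loss of sign(w^T x) under the worst-case l_inf attack,
    which shifts every coordinate by eps against the label along sign(w)."""
    scores = (X - eps * y[:, None] * np.sign(w)) @ w
    return np.mean(np.sign(scores) != y)

for name, w in [("f_avg", w_avg), ("f_rob", w_rob)]:
    print(f"{name}: standard loss = {linf_adversarial_loss(w, 0.0):.3f}, "
          f"adversarial loss at eps = 2*eta: {linf_adversarial_loss(w, 2 * eta):.3f}")
```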

Fig. 1.

Margin curves and robustness curves for several examples of distributions and linear models from the literature. Row (a) shows curves for classifier \(f_{\mathrm {avg}}\) and distribution \(P_1\). Row (b) shows curves for classifier \(f_{\mathrm {rob}}\) and distribution \(P_1\). In this case, all three curves are identical and thus appear as one. Row (c) shows curves for classifier \(f_s\) and distribution \(P_2\).

4 The Impact of n

The curves shown in Fig. 1 seem to behave similarly for each norm. Is this always the case? Indeed, let f be a linear classifier parameterized by a normal vector w and offset b, and denote by

$$\begin{aligned} d_n((w, b), x) = \min \{n(v): \exists p: x = p + v, \langle w , p \rangle + b = 0\} \end{aligned}$$
(11)

the shortest distance between the decision hyperplane defined by (w, b) and x in norm n. Then a series of algebraic manipulations yields

$$\begin{aligned} d_{\Vert \cdot \Vert _1}((w,b), x)&= \frac{|b + \langle w, x \rangle |}{\Vert w\Vert _\infty },\end{aligned}$$
(12)
$$\begin{aligned} d_{\Vert \cdot \Vert _2}((w,b), x)&= \frac{|b + \langle w, x \rangle |}{\Vert w\Vert _2}, \end{aligned}$$
(13)
$$\begin{aligned} d_{\Vert \cdot \Vert _\infty }((w,b), x)&= \frac{|b + \langle w, x \rangle |}{\Vert w\Vert _1}. \end{aligned}$$
(14)

In particular, there exist constants c and \(c'\) depending on (w, b) such that for all \(x \in X\),

$$\begin{aligned} d_{\Vert \cdot \Vert _1}((w,b), x) = c\, d_{\Vert \cdot \Vert _2}((w,b), x) = c'\, d_{\Vert \cdot \Vert _\infty }((w,b), x). \end{aligned}$$
(15)

This implies the following Theorem:

Theorem 1

For any linear classifier f, there exist constants \(c, c' > 0\) such that for any \(\varepsilon \ge 0\),

$$\begin{aligned} L_{\Vert \cdot \Vert _1, \varepsilon }(f) = L_{\Vert \cdot \Vert _2, \varepsilon / c}(f) = L_{\Vert \cdot \Vert _\infty , \varepsilon / c'}(f). \end{aligned}$$
(16)

As a consequence, for linear classifiers, the dependence of robustness curves on the choice of norm is purely a matter of compressing or elongating the \(\varepsilon \)-axis.
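
For linear classifiers, Eqs. (12)–(14) also make the robustness curve computable exactly on any finite sample: a point contributes to \(L_{n,\varepsilon }\) precisely when it is misclassified or lies within distance \(\varepsilon \) of the decision hyperplane, where the distance is the absolute score divided by the dual norm of w. The sketch below implements this; the function and argument names are our own.

```python
import numpy as np

def linear_robustness_curve(w, b, X, y, eps_grid, norm="l2"):
    """Exact eps-adversarial loss of the linear classifier sign(<w, x> + b)
    on a finite sample, evaluated on a grid of eps values. The distance of a
    point to the decision hyperplane is |<w, x> + b| divided by the dual norm
    of w, as in Eqs. (12)-(14)."""
    dual_ord = {"l1": np.inf, "l2": 2, "linf": 1}[norm]
    scores = X @ w + b
    dist = np.abs(scores) / np.linalg.norm(w, ord=dual_ord)
    misclassified = np.sign(scores) != y
    # A point contributes to L_{n, eps} iff it is misclassified or lies within
    # distance eps of the hyperplane.
    return np.array([np.mean(misclassified | (dist <= eps)) for eps in eps_grid])
```

Rescaling the \(\varepsilon \)-axis of one of the three resulting curves by the appropriate ratio of norms of w reproduces the other two, which serves as a direct numerical check of Theorem 1.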

What can we say about classifiers with more complex decision boundaries? For all x, we have

$$\begin{aligned} \Vert x\Vert _\infty \le \Vert x\Vert _2 \le \Vert x\Vert _1 \le \sqrt{d} \Vert x\Vert _2 \le d \Vert x\Vert _\infty . \end{aligned}$$
(17)

These inequalities are tight, i.e., for each inequality there exists some x such that equality holds. It follows that, for any \(\varepsilon > 0\),

$$\begin{aligned} A_{\varepsilon /d}^{\Vert \cdot \Vert _\infty } \subseteq A_{\varepsilon / \sqrt{d}}^{ \Vert \cdot \Vert _2} \subseteq A_\varepsilon ^{\Vert \cdot \Vert _1} \subseteq A_{\varepsilon }^{\Vert \cdot \Vert _2} \subseteq A_{\varepsilon }^{\Vert \cdot \Vert _\infty } \end{aligned}$$
(18)

and so

$$\begin{aligned} L_{\Vert \cdot \Vert _\infty , \varepsilon }(f)&\ge L_{\Vert \cdot \Vert _2, \varepsilon }(f) \ge L_{\Vert \cdot \Vert _1, \varepsilon }(f) \end{aligned}$$
(19)
$$\begin{aligned}&\ge L_{\Vert \cdot \Vert _2, \varepsilon / \sqrt{d}}(f) \ge L_{\Vert \cdot \Vert _1, \varepsilon /d}(f). \end{aligned}$$
(20)

In particular, the robustness curve for the \(\ell _\infty \)-norm is always an upper bound for the robustness curve for any other \(\ell _p\)-norm (since \(\Vert x\Vert _\infty \le \Vert x\Vert _p\) for all x and \(p \ge 1\)). Thus, for linear classifiers as well as classifiers with more complicated decision boundaries, in order to show that a model is adversarially robust with respect to any fixed \(\ell _p\)-norm, it is sufficient to show that it exhibits the desired robustness behavior for the \(\ell _\infty \)-norm. On the other hand, showing that a model is not adversarially robust with respect to the \(\ell _\infty \)-norm does not necessarily imply the same for another norm, as the robustness curves may be strongly separated in high-dimensional spaces, both for linear and non-linear models.
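
The inequality chain of Eq. (17), including the vectors that attain equality, is easy to verify numerically; the following short check does so for random perturbation vectors and for the canonical tight cases (a standard basis vector and the all-ones vector).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
V = rng.normal(size=(10_000, d))  # random perturbation vectors

l1 = np.linalg.norm(V, ord=1, axis=1)
l2 = np.linalg.norm(V, ord=2, axis=1)
linf = np.linalg.norm(V, ord=np.inf, axis=1)

# Eq. (17): ||v||_inf <= ||v||_2 <= ||v||_1 <= sqrt(d) ||v||_2 <= d ||v||_inf.
tol = 1e-9
assert np.all(linf <= l2 + tol)
assert np.all(l2 <= l1 + tol)
assert np.all(l1 <= np.sqrt(d) * l2 + tol)
assert np.all(np.sqrt(d) * l2 <= d * linf + tol)

# Tightness: a standard basis vector attains the first two equalities,
# the all-ones vector attains the last two.
e1 = np.zeros(d); e1[0] = 1.0
ones = np.ones(d)
assert np.isclose(np.linalg.norm(e1, np.inf), np.linalg.norm(e1, 2))
assert np.isclose(np.linalg.norm(e1, 2), np.linalg.norm(e1, 1))
assert np.isclose(np.linalg.norm(ones, 1), np.sqrt(d) * np.linalg.norm(ones, 2))
assert np.isclose(np.sqrt(d) * np.linalg.norm(ones, 2), d * np.linalg.norm(ones, np.inf))
```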

In contrast to linear models, classifiers with more complicated decision boundaries may exhibit robustness curves that differ qualitatively between norms. This is illustrated in Fig. 2. The decision boundary in each case is given by a quadratic model in 2-dimensional space: \(f(\varvec{x}) = \text {sign}(x_1^2 - x_2)\). In the first example, we construct a finite set of points, all at \(\ell _2\)-distance 1 from the decision boundary, but at various \(\ell _1\) and \(\ell _\infty \) distances. For any distribution concentrated on such a set of points, the \(\ell _2\)-robustness curve jumps from zero to one at a single threshold value, while the \(\ell _1\)- and \(\ell _\infty \)-robustness curves are step functions, with the height of the steps determined by the distribution across the points and the width determined by the variation in \(\ell _1\) or \(\ell _\infty \) distances from the decision boundary. The robustness curves in this example also exhibit, at some points, the maximal possible separation by a factor of \(\sqrt{d}\) (note that \(d=2\)), while touching at other points. In the second example, we show a continuous version of the same phenomenon, with points inside and outside the parabola distributed at constant \(\ell _2\)-distance from the decision boundary, but with varying \(\ell _1\) and \(\ell _\infty \) distances. As a result, the robustness curves for the different norms are qualitatively different. The third example, on the other hand, shows a setting where the robustness curves for the three norms are both quantitatively and qualitatively similar.
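
The effect in the first two examples stems from the fact that, for a curved decision boundary, the distances to the boundary in different norms are no longer proportional. The sketch below approximates these distances for \(f(\varvec{x}) = \text {sign}(x_1^2 - x_2)\) by a dense grid search over the boundary parameterization \(t \mapsto (t, t^2)\); the two test points are arbitrary stand-ins rather than samples from the distributions used in the figure.

```python
import numpy as np

def distance_to_boundary(x, ord):
    """Approximate minimal distance from x to the decision boundary
    {(t, t^2)} of f(x) = sign(x1^2 - x2), measured in the given l_p norm,
    via a dense grid search over the boundary parameter t."""
    t = np.linspace(-10.0, 10.0, 20001)          # grid resolution 1e-3
    diffs = np.stack([x[0] - t, x[1] - t**2], axis=1)
    return np.linalg.norm(diffs, ord=ord, axis=1).min()

# Two arbitrary test points: near the vertex of the parabola the three
# distances essentially coincide, near its steep part they clearly differ.
for x in [np.array([0.0, -1.0]), np.array([3.0, 8.0])]:
    print(x, {p: round(distance_to_boundary(x, p), 3) for p in (1, 2, np.inf)})
```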

Fig. 2.

Margin curves and robustness curves for \(f(\varvec{x}) = \text {sign}(x_1^2 - x_2)\) and three different underlying distributions, illustrating varying behavior of the robustness curves for different norms. In rows (a) and (b), the robustness curves are qualitatively different, while they are almost identical in row (c). Note that in these examples, robustness curves and margin curves are nearly identical, as the standard loss of f is zero or close to zero in all cases.

These examples drive home two points:

  • The robustness properties of a classifier may depend both quantitatively and qualitatively on the norm chosen to measure said robustness. When investigating robustness, it is therefore imperative to consider which norm, or, more broadly, which concept of closeness best represents the type of perturbation to guard against.

  • Linear classifiers are not a sufficient tool for understanding adversarial robustness in general, as they in effect neutralize a degree of freedom given by the choice of norm.

5 Discussion

We have proposed robustness curves as a more general perspective on the robustness properties of a classifier and have discussed how these curves can or cannot be affected by the choice of norm. Robustness curves are a tool for a more principled investigation of adversarial robustness, while their dependence on a chosen norm underscores the necessity of basing robustness analyses on a clear problem definition that specifies what kind of perturbations a model should be robust to. We note that the use of \(\ell _p\) norms in current research is frequently meant only as an approximation of a “human perception distance” [5]. A human’s ability to detect a perturbation depends on the point the perturbation is applied to, meaning that human perception distance is not a translation-invariant metric, and thus not induced by a norm. In this sense, where adversarial robustness is meant to describe how faithfully the behavior of a model matches that of a human, the adversarial loss in Eq. (3) can only be seen as a starting point of analysis. Nonetheless, since perturbations with a small \(\ell _p\)-norm are frequently imperceptible to humans, adversarial robustness with respect to some \(\ell _p\)-norm is a reasonable lower bound for adversarial robustness in human perception distance. In future work, we would like to investigate how robustness curves can be estimated for deep networks and extend the definition to robustness against targeted attacks.