Formal verification is only as good as the specification of a system, which is also true for neural network verification. Existing specifications follow the paradigm of data as specification, where the local neighborhood around a reference data point is considered correct or robust. While these specifications provide a fair testbed for assessing model robustness, they are too restrictive for verifying unseen test data—a challenging task with significant real-world implications. Recent work shows great promise through a new paradigm, neural representation as specification, which uses neural activation patterns (NAPs) for this purpose. However, it computes the most refined NAPs, which include many redundant neurons. In this paper, we study the following problem: given a neural network, find a minimal (general) NAP specification that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner, whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications computed by previous work, yet they expand the verifiable boundaries by several orders of magnitude.
The growing prevalence of deep learning systems in decision-critical applications has elevated safety concerns regarding AI systems, such as their vulnerability to adversarial attacks (Goodfellow et al., ; Dietterich and Horvitz, ). Therefore, the verification of AI systems has become increasingly important and attracted much attention from the research community. The field of neural network verification largely follows the paradigm of software verification – using formal methods to verify desirable properties of systems through rigorous mathematical specifications and proofs (Wing, ). Nearly all existing works (Katz et al., , ; Huang et al., a, ; Wang et al., b) follow a "data as specification" paradigm, which uses the consistency of local neighborhoods (often $L_\infty$ balls) of reference data points as the specification.
While local neighborhood specification provides a fair and effective testbed for evaluating neural networks' robustness, it has two major limitations: 1) it primarily covers a convex region of input data, which can be mathematically described by adding noise to the reference point, as illustrated in Figure 1b; 2) it is too restrictive to cover unseen test set data, which are real data sampled from the underlying distribution. For instance, the maximum $L_\infty$ verifiable bounds used in VNNCOMP (Brix et al., ) – the annual neural network verification competition – are usually less than 0.2, while the smallest distance between data points with the same label exceeds 0.5. Indeed, due to the nature of image distributions, the distances between real images are far beyond $L_\infty$ verifiable bounds. As a result, local neighborhood specifications are not suitable for the verification of unseen test data—a challenging task with significant real-world implications.
To make neural network verification practically useful, the desired specification must cover data from the same class, even if they are distributed non-linearly and non-convexly in the input space. Unlike local neighborhood specifications, manually defining such a specification is almost impractical. This poses a tricky chicken-and-egg problem: machine learning (ML) is necessary because it is challenging to formally write down a precise definition (aka a specification); but to verify machine learning models, a formal specification would be needed. We argue that, in order to tackle this challenge, a separate learning algorithm for specifications is necessary. In this view, the "data as specification" paradigm is a simple but extremely overfitted algorithm for specification learning, which simply picks a small neighborhood of a reference data point in the input space. To this end, Geng et al. () propose a new and more promising paradigm, "neural representation as specification", which learns a specification in the representation space of the trained machine learning model in the form of neural activation patterns (NAPs). NAPs are abstractions of the values of hidden neurons, which have been shown useful for understanding the decision-making process of a model when making a prediction (Gopinath et al., ). Most importantly, a successful neural network would exhibit similar activation patterns for input data from the same class, regardless of their actual distance in the input space (Bengio et al., ; Tishby and Zaslavsky, ; Geng et al., ). This key observation suggests that if we learn a NAP—a common activation pattern shared by a certain class of data—it can be used as a specification for verifying data from that class. Once such a NAP specification is successfully verified, we say any data covered by this NAP (exhibiting this pattern) provably belongs to the corresponding class. Ideally, if that NAP covers all data from a class, it can be considered a machine-checkable definition of that class. Geometrically speaking, compared to $L_\infty$ ball specifications, NAP specifications can cover much more flexible and larger regions, as illustrated in Figure 1.
However, it is noteworthy that the current approach to computing NAPs relies on a simple statistical method that assumes each neuron contributes to certifying the robustness of neural networks. Consequently, the computed NAPs are often overly refined. This is a restrictive assumption, as many studies (Frankle and Carbin, ; Lu et al., ; Liang et al., ; Wang et al., a) have revealed that a significant portion of neurons in neural networks may not play a substantial role. In the spirit of Occam’s Razor, we aim to systematically remove these redundant neurons that do not affect robustness. This motivates us to address the following challenge: given a neural network, find a minimal (coarsest) NAP that is sufficient for verifying the network’s robustness. This problem is important for the following reasons: i) Minimal NAP specifications cover potentially much larger regions in the input space compared to the most refined ones, increasing the ability to verify more unseen data; ii) Minimal NAP specifications provide insight into which neurons are responsible for the model making robust predictions, helping us to reveal the black-box nature of neural networks. For instance, if we aim to decode NAPs into human-understandable programs or rules (Li et al., ), minimal NAPs will always be easier to decipher than the most refined NAPs. We leave interpreting individual neurons as future work.
To find the minimal NAP specifications, we first introduce two basic algorithms: Refine and Coarsen. These algorithms exhaustively check all possible candidates using off-the-shelf verification tools, such as Marabou (Katz et al., ). For instance, Coarsen gradually coarsens each neuron of the most refined NAPs and retains only the coarsened neurons when Marabou returns a verification success. While these approaches provide correctness guarantees, they are not efficient for verifying large neural networks, as calls to verification tools are typically expensive. To improve efficiency, we further propose statistical variants of Refine and Coarsen — Sample_Refine and Sample_Coarsen that leverage sampling and statistical learning principles to find the minimal NAP specification.
However, verification-dependent approaches struggle to scale up to state-of-the-art neural network models due to limitations in current verification tools. This motivates us to explore estimation methods that are independent of verification tools. In our exploration, we discover that adversarial examples and local gradients offer valuable insights into mandatory neurons — the essential building blocks of minimal NAPs. Based on these insights, we develop two approximate approaches: Adversarial_Prune and Gradient_Search. Our experimental results demonstrate that these estimation methods produce fairly accurate estimates. Additionally, we apply these methods to state-of-the-art neural networks such as VGG-19 (Simonyan and Zisserman, ). Although we cannot formally verify the correctness of these estimated minimal NAP specifications due to the scalability limitations of current verification techniques, we demonstrate that these NAPs capture important hidden features and concepts learned by the model. As many studies indicate, visual interpretability and robustness are inherently related and observed in learned features and representations (Alvarez Melis and Jaakkola, ; Boopathy et al., ; Dong et al., ). Therefore, we believe that the estimated mandatory neurons can account for the model's robustness, and their activation states can also serve as empirical certificates of a confident prediction.
Moreover, previous research has suggested that NAP specifications cover larger regions than $L_\infty$ ball specifications, but lacks empirical evidence. In this paper, we fill this gap by introducing a simple and efficient method to estimate the volume of certified regions corresponding to NAP specifications. Additionally, this method offers a rough approximation of the volumetric change from the most refined NAP specifications to minimal NAP specifications. Our contributions can be summarized as follows:
We propose the problem of learning the minimal NAP specification, emphasizing the necessity for a new paradigm of specifying neural networks. We present two simple verification-based approaches with correctness guarantees, alongside more efficient statistical methods.
We introduce key concepts including the abstraction function, NAP specification, and mandatory neurons. We show that the problem is equivalent to identifying all mandatory neurons and propose two efficient estimation approaches to learning them.
Our experiments indicate that the minimal NAP specifications involve significantly fewer neurons compared to the most refined NAPs computed by the baseline approach. Moreover, they expand the verifiable bound by several orders of magnitude.
We estimate mandatory neurons in the state-of-the-art neural network, VGG-19. By leveraging a modified Grad-CAM map (Selvaraju et al., ), we demonstrate that these mandatory neurons are essential for visual interpretability - strong evidence that they may also account for the model’s robustness performance.
In this section, we introduce basic knowledge and notations of adversarial attacks and neural network verification, with an emphasis on verification using NAP specifications. This may help readers better understand the importance of learning minimal NAP specifications.
In this paper, we focus on feed-forward ReLU neural networks. Generally speaking, a feed-forward network $N$ is comprised of $L$ layers, where each layer performs a linear transformation followed by a ReLU activation. We denote the pre-activation and post-activation values at the $l$-th layer as $z^{(l)}(x)$ and $\hat{z}^{(l)}(x)$, respectively. The $l$-th layer computation is expressed as follows: $z^{(l)}(x) = \mathbf{W}^{(l)} \hat{z}^{(l-1)}(x) + \mathbf{b}^{(l)}$, $\hat{z}^{(l)}(x) = \mathrm{ReLU}(z^{(l)}(x))$, with $\mathbf{W}^{(l)}$ being the weight matrix and $\mathbf{b}^{(l)}$ the bias for the $l$-th layer. We denote the number of neurons in the $l$-th layer as $d_l$, and the $i$-th neuron in layer $l$ as $N_{i,l}$. The pre-activation and post-activation values of $N_{i,l}$ at input $x$ are computed by $z_i^{(l)}(x)$ and $\hat{z}_i^{(l)}(x)$. The network as a whole can also be viewed as a function $\mathbf{F}$.
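The layer computation above can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the authors' implementation; the helper name `forward_with_activations` and the list-based parameter layout are our own assumptions, introduced only to record the pre- and post-activation values $z^{(l)}(x)$ and $\hat{z}^{(l)}(x)$ that later sections abstract into NAPs.

```python
import numpy as np

def forward_with_activations(weights, biases, x):
    """Run a feed-forward ReLU network and record, for every hidden layer,
    the pre-activation values z^(l)(x) and post-activation values zhat^(l)(x).
    `weights` and `biases` are lists of per-layer parameters W^(l), b^(l);
    the last pair is treated as the linear output layer (no ReLU)."""
    pre, post = [], []
    a = np.asarray(x, dtype=float)
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b                    # z^(l)(x) = W^(l) zhat^(l-1)(x) + b^(l)
        if l < len(weights) - 1:         # hidden layer: apply ReLU
            a = np.maximum(z, 0.0)       # zhat^(l)(x) = ReLU(z^(l)(x))
            pre.append(z)
            post.append(a)
        else:                            # output layer: logits F(x)
            a = z
    return a, pre, post                  # logits, hidden pre-/post-activations
```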
Given a neural network $\mathbf{F}$ and a reference point $x$, adversarial attacks aim to search for a point $x'$ that is geometrically close to the reference point $x$ such that $x'$ and $x$ belong to different classes. Here, we use the canonical specification, that is, we search in the local neighborhood ($L_\infty$ norm ball) of $x$, formally denoted as $B(x,\epsilon) := \{x' \mid \|x - x'\|_\infty \leq \epsilon\}$, where $\epsilon$ is the radius. For a certain $\epsilon$, given that we know $x$ belongs to class $j$, we say an adversarial point is found if:
(1) $\exists x' \in B(x,\epsilon) \quad \exists i \in C \quad \mathbf{F}_i(x') - \mathbf{F}_j(x') > 0$
In practice, the change from the original data $x$ to adversarial data $x'$ should be imperceptible, so that they are likely to be recognized as the same class/label from a human perspective. There are also metrics other than the $L_\infty$ norm to represent the "similarity" between $x$ and $x'$, such as the $L_0$ and $L_2$ norms (Xu et al., ). However, almost all of them fall into the local neighborhood specification paradigm. This is different from the NAP specification, as we will discuss later.
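To make the condition in Eq. (1) concrete, the sketch below checks whether a candidate point $x'$ is adversarial for a given reference $x$: it tests membership in $B(x,\epsilon)$ and then compares logits. This is an illustration under our own assumptions; the `is_adversarial` helper and the `logits_fn` callable are hypothetical names, not part of any existing tool.

```python
import numpy as np

def is_adversarial(logits_fn, x, x_adv, eps, true_class):
    """Check Eq. (1): x_adv lies in the L-infinity ball B(x, eps) and some
    class i != true_class obtains a strictly higher logit than the true class.
    `logits_fn` is assumed to return the logit vector F(.) of the network."""
    x, x_adv = np.asarray(x, dtype=float), np.asarray(x_adv, dtype=float)
    if np.max(np.abs(x - x_adv)) > eps:       # outside B(x, eps)
        return False
    logits = logits_fn(x_adv)
    others = np.delete(logits, true_class)    # logits of all classes i != j
    return float(np.max(others) - logits[true_class]) > 0.0
```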
Neural networks are vulnerable to adversarial attacks, where even imperceptible changes can alter predictions significantly. This underscores the critical need for neural network verification, which can be viewed as countering adversarial attacks. Solving the verification problem involves formally proving the absence of adversarial points in $B(x,\epsilon)$. Formally, we seek to verify:
(2) $\forall x' \in B(x,\epsilon) \quad \forall i \neq j \quad \mathbf{F}_j(x') - \mathbf{F}_i(x') > 0$
For a simpler presentation, we assume that $\mathbf{F}(x)$ is a binary classifier. For any given specification, $\mathbf{F}(x) \geq 0$ indicates that the model is verified; otherwise, we can find an adversarial example. Solving such a problem is known to be NP-hard (Katz et al., ), and achieving scalability in verification remains an ongoing challenge.
To better discuss NAP specification and robustness verification, we first introduce the relevant concepts of neuron abstractions, neuron abstraction functions, and neural activation patterns.
Given a neural network $N$, for an arbitrary internal neuron $N_{i,l}$ where $0 \leq i \leq d_l$ and $1 \leq l \leq L-1$, its post-activation value $\hat{z}_i^{(l)}(x)$ can be abstracted into finitely many states. Formally, this can be viewed as a mapping from $\mathbb{R}$ to $\mathbb{S}$, where $\mathbb{S}$ represents a set of abstraction states $\{s_1, s_2, \ldots, s_n\}$. We define activation in terms of an abstraction of a neuron's output values. A straightforward and intuitive abstraction for the ReLU activation function is the binary abstraction, consisting of two states $s_0$ and $s_1$, where $s_0 := 0$ and $s_1 := (0, \infty)$; $s_0$ and $s_1$ are often referred to as the deactivation and activation states. In addition, we can further abstract these two states into a single unary state $s_* := [0, \infty)$, whose interval covers the entire range of post-activation values. This is essentially an identity mapping of $\hat{z}_i^{(l)}(x)$: it means the neuron can be in either state $s_0$ or state $s_1$. Formally, we introduce a partial order $\preceq$ among the states: $s_0 \preceq s_*$ and $s_1 \preceq s_*$ indicate that $s_*$ is an abstraction of $s_0$ and $s_1$, and that $s_0$ and $s_1$ are refinements of $s_*$.
Clearly, $s_* \preceq s_*$. For convenience, we omit $s$ in the notation and use $\mathbf{0}$, $\mathbf{1}$, and $\mathbf{*}$ to refer to these abstractions, respectively. In principle, our minimal specification learning applies to any activation function, provided that activation can be defined within the abstraction domain.
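As a small illustration of this abstraction domain, the snippet below encodes the three states $\mathbf{0}$, $\mathbf{1}$, $\mathbf{*}$ and the partial order $\preceq$ among them. The string encoding and the helper name `state_preceq` are our own choices for exposition, not notation from the paper.

```python
# Abstraction states: "0" (deactivated), "1" (activated), "*" (the coarsest state).
ZERO, ONE, STAR = "0", "1", "*"

def state_preceq(s, t):
    """Partial order on states: s precedes t iff t abstracts s.
    0 and 1 both precede *, every state precedes itself; 0 and 1 are incomparable."""
    return s == t or t == STAR
```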
Given a neural network $N$ and the abstraction state set $\mathbb{S}$, a neuron abstraction function is the mapping $\mathcal{A}: N \rightarrow \mathbb{S}$. Formally, for an arbitrary neuron $N_{i,l}$ where $0 \leq i \leq d_l$ and $1 \leq l \leq L-1$, the function abstracts $N_{i,l}$ to some state $s_k \in \mathbb{S}$, i.e., $\mathcal{A}(N_{i,l}) = s_k$.
The above characterization of neuron abstraction does not instruct us on how to perform the binary abstraction in the absence of neuron values. Thus, we include the input(s) $x$ or $X$ as an additional parameter of $\mathcal{A}$, omitted when the context is clear. Here, we discuss three types of $\mathcal{A}$: the unary abstraction function $\dot{\mathcal{A}}$, the binary abstraction function $\ddot{\mathcal{A}}$, and the statistical abstraction function $\widetilde{\mathcal{A}}$.
The unary abstraction function always maps all inputs to the coarsest state $\mathbf{*}$. Formally:
(3) $\dot{\mathcal{A}}(N_{i,l}, x) = \mathbf{*}$
When an input $x$ is passed through an internal neuron $N_{i,l}$, we can easily determine the binary abstraction state of $N_{i,l}$ based on its post-activation value $\hat{z}_i^{(l)}(x)$. This motivates us to define the binary abstraction function as follows:
(4) $\ddot{\mathcal{A}}(N_{i,l}, x) = \begin{cases} \mathbf{0} & \text{if } \hat{z}_i^{(l)}(x) = 0 \\ \mathbf{1} & \text{if } \hat{z}_i^{(l)}(x) > 0 \end{cases}$
The binary abstraction function $\ddot{\mathcal{A}}$ is limited to a single input and meets a challenge when multiple inputs come into play, as two distinct inputs may lead to disagreement on a neuron's abstraction state. We approach this problem statistically by introducing the $\widetilde{\mathcal{A}}$ function, defined as follows:
(5) $\widetilde{\mathcal{A}}(N_{i,l}, X) = \begin{cases} \mathbf{0} & \text{if } \frac{|\{x_j \mid \ddot{\mathcal{A}}(N_{i,l}, x_j) = \mathbf{0},\, x_j \in X\}|}{|X|} \geq \delta \\ \mathbf{1} & \text{if } \frac{|\{x_j \mid \ddot{\mathcal{A}}(N_{i,l}, x_j) = \mathbf{1},\, x_j \in X\}|}{|X|} \geq \delta \\ \mathbf{*} & \text{otherwise} \end{cases}$
where $\delta$ is a real number in $[0, 1]$, and $X$ represents a set of inputs, i.e., $X := \{x_1, x_2, \ldots, x_n\}$. Since datasets often contain noisy data or challenging instances that the model cannot predict accurately, we introduce the parameter $\delta$ to accommodate standard classification settings in which Type I and Type II errors are non-negligible. Intuitively, the introduction of $\delta$ allows multiple inputs $x_1, \ldots, x_n$ to vote on a neuron's state. For instance, when $\delta$ is set to 0.99, then 99% or more of the inputs must agree that a neuron is activated for the neuron to be in the $\mathbf{1}$ state. It is worth mentioning that the statistical $\widetilde{\mathcal{A}}$ function is equivalent to the method described in Geng et al. (), which serves as our baseline approach.
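The binary abstraction of Eq. (4) and the statistical abstraction of Eq. (5) can be sketched as follows. This is an illustrative reading of the two definitions, reusing the state constants from the previous sketch; the function names and the `(num_inputs, num_neurons)` layout of recorded post-activations are assumptions made for this example.

```python
def binary_abstraction(post_activation):
    """Eq. (4): map a neuron's post-activation value to state 0 or 1."""
    return ONE if post_activation > 0 else ZERO

def statistical_abstraction(post_activations, delta=0.99):
    """Eq. (5): let the inputs in X vote on the neuron's state; fall back to *
    when neither state reaches the agreement threshold delta."""
    states = [binary_abstraction(v) for v in post_activations]
    if states.count(ZERO) / len(states) >= delta:
        return ZERO
    if states.count(ONE) / len(states) >= delta:
        return ONE
    return STAR

def mine_class_nap(hidden_post_activations, delta=0.99):
    """Compute the most refined class NAP from recorded hidden post-activations:
    one array of shape (num_inputs, num_neurons) per hidden layer."""
    return [
        [statistical_abstraction(layer[:, i], delta) for i in range(layer.shape[1])]
        for layer in hidden_post_activations
    ]
```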
Given a neural network $N$ and a neuron abstraction function $\mathcal{A}$, a neural activation pattern (NAP) $P$ is a tuple that consists of the abstraction states of all neurons in $N$. Formally, $P := \langle \mathcal{A}(N_{i,l}) \mid N_{i,l} \in N,\ \mathcal{A} \in \{\ddot{\mathcal{A}}, \dot{\mathcal{A}}\} \rangle$, also denoted as $\mathcal{A}(N)$. The abstraction state of neuron $N_{i,l}$ specified by NAP $P$ is represented as $P_{i,l}$, i.e., $P_{i,l} = \mathcal{A}(N_{i,l})$.
We denote the power set of NAPs in $N$ as $\mathcal{P}$. The number of all possible NAPs in $N$ scales exponentially as the number of neurons increases. For such a large set, if we aim to find the minimal NAP — the central problem in this work — we first have to establish an order so that NAPs can be compared. To this end, we define the following partial order:
For any two given NAPs $P, P' \in \mathcal{P}$, we say $P'$ subsumes $P$ if, for each neuron $N_{i,l}$, its state in $P$ is an abstraction of its state in $P'$. Formally, this can be defined as:
(6) $P' \preccurlyeq P \iff P'_{i,l} \preceq P_{i,l} \quad \forall N_{i,l} \in N$
Moreover, two NAPs $P, P'$ are equivalent if $P \preccurlyeq P'$ and $P' \preccurlyeq P$.
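Under the list-of-layers NAP encoding used in our earlier sketches, the subsumption relation of Eq. (6) reduces to a per-neuron check with the state order $\preceq$. The helper below is illustrative only.

```python
def nap_subsumes(p_prime, p):
    """Eq. (6): P' subsumes P iff, for every neuron, its state in P abstracts
    its state in P'.  NAPs are nested lists of per-neuron states."""
    return all(
        state_preceq(sp, s)
        for layer_pp, layer_p in zip(p_prime, p)
        for sp, s in zip(layer_pp, layer_p)
    )
```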
To give a concrete example, Figure 2(b) depicts a subset of NAPs of a simple neural network consisting of 2 hidden layers and 4 neurons, as presented in Figure 2(a).
It is interesting to see that a subset of the NAP family $\mathcal{P}$ can form a complete binary tree given an order of abstraction over neurons. In this example, we use the order $N_{0,1}, N_{1,1}, N_{0,2}, N_{1,2}$; setting a different order would create a different tree of NAPs. The root of the tree is the coarsest NAP $\langle \mathbf{*}, \mathbf{*}, \mathbf{*}, \mathbf{*} \rangle$. Increasing the depth means that $\ddot{\mathcal{A}}$ applies to more neurons, and when reaching a leaf node, all neurons have been abstracted by $\ddot{\mathcal{A}}$. The leaf nodes represent the most refined NAPs, and there are $2^{|N|}$ of them in total. In addition, each child node always subsumes its parent; by transitivity, a leaf node always subsumes its ancestors along the path. For instance, $\langle \mathbf{1}, \mathbf{0}, \mathbf{1}, \mathbf{0} \rangle \preccurlyeq \langle \mathbf{1}, \mathbf{0}, \mathbf{1}, \mathbf{*} \rangle \preccurlyeq \langle \mathbf{1}, \mathbf{0}, \mathbf{*}, \mathbf{*} \rangle \preccurlyeq \langle \mathbf{*}, \mathbf{*}, \mathbf{*}, \mathbf{*} \rangle$. However, children under the same parent are not comparable; for instance, $\langle \mathbf{1}, \mathbf{0}, \mathbf{*}, \mathbf{*} \rangle \not\preccurlyeq \langle \mathbf{1}, \mathbf{1}, \mathbf{*}, \mathbf{*} \rangle$.
One key requirement for a verification specification is its ability to represent a specific region within the input space. Canonical local neighborhood specifications define $L_\infty$ norm balls with explicit formulas such as $B(x,\epsilon) := \{x' \mid \|x - x'\|_\infty \leq \epsilon\}$. NAP specifications, on the other hand, outline regions of the input space implicitly. We define the region specified by $P$ as $R_P$, the set of inputs whose activation pattern subsumes the given NAP $P$. Formally, $R_P := \{x \mid \mathcal{A}(N,x) \preccurlyeq P\}$. To provide a concrete example, Figure 3 illustrates the NAP family of the simple neural network in Figure 2(a). These NAPs essentially correspond to regions bounded by hyperplanes created by neurons. The most refined NAPs correspond to the individual linear regions (1) through (11); for example, linear region (9) represents $\langle \mathbf{0}, \mathbf{1}, \mathbf{1}, \mathbf{1} \rangle$. Since the coarsest state $\mathbf{*}$ abstracts the binary states $\mathbf{0}$ and $\mathbf{1}$, a NAP with more $\mathbf{*}$ states covers a larger region in the input space, and this region can be concave. Take the NAP $\langle \mathbf{*}, \mathbf{*}, \mathbf{1}, \mathbf{*} \rangle$ as an example; it corresponds to the union of linear regions (1), (5), (7), (9), and (10). It is interesting to note that the number of linear regions is less than the size of the NAP family. Similar findings have been reported in (Geng et al., ; Hanin and Rolnick, a, b).
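Membership in the region $R_P$ can be tested directly from the definition: compute the input's binary activation pattern and check that it subsumes $P$. The sketch below combines the earlier illustrative helpers (`forward_with_activations`, `binary_abstraction`, `nap_subsumes`), all of which are our own names.

```python
def input_exhibits_nap(weights, biases, x, P):
    """Test whether x lies in R_P = {x | A(N, x) subsumes P}: abstract the
    input's hidden post-activations with the binary abstraction and check
    subsumption against P."""
    _, _, post = forward_with_activations(weights, biases, x)
    pattern = [[binary_abstraction(v) for v in layer] for layer in post]
    return nap_subsumes(pattern, P)
```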
This subsection illustrates how NAP specifications can be utilized for robustness verification. We discuss how the abstraction states of neurons can serve as signatures of distinct classes with the introduction of the class NAP definition.
In a classification task with class set $C$, for any class $c \in C$, a class NAP $P^c$ is a NAP comprising the abstract states output by an abstraction function given $X_c$, where $X_c$ denotes the set of inputs belonging to class $c$. Formally, $P^c := \langle \mathcal{A}(N_{i,l}, X_c) \mid N_{i,l} \in N,\ \mathcal{A} \in \{\widetilde{\mathcal{A}}, \dot{\mathcal{A}}\} \rangle$. The power set of $P^c$ is denoted as $\mathcal{P}^c$.
Recall that robustness verification can be thought of as proving that no adversarial examples exist in the local neighbourhood of some reference point $x$. Intuitively, for NAP specifications, this is equivalent to showing that no adversarial examples exist in the class NAP $P^c$; in other words, inputs exhibiting the NAP $P^c$ must be predicted as class $c$. Geng et al. () argue that class NAPs must satisfy several essential requirements to qualify as NAP specifications, and finding qualified NAP specifications implies that the underlying robustness problem is verified. We formally frame these requirements as the following properties:
Since we want class NAPs to serve as certificates for a certain class, they must be distinct from each other. Otherwise, there could exist an input that exhibits two class NAPs, which leads to conflicting predictions. Formally, we aim to verify the following:
(9) $\forall x \quad \forall c_1, c_2 \in C \text{ s.t. } c_1 \neq c_2 \quad \mathcal{A}(N,x) \preccurlyeq P^{c_1} \Longrightarrow \mathcal{A}(N,x) \not\preccurlyeq P^{c_2}$
From a geometric perspective, there must be no overlap between class NAPs. In other words, it is equivalent to verifying:
$\forall c_1, c_2 \in C \text{ s.t. } c_1 \neq c_2 \quad R_{P^{c_1}} \cap R_{P^{c_2}} = \emptyset$
We empirically observe that successful models, particularly neural networks that have achieved high accuracy in classification tasks, naturally exhibit this property. Because successful models tend to avoid confusion in predictions, their class NAPs are usually very distinct from each other.
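One cheap sufficient condition for the non-overlapping property can be checked directly on the mined class NAPs: if some neuron is fixed to $\mathbf{0}$ in one class NAP and to $\mathbf{1}$ in the other, no input can exhibit both patterns, so $R_{P^{c_1}} \cap R_{P^{c_2}} = \emptyset$. The helper below sketches this check; it is only a sufficient test, not the full verification query.

```python
def naps_provably_disjoint(p_c1, p_c2):
    """Sufficient condition for disjoint regions: some neuron is fixed to 0 in
    one class NAP and to 1 in the other, so the two patterns contradict."""
    return any(
        {s1, s2} == {ZERO, ONE}
        for layer1, layer2 in zip(p_c1, p_c2)
        for s1, s2 in zip(layer1, layer2)
    )
```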
To serve as a NAP specification, a class NAP $P^c$ must ensure that if an input exhibits it, i.e., $\mathcal{A}(N,x) \preccurlyeq P^c$ with $\mathcal{A} \in \{\ddot{\mathcal{A}}, \dot{\mathcal{A}}\}$, the input is predicted as the corresponding class $c$. Formally, we have:
(10) $\forall x \in R_{P^c} \quad \forall k \in C \text{ s.t. } k \neq c \quad \mathbf{F}_c(x) - \mathbf{F}_k(x) > 0$
in which
(11) $R_{P^c} = \{x \mid \mathcal{A}(N,x) \preccurlyeq P^c\}$
In contrast to canonical $L_\infty$ norm balls, class NAPs are more flexible in terms of size and shape. Additionally, there is no need to specify a reference point, since the locations of potential reference points are also encoded by class NAPs. However, it is possible that no class NAP $P^c$ in $\mathcal{P}^c$ satisfies this property. This can be mitigated by meeting the subsequent weaker property.
Instead of relying solely on class NAPs as specifications, local neighborhoods can still be employed in conjunction with them for verification. This hybrid form of specification has several advantages: 1) it narrows down the scope of verifiable regions when no class NAP can meet the NAP robustness property; 2) the NAP constraints essentially fix ReLU states, refining the search space for verification tools; 3) it focuses verification on valid test inputs rather than adversarial examples. Formally, this property can be stated as:
(12) $\forall x' \in B(x,\epsilon) \cap R_{P^c} \quad \forall k \in C \text{ s.t. } k \neq c \quad \mathbf{F}_c(x') - \mathbf{F}_k(x') > 0$
in which
(13) $B(x,\epsilon) = \{x' \mid \|x - x'\|_\infty \leq \epsilon\}, \quad R_{P^c} = \{x' \mid \mathcal{A}(N,x') \preccurlyeq P^c\}$
To summarize, a class NAP can serve as a NAP specification if it satisfies either the NAP robustness property or the NAP-augmented robustness property. Clearly, the former property is stronger, and it is possible that no class NAP $P^c$ in $\mathcal{P}^c$ satisfies it. Fortunately, we can always find NAPs that satisfy the latter property by narrowing the verifiable region with additional $L_\infty$ norm ball specifications.
In this section, we formulate the problem of learning minimal NAP specifications and present two naive approaches for solving this problem. Since our approaches require interactions with verification tools, we begin by introducing relevant notations describing the relationships between NAPs and verification tools.
Let $(\mathcal{P}^c, \preccurlyeq)$ be a partially ordered set corresponding to a family of class NAPs for some class $c \in C$. For simplicity, we omit the superscript $c$ and refer to class NAPs simply as NAPs when the context is clear. We assume access to a verification tool $\mathcal{V}: \mathcal{P} \to \{0, 1\}$, which maps a class NAP $P \in \mathcal{P}$ to a binary value. Here, $\mathcal{V}(P) = 1$ denotes a successful verification of the underlying robustness query, while 0 indicates the presence of an adversarial example. From an alternative perspective, $\mathcal{V}(P) = 1$ also signifies that $P$ is a NAP specification, i.e., it satisfies the NAP(-augmented) robustness properties, whereas $\mathcal{V}(P) = 0$ implies the opposite.
It is not hard to see that $\mathcal{V}$ is monotone with respect to the NAP family $(\mathcal{P}, \preccurlyeq)$. Given $P \preccurlyeq P'$ and $\mathcal{V}(P') = 1$, it follows that $\mathcal{V}(P) = 1$. However, given $P \preccurlyeq P'$ and $\mathcal{V}(P) = 1$, we cannot determine $\mathcal{V}(P')$. In other words, refining a NAP (by increasing the number of neurons abstracted to $\mathbf{0}$ or $\mathbf{1}$) can only increase the likelihood of successfully verifying the underlying robustness query.
Given a family of NAPs $\mathcal{P}$ and a verification tool $\mathcal{V}$, the minimal NAP specification problem is to find a NAP $P$ such that
$\underset{P \in \mathcal{P},\, \mathcal{V}(P)=1}{\arg\min} |P|$
where $|P|$, the size of $P$, is the number of neurons abstracted to $\mathbf{0}$ or $\mathbf{1}$, i.e., $|\{N_{i,l} \mid P_{i,l} = \mathbf{0} \text{ or } \mathbf{1}\}|$. The size of the (largest) minimal NAP specification is denoted by $s$.
When $P$ is minimal, it implies that for any NAP $P'$ that is strictly coarser than $P$, $\mathcal{V}(P') = 0$. Formally: $\forall P \preccurlyeq P', P' \neq P : \mathcal{V}(P') = 0$. Thus, there could exist multiple minimal NAP specifications; in such cases, we only need to choose one of them. On the other hand, it is possible that even the most refined NAP cannot verify the robustness query; in such cases, we claim that no minimal NAP specification exists. Additionally, since computation using verification tools is usually expensive, we are interested in methods that find a minimal NAP specification efficiently, i.e., minimizing the number of calls to $\mathcal{V}$.
We present two naive approaches to solving the problem. Since these approaches involve refining and coarsening the NAP, we will first formally define these two actions.
Recall from our definition of NAPs that the coarsest NAP is the one obtained by using $\dot{\mathcal{A}}$ to abstract every neuron in $N$. We denote this NAP as $\dot{P} := \langle \dot{\mathcal{A}}(N_{i,l}) \mid N_{i,l} \in N \rangle$; $\dot{P}$ is the smallest NAP, with $|\dot{P}| = 0$. In addition, we define the most refined NAP as the one that applies $\widetilde{\mathcal{A}}$ to every neuron in $N$, denoted $\widetilde{P} := \langle \widetilde{\mathcal{A}}(N_{i,l}) \mid N_{i,l} \in N \rangle$; $\widetilde{P}$ is the largest NAP, with $|\widetilde{P}| \leq |N|$. Clearly, $\widetilde{P} \preccurlyeq \dot{P}$. Given any NAP $P$, if we want to refine $P$ at a specific neuron $N_{i,l}$, we apply the $\widetilde{\mathcal{A}}$ function to $N_{i,l}$; we denote this refinement action as $\widetilde{\Delta}(N_{i,l})$, which either increases $|P|$ or leaves it unchanged. Similarly, we denote the coarsening action as $\dot{\Delta}(N_{i,l})$, which either decreases $|P|$ or leaves it unchanged. We present the semantics of these actions as follows:
\[
\frac{}{P \vdash \cdot \Downarrow P}
\qquad
\frac{P \vdash \Delta_1 \Downarrow P' \qquad P' \vdash \Delta_2 \Downarrow P''}{P \vdash \Delta_1;\Delta_2 \Downarrow P''}
\]
\[
\textsc{Refine}\ \frac{P_{i,l} = \mathbf{*}}{P \vdash \widetilde{\Delta}(N_{i,l}) \Downarrow \widetilde{\mathcal{A}}(N_{i,l}) \cup P \setminus P_{i,l}}
\qquad
\textsc{Coarsen}\ \frac{P_{i,l} \in \{\mathbf{0},\mathbf{1}\}}{P \vdash \dot{\Delta}(N_{i,l}) \Downarrow \dot{\mathcal{A}}(N_{i,l}) \cup P \setminus P_{i,l}}
\]
where $P, P', P''$ are from the NAP family $\mathcal{P}$, $N$ is the underlying neural network, and $\Delta_1$ and $\Delta_2$ are any two sequences of refine and coarsen actions.
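To make the state space and the two actions concrete, the following Python sketch represents a NAP as a vector of per-neuron states over a flattened indexing of the network's neurons. The names (`State`, `size`, `refine`, `coarsen`) are illustrative and not from the paper; the refined state of a neuron is assumed to be supplied by the abstraction function $\widetilde{\mathcal{A}}$ computed elsewhere.

```python
from enum import Enum
from typing import List

class State(Enum):
    ZERO = 0   # neuron abstracted as deactivated (state 0)
    ONE = 1    # neuron abstracted as activated (state 1)
    STAR = 2   # neuron left unconstrained (*)

NAP = List[State]   # a NAP over a flattened indexing of the network's neurons

def size(P: NAP) -> int:
    """|P|: the number of neurons kept in a binary state."""
    return sum(s is not State.STAR for s in P)

def refine(P: NAP, i: int, refined_state: State) -> NAP:
    """Refinement action: pin neuron i to the state prescribed by the abstraction."""
    Q = list(P)
    Q[i] = refined_state
    return Q

def coarsen(P: NAP, i: int) -> NAP:
    """Coarsening action: relax neuron i back to the unconstrained state *."""
    Q = list(P)
    Q[i] = State.STAR
    return Q
```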
Conceptually, the Refine approach iteratively increases the number of refined neurons in the NAP $P$ until $\mathcal{V}(P) = 1$, i.e., until $P$ is able to prove the underlying robustness query. In other words, we gradually increase the size parameter $k$ and iterate over each NAP $P$ of size $k$ to check whether $\mathcal{V}(P) = 1$, as illustrated in Algorithm 1. To determine whether a solution exists at all, we first check whether the most refined NAP succeeds in verification and proceed to iterative refinement only if $\mathcal{V}(\widetilde{P}) = 1$. However, the algorithm is not efficient: it requires $2^{|N|} - 1$ calls to $\mathcal{V}$ in the worst case, as proven in Theorem 3.2 (see Appendix A). Therefore, Refine is only practical when the search space of the NAP family $\mathcal{P}$ is small.
Theorem 3.2. The algorithm Refine returns a minimal NAP specification with $\mathcal{O}(2^{|N|})$ calls to $\mathcal{V}$.
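As a concrete illustration of the Refine procedure, here is a minimal Python sketch. It assumes a callback `verify(P)` wrapping the verification tool $\mathcal{V}$ (returning True iff $\mathcal{V}(P) = 1$), a `STAR` sentinel for the unconstrained state, and the most refined NAP `P_tilde` computed beforehand; these names do not come from the paper, and the exhaustive enumeration order is just one possible choice.

```python
from itertools import combinations
from typing import Callable, List, Optional

def refine_search(P_tilde: List, STAR,
                  verify: Callable[[List], bool]) -> Optional[List]:
    """Enumerate NAPs of increasing size k until one verifies (Algorithm 1 sketch)."""
    if not verify(P_tilde):
        return None                      # even the most refined NAP fails: no solution
    n = len(P_tilde)
    for k in range(n + 1):
        # Try every NAP that keeps exactly k neurons in their refined state.
        for kept in combinations(range(n), k):
            P = [STAR] * n
            for i in kept:
                P[i] = P_tilde[i]
            if verify(P):
                return P                 # the first hit has minimum size k
    return P_tilde                       # unreachable given the initial check
```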
In contrast to the Refine approach, the Coarsen approach starts from the most refined NAP and gradually coarsens it. We first check that the problem is well-defined by verifying that the most refined NAP $\widetilde{P}$ succeeds in verification. Then, for each neuron, we attempt to coarsen it using $\dot{\mathcal{A}}$; if the resulting NAP no longer verifies the query, we refine it back using $\widetilde{\mathcal{A}}$, otherwise we keep the coarsened NAP. We describe this procedure in Algorithm 2. The algorithm requires $|N|$ calls to $\mathcal{V}$ in the worst case, as proven in Theorem 3.3 (see Appendix A).
Theorem 3.3. The algorithm Coarsen returns a minimal NAP specification with $\mathcal{O}(|N|)$ calls to $\mathcal{V}$.
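A corresponding sketch of the Coarsen procedure, under the same assumed `verify` callback and `STAR` sentinel as above; it makes at most $|N| + 1$ calls to the verifier.

```python
from typing import Callable, List, Optional

def coarsen_search(P_tilde: List, STAR,
                   verify: Callable[[List], bool]) -> Optional[List]:
    """Relax one neuron at a time, keeping the relaxation only if the NAP still
    verifies (Algorithm 2 sketch)."""
    if not verify(P_tilde):
        return None                      # the problem has no solution
    P = list(P_tilde)
    for i in range(len(P)):
        if P[i] is STAR:
            continue
        saved, P[i] = P[i], STAR         # tentatively coarsen neuron i
        if not verify(P):
            P[i] = saved                 # coarsening breaks verification: refine back
    return P
```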
While the Refine and Coarsen algorithms find minimal NAP specifications with correctness guarantees, their inefficiency poses challenges for the verification of large neural networks. To address this issue, we introduce two efficient approaches for estimating minimal NAP specifications without requiring expensive calls to the verification tool. Unlike the verification-based approaches, these estimation methods are deeply linked to mandatory neurons, a key concept in dissecting the minimal NAP specification problem, as discussed below.
A neuron $N_{i,l} \in N$ is considered mandatory if it cannot be coarsened to $\mathbf{*}$ in any minimal NAP specification. We denote the set of all mandatory neurons by $M$, defined by:
$M = \{N_{i,l} \mid P_{i,l} \in \{\mathbf{0},\mathbf{1}\},\ P \text{ is minimal}\}$. Note that $M$ is the union of the sets of mandatory neurons over all minimal NAP specifications. It follows that $|M| \geq s$, where $s$ denotes the size of the largest minimal NAP specification.
The minimal NAP specification problem could be solved trivially if we had access to $M$. Thus, our verification-free approaches are designed to determine mandatory neurons and estimate $M$. To better understand these approaches, we first discuss the properties of mandatory neurons. Recall that verifying a robustness query given a NAP specification $P$ is equivalent to showing that $\mathbf{F}(x) \geq 0$ for every input $x$ in the region $R_P$. Thus, the necessary conditions for a mandatory neuron $N_{i,l}$ can be written as follows:
If $N_{i,l}$ is in state $\mathbf{0}$, then whenever $\hat{z}_i^{(l)}(x) = 0$, $\mathbf{F}(x) \geq 0$. In addition, $\exists x$ such that $\hat{z}_i^{(l)}(x) > 0$ and $\mathbf{F}(x) < 0$.
If $N_{i,l}$ is in state $\mathbf{1}$, then $\forall x$ with $\hat{z}_i^{(l)}(x) > 0$, $\mathbf{F}(x) \geq 0$. In addition, $\exists x$ such that $\hat{z}_i^{(l)}(x) = 0$ and $\mathbf{F}(x) < 0$.
As for non-mandatory neurons in $P$: since they can be coarsened to $\mathbf{*}$, $\mathbf{F}(x) \geq 0$ holds regardless of the value of $\hat{z}_i^{(l)}(x)$. Formally, if $N_{i,l}$ is in state $\mathbf{0}$ or $\mathbf{1}$ but is not mandatory, then $\forall x,\ \mathbf{F}(x) \geq 0$. In our current approaches, including the Refine and Coarsen algorithms, we rely on interaction with the verification tool $\mathcal{V}$ to identify mandatory neurons. Since calls to $\mathcal{V}$ are typically computationally expensive, it would be advantageous to estimate $M$ in a more cost-effective manner. This motivates the following two verification-free approaches.
We first introduce Adversarial_Prune to identify mandatory neurons. Intuitively, it attempts to show that a neuron $N_{i,l}$ is mandatory by exhibiting a suitable adversarial example $x'$.
When an adversarial example $x'$ is found, it immediately indicates that the NAP $\ddot{\mathcal{A}}(N, x')$ fails verification, i.e., $\mathcal{V}(\ddot{\mathcal{A}}(N, x')) = 0$. Moreover, any NAP $P'$ with $\ddot{\mathcal{A}}(N, x') \preccurlyeq P'$, i.e., any coarsening of it, also fails verification. For instance, suppose an adversarial example $x'$ is found for a simple one-layer, four-neuron network and $\ddot{\mathcal{A}}(N, x') = \langle\mathbf{1},\mathbf{0},\mathbf{1},\mathbf{0}\rangle$. We can infer that NAPs such as $\langle\mathbf{1},\mathbf{0},\mathbf{1},\mathbf{*}\rangle$, $\langle\mathbf{1},\mathbf{0},\mathbf{*},\mathbf{*}\rangle$, $\langle\mathbf{1},\mathbf{*},\mathbf{*},\mathbf{0}\rangle$, and $\langle\mathbf{1},\mathbf{0},\mathbf{*},\mathbf{0}\rangle$ fail verification. This information is particularly useful for determining whether a neuron is mandatory. For example, if we know that the NAP $P := \langle\mathbf{1},\mathbf{0},\mathbf{*},\mathbf{1}\rangle$ is a specification, i.e., $\mathcal{V}(P) = 1$, then we can deduce that the fourth neuron $N_{4,1}$ is mandatory: coarsening it would expand $P$ to $\langle\mathbf{1},\mathbf{0},\mathbf{*},\mathbf{*}\rangle$, which includes the adversarial example $x'$ and thus fails verification, as illustrated in Figure 5(a). In general, when $P$ and $\ddot{\mathcal{A}}(N, x')$ disagree on a single neuron, that neuron must be mandatory.
However, when the two NAPs disagree on multiple neurons, the situation is slightly different. Suppose the NAP specification $P$ is $\langle\mathbf{1},\mathbf{1},\mathbf{*},\mathbf{1}\rangle$, i.e., $\mathcal{V}(\langle\mathbf{1},\mathbf{1},\mathbf{*},\mathbf{1}\rangle) = 1$, and we know $\mathcal{V}(\langle\mathbf{1},\mathbf{0},\mathbf{*},\mathbf{0}\rangle) = 0$ from the adversarial example $x'$. If we coarsen the second and fourth neurons, $N_{2,1}$ and $N_{4,1}$, the NAP expands to $\langle\mathbf{1},\mathbf{*},\mathbf{*},\mathbf{*}\rangle$, which covers $\langle\mathbf{1},\mathbf{0},\mathbf{*},\mathbf{0}\rangle$ and thus fails verification. In this case, $N_{2,1}$ and $N_{4,1}$ could both be mandatory, or only one of them could be, as illustrated in Figures 5(b), 5(c), and 5(d). We therefore let $\{N_{2,1}, N_{4,1}\}$ be an upper bound on the mandatory neurons (learned from $x'$). Formally, given a NAP $P$, we say a neuron $N_{i,l}$ is in the upper bound of mandatory neurons $M$ if it satisfies the following conditions:
$N_{i,l}$ is in a binary state, i.e., $P_{i,l} \in \{\mathbf{0}, \mathbf{1}\}$;
there exists an $x'$ such that $\ddot{\mathcal{A}}(N_{i,l}, x')$ XORs with $P_{i,l}$, i.e., $\exists x'$ such that $\ddot{\mathcal{A}}(N_{i,l}, x') \oplus P_{i,l} = 1$.
The field of adversarial attacks has been extensively studied, offering a wealth of methods that we can leverage. These approaches are usually computationally efficient, making it easy to access a large collection of adversarial examples. Therefore, we can compute the upper bound of mandatory neurons efficiently by simply taking the union of upper bounds learned for each adversarial example, as illustrated in Algorithm 3.
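A possible rendering of this union-of-upper-bounds computation is sketched below. It assumes a helper `activation_pattern(x)` that returns the activation pattern $\ddot{\mathcal{A}}(N, x)$ of an input as a state vector aligned with $P$; the helper and function names are illustrative, not taken from Algorithm 3.

```python
from typing import Callable, Iterable, List, Set

def adversarial_prune(P: List, adv_examples: Iterable,
                      activation_pattern: Callable[[object], List],
                      STAR) -> Set[int]:
    """Estimate an upper bound on the mandatory neurons of a NAP specification P:
    for each adversarial example, the constrained neurons on which P disagrees
    with the example's activation pattern form a per-example upper bound, and
    the overall estimate is the union of these bounds."""
    estimate: Set[int] = set()
    for x_adv in adv_examples:
        pattern = activation_pattern(x_adv)       # state vector aligned with P
        disagreement = {i for i, (p, a) in enumerate(zip(P, pattern))
                        if p is not STAR and p != a}
        estimate |= disagreement
    return estimate
```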
We introduce another approach, called Gradient_Search, to identify mandatory neurons. Similar to Adversarial_Prune, Gradient_Search avoids costly interactions with the verification tool. Instead, it leverages gradient estimates to analyze the local structure of $\mathbf{F}$.
Recall our definition of a neural network in Section 2.1: the output $\mathbf{F}(x)$ can be seen as a function of the post-activation value of any internal neuron $\hat{z}_i^{(l)}$. We denote this function by $\mathbf{F}(\hat{z}(x))$, omitting $i$ and $l$ for simplicity. The function $\mathbf{F}(\hat{z})$ is non-linear and operates on the input range $\hat{z} \in [0, +\infty)$. The gradient $\frac{\partial \mathbf{F}}{\partial \hat{z}}$ provides valuable insight into the local structure of $\mathbf{F}$, which may help check the necessary condition for a mandatory neuron, namely whether there exists a $\hat{z}$ such that $\mathbf{F}(\hat{z}) < 0$ for the corresponding neuron. More specifically, given some sampled data $\hat{z}_1, \ldots, \hat{z}_n$, we compute their function values $\mathbf{F}(\hat{z}_1), \ldots, \mathbf{F}(\hat{z}_n)$ along with their gradients $\frac{\partial \mathbf{F}}{\partial \hat{z}}\big|_{\hat{z}_1}, \ldots, \frac{\partial \mathbf{F}}{\partial \hat{z}}\big|_{\hat{z}_n}$. If there exists a $\hat{z}_j$ such that $|\mathbf{F}(\hat{z}_j)|$ is sufficiently small yet the norm of its gradient $\big|\frac{\partial \mathbf{F}}{\partial \hat{z}}\big|_{\hat{z}_j}\big|$ is significantly large, then $\mathbf{F}$ is very likely to drop below zero nearby, as illustrated in Figure 6.
If, in addition, we know that the corresponding neuron $N_{i,l}$ is in state $\mathbf{0}$, it is highly likely that $N_{i,l}$ is mandatory. Conversely, if a neuron $N_{i,l}$ is in state $\mathbf{1}$ and we sample an $x$ such that $\hat{z}_i^{(l)} = 0$ and $\mathbf{F}(x) < 0$, we immediately know that the neuron is mandatory. Algorithm 4 presents pseudocode for estimating whether a neuron is mandatory based on the discussion above.
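The following sketch captures this heuristic test for a single neuron. The callbacks for the post-activation value, the output margin $\mathbf{F}$, and its gradient, as well as the thresholds `eps` and `grad_threshold`, are assumptions for illustration; Algorithm 4 in the paper may differ in its exact criteria.

```python
from typing import Callable, Iterable

def likely_mandatory(state: int,                 # the neuron's state in P: 0 or 1
                     samples: Iterable,
                     post_activation: Callable[[object], float],  # z_hat_i^(l)(x)
                     F_out: Callable[[object], float],            # output margin F(x)
                     grad_F_wrt_z: Callable[[object], float],     # dF/dz_hat at x
                     eps: float = 1e-2,
                     grad_threshold: float = 10.0) -> bool:
    """Heuristically flag a neuron as mandatory from sampled inputs, without
    calling the verifier.

    State 1: a sample with z_hat ~ 0 and F(x) < 0 directly witnesses that the
             neuron cannot be coarsened.
    State 0: a sample where |F| is small but |dF/dz_hat| is large suggests F can
             be pushed below zero by activating the neuron.
    """
    for x in samples:
        z, fx = post_activation(x), F_out(x)
        if state == 1 and z <= 1e-9 and fx < 0.0:
            return True
        if state == 0 and abs(fx) < eps and abs(grad_F_wrt_z(x)) > grad_threshold:
            return True
    return False
```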
The two estimation methods, Gradient_Search and Adversarial_Prune, can be used together: taking the intersection of their estimates may yield a better overall estimate. In addition, the estimated mandatory neurons can serve as an initial starting point for the statistical versions of the Refine and Coarsen approaches introduced in Section 5.
Our experimental results for these two estimation approaches indicate a notable trend: neurons from deeper layers are more likely to be mandatory. This aligns with the commonly held belief that deeper neurons are generally linked to high-level feature representations and thus play a more important role than shallow neurons in final classification decisions. It also gives us a valuable heuristic: prioritize deeper neurons over shallower ones when iterating through neurons in our implementations of the Refine and Coarsen algorithms.
In this section, we introduce two new algorithms that address the minimal NAP specification problem. While they share the same overarching ideas as Refine and Coarsen, these methods employ sampling and statistical learning principles to learn a minimal NAP specification efficiently. Recall that the set of mandatory neurons $M$ is the union of the neurons that appear in any minimal specification, and that its size is an upper bound on the size of the largest minimal specification.
Mandatory neurons are crucial for forming NAP specifications, as their binary states play a critical role in determining the neural network's robustness. We leverage this property to find mandatory neurons statistically. More specifically, suppose we sample some NAPs $P_1, P_2, \ldots, P_n$ from the NAP family $\mathcal{P}$. Mandatory neurons should appear more frequently in the NAPs that qualify as specifications (i.e., $\mathcal{V}(P) = 1$) than in those that fail verification (i.e., $\mathcal{V}(P) = 0$).
Based on this insight, we propose an approach called Sample_Refine that relies on non-repetitive sampling to identify mandatory neurons and thereby solve the minimal NAP problem. We start with the coarsest NAP and iteratively collect the most probable mandatory neurons. In every iteration, we sample $k$ NAPs by refining each unvisited neuron with some probability $\theta$. The sampled NAPs are fed to the verification tool, and the neuron that appears most frequently in the verifiable NAPs is taken to be the most probable mandatory neuron of this iteration. That neuron is marked as visited, and the process stops either when we have collected $s$ neurons (assuming $s$ is known) or when the collected mandatory neurons already form a NAP specification $P$, i.e., $\mathcal{V}(P) = 1$. Finally, we return the learned NAP, obtained by applying $\widetilde{\mathcal{A}}$ to the collected neurons. Algorithm 5 provides an overview of this procedure.
It is worth noting that Sample_Refine does not guarantee correctness, as we may end up collecting only the $s$ most probable mandatory neurons, which may not suffice to form a specification. Another concern is sampling efficiency, specifically the potential for the number of samples required to grow exponentially with the size $s$ of the minimal NAP specification. To see why this is problematic, consider a scenario where the only minimal NAP specification $P$ consists of all mandatory neurons; in this case, all $|M|$ neurons must be selected for $P$ to be learned. If $\theta$ is set to a constant, then the expected number of samples needed to obtain the NAP specification is $(\frac{1}{\theta})^{|M|}$. To address this, we set $\theta = \left(\frac{|M|}{|M|+1}\right)^{|M|}$. This choice makes the sampling efficiency polynomial in both $|M|$ and $s$, as proven in Theorem 5.1 (see Appendix B). In addition, Theorem 5.1 shows that, with high probability, a mandatory neuron is found with $\mathcal{O}(\log|N|)$ calls to $\mathcal{V}$.
Theorem 5.1. With probability $\theta = \left(\frac{|M|}{|M|+1}\right)^{|M|}$, Sample_Refine has $1-\delta$ probability of outputting a minimal NAP specification with $\Theta\left(|M|^2(\log|N| + \log(s/\delta))\right)$ examples in each iteration and $\mathcal{O}\left(s|M|^2(\log|N| + \log(s/\delta))\right)$ total calls to $\mathcal{V}$.
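A simplified sketch of this sampling loop is given below, assuming the same `verify` callback and `STAR` sentinel as before and treating the target size `s`, the number of samples `k` per iteration, and the refinement probability `theta` as given inputs; the tie-breaking and resampling details are choices made here for brevity, not taken from Algorithm 5.

```python
import random
from collections import Counter
from typing import Callable, List

def sample_refine(P_tilde: List, STAR, s: int, k: int, theta: float,
                  verify: Callable[[List], bool]) -> List:
    """Collect the most probable mandatory neurons by sampling."""
    n = len(P_tilde)
    collected: List[int] = []                        # neurons marked as mandatory so far
    candidates = [i for i in range(n) if P_tilde[i] is not STAR]

    def build(neurons):
        """NAP that refines exactly the given neurons and leaves the rest at *."""
        P = [STAR] * n
        for i in neurons:
            P[i] = P_tilde[i]
        return P

    while candidates and len(collected) < s and not verify(build(collected)):
        votes = Counter()
        for _ in range(k):
            # Refine each unvisited neuron independently with probability theta.
            chosen = [i for i in candidates if random.random() < theta]
            if verify(build(collected + chosen)):
                votes.update(chosen)
        if not votes:
            continue                                  # no sampled NAP verified: resample
        best, _ = votes.most_common(1)[0]             # most frequent neuron in verifiable NAPs
        collected.append(best)
        candidates.remove(best)
    return build(collected)
```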
The Coarsen algorithm begins with the most refined NAP and progressively coarsens each neuron until the verification process fails. Enhancing the algorithm’s performance is possible by coarsening multiple neurons during each iteration. However, a fundamental question emerges: How do we determine which set of neurons to coarsen in each round?
We present Sample_Coarsen to answer this question. In this approach, we assume that each neuron is independent of the others and select neurons to coarsen in a statistical manner. Specifically, in each iteration, we randomly coarsen a subset of the refined neurons in the current NAP simultaneously and check whether the new NAP still passes verification. We repeat this process until the NAP size reaches $s$. Algorithm 6 provides the pseudocode for Sample_Coarsen.
Similar to Sample_Refine, Sample_Coarsen faces a challenge related to sample efficiency. To illustrate this, suppose a minimal NAP specification $P$ of size $s$ can be found after one iteration. Then, the probability of selecting exactly the $s$ mandatory neurons in $P$ is $\theta^s$. Consequently, if $\theta$ is set to a constant, the expected number of samples needed to find the NAP is $(\frac{1}{\theta})^s$. Unlike Sample_Refine, the procedure maintains a NAP that always passes verification; once such a NAP is learned, we can narrow down the estimated mandatory neurons by an expected factor of $\theta$. In this way, the expected number of samples and the expected number of iterations are inversely related, yet their product is the total number of calls to $\mathcal{V}$. We show that setting $\theta = e^{-\frac{1}{s}}$ not only makes the expected number of samples polynomial in $s$ but also minimizes the total number of calls to $\mathcal{V}$, as proven in Theorem 5.2 (see Appendix B).
Theorem 5.2. With probability $\theta = e^{-\frac{1}{s}}$, Sample_Coarsen learns a minimal NAP specification with $\mathcal{O}(s\log|N|)$ calls to $\mathcal{V}$.
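A corresponding sketch of Sample_Coarsen, again under an assumed `verify` callback: each refined neuron is kept with probability `theta` (set to $e^{-1/s}$ per Theorem 5.2), and a proposal is accepted only if it still verifies. The `max_rounds` cutoff is an added safeguard for the sketch, not part of the paper's algorithm.

```python
import random
from typing import Callable, List, Optional

def sample_coarsen(P_tilde: List, STAR, s: int, theta: float,
                   verify: Callable[[List], bool],
                   max_rounds: int = 1000) -> Optional[List]:
    """Coarsen random subsets of refined neurons at once."""
    if not verify(P_tilde):
        return None                           # no NAP specification exists

    def nap_size(Q):
        return sum(q is not STAR for q in Q)

    P = list(P_tilde)
    for _ in range(max_rounds):
        if nap_size(P) <= s:
            break
        # Keep each refined neuron with probability theta, coarsen the rest.
        proposal = [q if (q is STAR or random.random() < theta) else STAR
                    for q in P]
        if verify(proposal):
            P = proposal                      # accept only coarsenings that still verify
    return P
```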
Setting $s$ poses a challenge in practice, as Sample_Refine and Sample_Coarsen assume that $s$ is given. However, this can be addressed by dynamically updating $\theta$ based on the result of $\mathcal{V}(P)$ (Liang et al., ). With $\theta$ from Theorem 5.2, Sample_Coarsen finds a NAP specification with probability $\left(e^{-1/s}\right)^s = e^{-1}$. Recall that Theorem 5.1 states that Sample_Refine finds a NAP specification with probability $\left(\frac{|M|}{|M|+1}\right)^{|M|}$, which approaches $e^{-1}$ asymptotically. Thus, we aim to set $\theta$ such that $\Pr(\mathcal{V}(P) = 1) = e^{-1}$. Intuitively, if a sampled NAP $P$ is a specification, we decrease $\theta$ so that fewer neurons are refined (or more neurons are coarsened); if $P$ is not a specification, $\theta$ is increased.
Given that $\theta \in [0,1]$, we can parameterize it using the sigmoid function $\sigma(\lambda) = \left(1 + e^{-\lambda}\right)^{-1}$, where $\lambda \in (-\infty, \infty)$. Since $\Pr(\mathcal{V}(P) = 1)$ depends on $\theta$ as well, we express it as a function of $\lambda$, $g(\lambda) = \Pr(\mathcal{V}(P) = 1)$. Then, setting $\Pr(\mathcal{V}(P) = 1) = e^{-1}$ can be achieved through the following minimization problem:
(14) $L(\lambda) = \frac{1}{2}\left(g(\lambda) - e^{-1/s}\right)^2$. The loss function $L(\lambda)$ can be minimized by stochastic gradient descent: with a step size $\eta$, update $\lambda$ using $\lambda \leftarrow \lambda - \eta\frac{dL}{d\lambda}$. Note that $\frac{dL}{d\lambda}$ can be expressed as:
(15) $\frac{dL}{d\lambda} = \left(g(\lambda) - e^{-1/s}\right)\frac{dg(\lambda)}{d\lambda}$. Given $g(\lambda) = \Pr(\mathcal{V}(P) = 1)$, we can replace $g(\lambda)$ with $\mathcal{V}(P)$ for the stochastic gradient update. Additionally, since $\frac{dg(\lambda)}{d\lambda} > 0$, we drop this factor, as its multiplicative effect can be absorbed into $\eta$. Therefore, the final update rule is:
(16) $\lambda \leftarrow \lambda - \eta\left(\mathcal{V}(P) - e^{-1}\right)$

Conceptually, NAP specifications typically correspond to significantly larger input regions than local-neighborhood specifications; this is the primary motivation for using NAPs as specifications. However, previous work lacks sufficient justification or evidence for this claim. In this section, we propose a simple method for approximating the volume of $R_P$, the input region corresponding to a NAP $P$. This allows us to: 1) quantify the size difference between $R_P$ and $L_\infty$-ball specifications; and 2) gain insight into the volumetric change from the most refined NAP specification to the minimal NAP specification.
Computing the exact volume of $R_P$ is at least NP-hard, as determining the exact volume of a polytope is already NP-hard (Dyer and Frieze, ). Moreover, computing the exact volume of $R_P$ can be even more challenging due to its potential non-convexity. We therefore estimate the volume of $R_P$ by efficiently computing an orthotope that closely aligns with $R_P$, as illustrated in Figure 7. We briefly describe the procedure as follows:
The first step is to find an anchor point to serve as the center of the orthotope. Ideally, this anchor point should lie close to the center of $R_P$ to ensure a significant overlap between the orthotope and $R_P$. However, computing the actual center of $R_P$ is costly. Thus, we look for a pseudo-center among the training points in $X$ that reside in $R_P$: the point that requires the smallest $L_\infty$ ball to cover the other data points, obtained by solving the following optimization problem:
$c_{\text{pseudo}} = \arg\min_{x \in R_P} \max_{x' \in R_P} \|x - x'\|_\infty$, where $R_P = \{x \mid \mathcal{A}(N, x) \preccurlyeq P,\ x \in X\}$. When $|X|$ is small, $c_{\text{pseudo}}$ can be computed directly; for larger $|X|$, a statistical computation strategy is required.
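For modest $|X|$, the pseudo-center can be computed directly from the training points that fall in $R_P$, for example as in the following NumPy sketch; the quadratic pairwise computation is the "direct" strategy mentioned above, and a sampled variant would be needed for large $|X|$.

```python
import numpy as np

def pseudo_center(X_in_RP: np.ndarray) -> np.ndarray:
    """Among the training points lying in R_P (shape (n, d)), return the one
    that needs the smallest L_inf ball to cover all the others."""
    # Pairwise L_inf distances between candidate points: shape (n, n).
    linf = np.abs(X_in_RP[:, None, :] - X_in_RP[None, :, :]).max(axis=2)
    # For each point, the radius needed to cover every other point.
    radii = linf.max(axis=1)
    return X_in_RP[np.argmin(radii)]
```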
Once the pseudo-center $c_{\text{pseudo}}$ is determined, we construct an orthotope around it that closely aligns with $R_P$. The orthotope is determined by pairs of upper and lower bounds $U^{(i)}$ and $L^{(i)}$ for each dimension $i$. Specifically, $U^{(i)}$ and $L^{(i)}$ are computed by expanding in the two opposite directions from $c_{\text{pseudo}}$ along dimension $i$ until the expansion extends beyond $R_P$. This expansion can be expressed as:
$\max_{U^{(i)}}\{x' \in R_P \mid x' := c_{\text{pseudo}} + U^{(i)}\}$ ; $\max_{L^{(i)}}\{x' \in R_P \mid x' := c_{\text{pseudo}} - L^{(i)}\}$, where $U^{(i)}$ and $L^{(i)}$ represent the upper and lower bounds in dimension $i$, respectively, originating from $c_{\text{pseudo}}$. These bounds can be calculated efficiently with binary search.
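A binary-search sketch for one such bound is shown below. It assumes a membership predicate `in_RP(x)` (e.g., checking that the activation pattern of $x$ is subsumed by $P$), an initial search limit `hi`, and approximate monotonicity of membership along the ray; all three are simplifying assumptions.

```python
import numpy as np
from typing import Callable

def expand_bound(c: np.ndarray, dim: int, direction: float,
                 in_RP: Callable[[np.ndarray], bool],
                 hi: float = 1.0, iters: int = 30) -> float:
    """Binary-search the largest step from the pseudo-center c along +/- e_dim
    that still lands inside R_P."""
    lo, best = 0.0, 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = c.copy()
        x[dim] += direction * mid       # step of size mid along dimension dim
        if in_RP(x):
            lo, best = mid, mid         # still inside R_P: push further out
        else:
            hi = mid                    # outside: shrink the step
    return best

# Per-dimension bounds: U_i = expand_bound(c, i, +1.0, in_RP),
#                       L_i = expand_bound(c, i, -1.0, in_RP).
```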
The choice of the anchor point is crucial in our approach. If it is located at a corner of $R_P$, the volume estimate will be highly biased, which is problematic when we seek to understand the volumetric change from the most refined NAP specification to the minimal NAP specification. Additionally, using an orthotope as the estimator makes it convenient to understand the volumetric change simply by examining the differences in each input dimension.
In this section, we conduct a comprehensive evaluation of our algorithms for learning minimal NAP specifications across a range of benchmarks, spanning from a simple binary classifier to a state-of-the-art image classification model. To align with the underlying verification engine described in the experimental setup, we evaluate NAPs on networks with ReLU activations. To illustrate the effectiveness of our approaches, we choose the method proposed in (Geng et al., ) as the baseline, denoted by the $\widetilde{\mathcal{A}}$ function. Our results suggest that minimal NAP specifications typically involve only a fraction of the neurons required by the most refined NAPs (computed with the $\widetilde{\mathcal{A}}$ function), yet they extend the verifiable bounds by several orders of magnitude.
All experiments in this section were conducted on an Ubuntu 20.04 LTS machine with 172 GB of RAM and an Intel(R) Xeon(R) Silver Processor. For verification, we utilized Marabou (Katz et al., ), a dedicated state-of-the-art neural network verifier. We configured a timeout of 10 minutes for each call to the verification tool. If the timeout is exceeded, the current neuron is retained in the minimal NAP specification even if its status cannot be determined.
We conduct our first experiment using a four-layer neural network as a binary classifier, where each layer consists of 32 neurons. The classifier is trained on the Wisconsin Breast Cancer (WBC) dataset (Wolberg et al., ), representing a decision-critical task where robustness is essential; it achieves a test accuracy of 95.61%. We compute the most refined (baseline) NAP specifications $\widetilde{P}^0$ and $\widetilde{P}^1$ for labels 0 and 1 using the statistical abstraction function $\widetilde{\mathcal{A}}$ with a confidence ratio of $\delta = 0.95$. The sizes of $\widetilde{P}^0$ and $\widetilde{P}^1$ are 102 and 93, respectively. In contrast, the sizes of the minimal NAP specifications learned by the Coarsen algorithm for labels 0 and 1 are significantly smaller: 31 and 32, respectively. It is worth mentioning that our estimation approaches provide a fairly accurate estimate of the mandatory neurons, despite computing rather loose upper bounds. More specifically, Adversarial_Prune and Gradient_Search compute 61 and 43 mandatory neurons for label 0, respectively; together, they cover 25 of the 31 mandatory neurons appearing in the minimal NAP specification for label 0. For label 1, Adversarial_Prune and Gradient_Search compute 54 and 39 mandatory neurons, respectively, covering 25 of the 32 mandatory neurons appearing in the minimal NAP specification for label 1.
Regarding the statistical approaches, Sample_Refine computes NAP specifications of sizes 53 and 56 for labels 0 and 1, making 187 and 264 calls to $\mathcal{V}$, respectively. Although these numbers are large, this is expected, since the algorithm must sample multiple NAPs in each iteration. In contrast, Sample_Coarsen is significantly more efficient: it learns NAP specifications of sizes 42 and 45 for labels 0 and 1 using only 47 and 41 calls, respectively.
Recall that one of the main motivations for learning minimal NAP specifications is their potential to verify larger input regions than refined NAP specifications. To support this, we compute the percentage of unseen test data that can be verified using these NAPs; test data, sampled from the input space, serve as a proxy for the verifiable bounds of different NAPs. We find that the most refined NAP specifications $\widetilde{P}^0$ and $\widetilde{P}^1$ cover 81.40% and 80.28% of the test data for labels 0 and 1, respectively, whereas the minimal NAP specifications cover 95.35% and 94.37%.
To intuitively understand the change in verifiable regions RPsubscript????????R_{P}italic_R start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT from refined to minimal NAP specifications, we compare their estimated volumes. The increase in estimated volume is substantial: .35 times larger for label 0 and .53 times larger for label 1. Figure 8 illustrates the comparison of verifiable input ranges between the refined and minimal NAP specifications for labels 0 and 1, referencing the anchor point.
To show that our insights and approaches apply to more complex datasets and networks, we conduct a second set of experiments using the mnistfc_256x4 model (VNNCOMP, ), a 4-layer fully connected network with 256 neurons per layer trained on the MNIST dataset. We focus on classes 0, 1, and 4, as evaluations on the other labels either time out or fail verification. We compute the most refined NAP specifications $\widetilde{P}^0$, $\widetilde{P}^1$, and $\widetilde{P}^4$ for these labels with a confidence ratio of $\delta = 0.99$. Their sizes are 751, 745, and 712, respectively, consistent with the baseline results from previous work.
In contrast, the minimal NAP specifications learned by the Coarsen algorithm for labels 0, 1, and 4 are significantly reduced to 480, 491, and 506, respectively. Notably, our estimation approaches accurately identify mandatory neurons in these minimal NAPs. For example, for label 0, Adversarial_Prune and Gradient_Search find 618 and 195 mandatory neurons, respectively, and discover 445 and 160 of the 480 mandatory neurons in the minimal NAP specification.
In terms of the statistical approaches, Sample_Coarsen outperforms Sample_Refine in both NAP specification size and number of calls to $\mathcal{V}$, requiring around 30 calls compared to over . It is interesting to note that the mandatory neurons present in the minimal NAP specifications learned by the different algorithms are mostly located in the 3rd and 4th layers. This aligns with the belief that neurons in deeper layers are responsible for high-level feature representations and thus play a critical role in classification decisions.
Moreover, the learned minimal NAP specifications correspond to significantly larger verifiable regions than the refined NAP specifications. Using the percentage of test data as a metric, the minimal NAP specifications increase the coverage ratio from 80.51% to 98.78%, from 85.11% to 98.59%, and from 80.24% to 97.45% for labels 0, 1, and 4, respectively. In terms of estimated volume, the volumetric changes are on the order of $10^8$.
Current verification methods struggle to scale to state-of-the-art neural networks, making them unsuitable for critical systems such as deep learning-based autonomous driving; our verification-dependent approaches inherit these scalability issues. We therefore evaluate our learned minimal specifications using the verification-free approaches, an aspect not well studied in the current literature.
Thus, for the third experiment, we choose a deep convolutional network, a pretrained VGG-19 for the ImageNet dataset, as the benchmark for estimating mandatory neurons. We select neurons from the fully connected layers of VGG-19, which contain a total of neurons, to compute NAPs. The original ImageNet includes classes with millions of images; we narrow our focus to the five largest classes, each consisting of around training images and 350 test images. Table 3 presents the results of the NAPs learned using verification-free approaches.
We observe that these estimated NAPs cover significant portions of unseen test data. For example, the NAP formed by the mandatory neurons learned through Gradient_Search covers 87.01% of the test data. This makes them promising candidates for serving as NAP specifications, which tend to generalize well to unseen data drawn from the underlying distribution.
From the perspective of representation learning, neural networks acquire both low- and high-level feature extractors, which they use to make final classification decisions based on hidden features (neuron representations) (Bengio et al., ). Therefore, the robustness and consistency of a model’s predictions are influenced by the quality of these learned features. In essence, achieving an accurate and robust model hinges on learning ”good” hidden representations, which are characterized by better interpretability (Zhang et al., ). Many studies suggest a close relationship between visual interpretability and robustness, often observed in the learned features and representations (Alvarez Melis and Jaakkola, ; Boopathy et al., ; Dong et al., ). Thus, although we cannot yet formally verify the correctness of these estimated NAP specifications, we demonstrate that these NAPs are indeed ”meaningful” through visual interpretability—strong evidence that the estimated mandatory neurons (NAPs) contribute to the model’s robustness.
To this end, we employ Grad-CAM (Selvaraju et al., ), a popular approach from the model interpretability domain. This technique leverages the gradients of the classification score with respect to the final convolutional feature map to highlight the most important regions of an input image. Formally, the class-discriminative localization map (Grad-CAM map) $L_{\text{Grad-CAM}}^c \in \mathbb{R}^{u \times v}$ of width $u$ and height $v$ for any class $c$ can be computed by:
$L_{\text{Grad-CAM}}^c = \text{ReLU}\left(\sum_k \alpha_{c,k} A_k\right)$, where $A_k$ is a feature map and $\alpha_{c,k}$ is a weight representing the significance of $A_k$ for class $c$. The weight $\alpha_{c,k}$ can be computed by:
$\alpha_{c,k} = \frac{1}{Z}\sum_i\sum_j \frac{\partial y_c}{\partial A_{k,ij}}$, where $\frac{\partial y_c}{\partial A_{k,ij}}$ is the gradient of the score $y_c$ for class $c$ with respect to the activations $A_k$ of a convolutional layer; these gradients are globally average-pooled to obtain $\alpha_{c,k}$. To investigate whether the estimated NAPs (mandatory neurons) are related to visual interpretability, we make a simple modification to Grad-CAM: we mask out the neurons that do not appear in NAP $P$ using a mask $M_P$ in the fully connected layers. We then compute the backward gradient flow on the modified computation graph, i.e., we replace $\frac{\partial y_c}{\partial A_{k,ij}}$ with $\frac{\partial y_c}{\partial A_{k,ij}} M_P$ to obtain the modified Grad-CAM map (a code sketch of this masked computation follows the list below). Finally, we conduct the following experiments on image samples:
(1) Calculate the modified Grad-CAM map using the most refined (baseline) NAP and compare it with the original Grad-CAM map.
(2) Calculate the modified Grad-CAM map using the NAPs estimated by Adversarial_Prune and Gradient_Search, and compare them with the original map.
(3) Calculate the modified Grad-CAM map using NAPs from different classes and compare them with the original map.
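To make the masked Grad-CAM modification concrete, the sketch below applies the NAP mask to the first fully connected layer of a torchvision VGG-19 and backpropagates through the modified graph. It is an illustrative reconstruction, not the authors' implementation: the model, the choice of hooked layers, and the helper name masked_grad_cam are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

def masked_grad_cam(image, nap_mask, target_class):
    """image: (1, 3, 224, 224) float tensor; nap_mask: 0/1 tensor over the 4096
    units of the first FC layer (1 = neuron appears in the NAP); target_class: int."""
    model = vgg19(weights="IMAGENET1K_V1").eval()

    feats = {}
    def save_activation(_, __, output):
        output.retain_grad()              # keep dy_c/dA_k after backward
        feats["A"] = output

    # Hook the last convolutional layer to capture the feature maps A_k.
    last_conv = [m for m in model.features if isinstance(m, torch.nn.Conv2d)][-1]
    last_conv.register_forward_hook(save_activation)

    # Mask out FC neurons that are not in the NAP, so the backward flow only
    # passes through the (estimated) mandatory neurons.
    model.classifier[0].register_forward_hook(lambda _, __, out: out * nap_mask)

    logits = model(image)
    logits[0, target_class].backward()

    A, dA = feats["A"], feats["A"].grad
    alpha = dA.mean(dim=(2, 3), keepdim=True)   # global average pooling -> alpha_{c,k}
    cam = F.relu((alpha * A).sum(dim=1))        # ReLU(sum_k alpha_{c,k} A_k)
    return cam / (cam.max() + 1e-8)             # normalize to [0, 1]
```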
Figure 9 presents the experimental results. The original Grad-CAM maps highlight image regions that provide crucial justification for the classification. Notably, both the most refined NAPs and the estimated minimal NAPs highlight regions nearly identical to those of the original Grad-CAM maps. This indicates that, although the neuron abstractions learned from our minimal specifications consist of only a very small fraction of neurons, they preserve the essential visual features. It also suggests that the estimated minimal NAPs capture salient traces of the internal decision-making process of VGG-19, aligning with the "NAP robustness property." Additionally, Grad-CAM maps generated using NAPs from different classes highlight distinct regions, which strongly suggests that our estimated NAPs are distinguishable from one another, aligning with the "non-ambiguity property."
This study demonstrates that NAPs hold significant value for interpretability. A small subset of neurons from a NAP can capture critical internal dynamics of a neural network, potentially helping us unveil the black-box nature of these systems. From a machine-checkable definition perspective, concise NAPs are also easier to decode into human-understandable programs than the most refined NAPs, which underscores the importance of learning minimal NAPs. Interpreting NAPs into human-readable formats remains a direction for future research.
From a practical point of view, we believe that even before 1) formal verification finally scales and/or 2) NAPs become fully interpretable, NAPs as they stand can already serve as an empirical certificate of a prediction or as a defense mechanism, as shown in recent work (Lukina et al., ). In the same spirit, we demonstrate that our estimated essential neurons can serve as a defense against adversarial attacks.
We first select images that meet two criteria: 1) they are correctly predicted by the model, and 2) they are covered by the respective NAP. On average, each NAP covers approximately 40% of the training data of the corresponding class. For each selected image, we generate 100 distinct adversarial examples that are misclassified by the model, using the Projected Gradient Descent attack (Madry et al., ) and the Carlini-Wagner attack (Carlini and Wagner, ), respectively. We then check whether each adversarial image's activation pattern is rejected by the respective NAP. If so, we conclude that the NAP can empirically serve as a specification; otherwise, it is shown not to be robust. Notably, we find that both the baseline NAPs and the estimated minimal NAPs reject all adversarial examples, indicating their effectiveness in describing a safe region and their potential as certificates.
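As a concrete illustration of this check, the sketch below tests whether an adversarial input's activation pattern violates a NAP. It is a simplified reconstruction under assumed interfaces: a NAP is modeled as two index sets over the monitored hidden layer, and hidden_pre_relu is a hypothetical helper returning that layer's pre-activation values.

```python
import numpy as np

def nap_rejects(x, must_activate, must_deactivate, hidden_pre_relu):
    """Return True if x's activation pattern is rejected by the NAP."""
    z = hidden_pre_relu(x)                                # pre-activations of the monitored layer
    violates_on = np.any(z[list(must_activate)] <= 0)     # a required-active neuron is inactive
    violates_off = np.any(z[list(must_deactivate)] > 0)   # a required-inactive neuron is active
    return bool(violates_on or violates_off)

# Usage sketch: count how many of the 100 adversarial examples a NAP rejects.
# rejected = sum(nap_rejects(x, P_on, P_off, hidden_pre_relu) for x in adv_examples)
```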
Neural network verification has attracted much attention due to the increasingly widespread application of neural networks in safety-critical systems. Its NP-hard nature, resulting from the non-convexity introduced by activation functions (Katz et al., ), makes it a challenging task. Thus, most existing work on neural network verification focuses on designing scalable verification algorithms. For instance, while early solver-based approaches (Huang et al., b; Ehlers, ; Cheng et al., ; Tjeng et al., ) were limited to verifying small neural networks with fewer than 100 neurons, state-of-the-art methods (Xu et al., ; Wang et al., c; Lu and Kumar, ) can verify more complex neural networks. It is worth mentioning that most existing work adopts local neighborhood specifications to verify the robustness properties of neural networks (Shapira et al., ). Despite being a reliable measure, specifications that define local neighborhoods of reference data points may not cover any test data, let alone generalize to the verification of unseen test data. Geng et al. () propose the new paradigm of NAP specifications to address this challenge. Our work advances the understanding of NAP specifications.
Abstract interpretation (Cousot and Cousot, ) is a fundamental concept in software analysis and verification, particularly for approximating the semantics of discrete program states. By sacrificing precision, abstract interpretation typically enables scalable and faster proof finding during verification (Cousot and Cousot, ). Although abstract interpretation for neural network verification has been proposed and studied in previous literature (Gehr et al., ; Mirman et al., ), abstract interpretation of neural activation patterns for verification is a relatively new field. Perhaps the most closely related work from the perspective of abstract interpretation is learning minimal abstractions (Liang et al., ). While our work shares similarities in problem formulation and statistical approaches, we address fundamentally different problems. One limitation of our work is that our abstraction states may be too coarse: any value in the range $(0,+\infty)$ is abstracted into a single state. This may over-approximate neuron behavior and thus fail to prove certain properties. We observe that neuron values exhibit different range patterns for different input classes, suggesting the potential existence of more abstraction states. We leave this as future work.
Neural activation patterns have commonly been used to understand the internal decision-making process of neural networks. One popular line of research is feature visualization (Yosinski et al., ; Bäuerle et al., ), which investigates which neurons are activated or deactivated for different inputs. This is naturally related to the field of activation maximization (Simonyan et al., ), which studies what kinds of inputs most strongly activate certain neurons in the neural network. In this way, certain prediction outcomes may be attributed to the behavior of specific neurons, thereby increasing the interpretability of the underlying models. Lukina et al. () demonstrate that neural activation patterns can be used to monitor neural networks and detect novel or unknown input classes at runtime, providing human-level interpretability of neural network decision-making. In summary, most existing work focuses on learning statistical correlations between NAPs and inputs (Bau et al., ; Erhan et al., ), or between NAPs and prediction outcomes (Lukina et al., ). However, these correlations raise a question that we address in this paper: can the correlation be trusted, or even verified? We propose the concept of mandatory neurons and highlight their importance to the robustness of model predictions. Such causal links between neurons and prediction outcomes are not only identified but also verified. We believe this "identify then verify" paradigm can be extended to existing research on NAPs to certify our understanding of neural networks. We leave the exploration of this direction to future work.
We introduce a new problem, learning the minimal NAP specification, and discuss its importance in neural network verification. Finding the minimal NAP specification not only enables the verification of larger input regions than existing methods but also provides a means of inspecting when and how neural networks make reliable and robust predictions. To solve this problem, we first propose two simple approaches, Refine and Coarsen, which leverage off-the-shelf verification tools to find the minimal NAP specification with correctness guarantees. We also propose statistical versions of Refine and Coarsen that combine sampling with statistical learning principles to achieve probabilistic correctness more efficiently. However, these approaches depend on underlying verification tools, which are computationally expensive. To this end, we propose two approximate approaches, Adversarial_Prune and Gradient_Search, which use adversarial attacks and local gradient computation to efficiently investigate potential causal relationships between specific neurons and the model's robustness. Finally, to appreciate the volumetric change from the most refined NAPs to the minimal NAPs, we propose a simple method for estimating the volume of the region corresponding to a NAP. Our experiments indicate that minimal NAP specifications use much smaller fractions of neurons than the most refined NAPs, yet they expand the verifiable boundaries by several orders of magnitude.
The algorithm Refine returns a minimal NAP specification with $\mathcal{O}(2^{|N|})$ calls to $\mathcal{V}$.
Let $P$ be the returned NAP. We prove this by contradiction. Suppose $P$ can be further refined, i.e., there exists a NAP $P'$ such that $|P'|\leq|P|$ and $\mathcal{V}(P')=1$. However, the algorithm guarantees that any $P'$ smaller than $|P|$ fails verification, which contradicts $\mathcal{V}(P')=1$.
In the worst case, the NAP size $k$ runs up to $|N|$. For each $k$, we need to check $\binom{|N|}{k}$ NAPs. In total, the number of NAPs we need to check is $\binom{|N|}{1}+\binom{|N|}{2}+\cdots+\binom{|N|}{|N|}=2^{|N|}-1$ by the binomial theorem, resulting in a runtime complexity of $\mathcal{O}(2^{|N|})$.
∎
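For reference, the exhaustive search analyzed in this proof can be sketched as follows, under assumed interfaces: neurons stands for the candidate set $N$ of neuron constraints from the most refined NAP, and verify stands in for the oracle $\mathcal{V}$, returning True iff the given NAP suffices to prove the robustness query.

```python
from itertools import combinations

def refine(neurons, verify):
    # Enumerate candidate NAPs in order of increasing size; the first one that
    # verifies is minimal by construction. Worst case: 2^|N| - 1 oracle calls.
    for k in range(1, len(neurons) + 1):
        for subset in combinations(list(neurons), k):
            if verify(set(subset)):
                return set(subset)
    return None  # even the most refined NAP fails to verify
```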
Let $P$ be the NAP returned by Coarsen. Our goal is to show that any $P'$ smaller than $P$ results in $\mathcal{V}(P')=0$. To construct such a smaller $P'$, we need to apply the refine action $\widetilde{\Delta}$ to $P$ through some neuron $N_{i,l}$, i.e., $P':=\widetilde{\Delta}(N_{i,l})=\widetilde{\mathcal{A}}(N_{i,l})\cup P\setminus P_{i,l}$. According to the algorithm, $\mathcal{V}(P')=0$. In the worst case, the algorithm iterates through every neuron in $N$, resulting in a runtime complexity of $\mathcal{O}(|N|)$. ∎
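A sketch of the linear-time Coarsen pass is given below, again under assumed interfaces (and assuming, as in this setting, that adding constraints never makes verification harder, so a single pass yields a NAP that cannot be shrunk further).

```python
def coarsen(most_refined_nap, verify):
    nap = set(most_refined_nap)
    for constraint in list(most_refined_nap):
        candidate = nap - {constraint}
        if verify(candidate):   # the constraint is non-essential: coarsen it away
            nap = candidate
    return nap                  # one verify call per neuron: O(|N|) calls
```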
Our proofs of the properties of Sample_Refine and Sample_Coarsen largely follow those in (Liang et al., ); interested readers may refer to that work for detailed proofs.
With probability $\theta=\left(\frac{|M|}{|M|+1}\right)^{|M|}$, Sample_Refine has a $1-\delta$ probability of outputting a minimal NAP specification, using $\Theta\left(|M|^{2}(\log|N|+\log(s/\delta))\right)$ samples in each iteration and $O\left(s|M|^{2}(\log|N|+\log(s/\delta))\right)$ total calls to $\mathcal{V}$.
Considering the size $s$ of the minimal specification, Sample_Refine executes $s$ iterations. If we sample $k$ times in each iteration, then the probability of selecting a mandatory neuron in a given iteration is at least $1-\frac{\delta}{s}$. Consequently, by a union bound, the algorithm identifies a NAP specification with probability at least $1-\delta$.
Now, let us examine one iteration of the process more closely. The key idea is that a mandatory neuron exhibits a stronger correlation with proving the robustness query ($\mathcal{V}(P)=1$) than a non-mandatory one. This stronger correlation makes it far more likely to be selected when $k$ is sufficiently large.
Recall that $M$ denotes the set of mandatory neurons, of size $|M|$, and fix a mandatory neuron $n^{+}\in M$. We define $B_{n^{-}}$ as the event that $k_{n^{-}}>k_{n^{+}}$, and $B$ as the event that $B_{n^{-}}$ holds for some non-mandatory neuron $n^{-}\in N\setminus M$. Importantly, if $B$ does not occur, the algorithm correctly identifies a mandatory neuron. Hence, our primary objective is to establish that $Pr(B)\leq\frac{\delta}{s}$. First, a union bound gives:
(17) $Pr(B)\leq\sum_{n^{-}}Pr(B_{n^{-}})\leq|N|\max_{n^{-}}Pr(B_{n^{-}})$
Consider each sampled NAP $P^{(i)}$ and introduce the notation $X_{i}=(1-\mathcal{V}(P^{(i)}))(P_{n^{-}}^{(i)}-P_{n^{+}}^{(i)})$. Note that $B_{n^{-}}$ occurs precisely when $\frac{1}{n}(k_{n^{-}}-k_{n^{+}})=\frac{1}{n}\sum_{i=1}^{n}X_{i}>0$. Our objective now is to bound this quantity using Hoeffding's inequality, with mean:
(18) $\mathbb{E}[X_{i}]=Pr(\mathcal{V}(P)=1,P_{n^{-}}\in\{\mathbf{1},\mathbf{0}\})-Pr(\mathcal{V}(P)=1,P_{n^{+}}\in\{\mathbf{1},\mathbf{0}\})$
with bounds $-1\leq\mathbb{E}[X_{i}]\leq 1$. Setting $\epsilon=-\mathbb{E}[X_{i}]$, we get:
(19) $Pr(B_{n^{-}})\leq e^{-\frac{k\epsilon^{2}}{2}},\quad n^{+}\in M,\ n^{-}\notin M.$
Substituting (19) into (17) and rearranging terms, we can solve for $k$:
(20) $|N|e^{-\frac{k\epsilon^{2}}{2}}\leq\frac{\delta}{s}\ \text{, which holds when }\ k\geq\frac{2\left(\log|N|+\log\left(\frac{s}{\delta}\right)\right)}{\epsilon^{2}}$
Our attention now shifts to deriving a lower bound for $\epsilon$, which intuitively reflects the gap, in terms of correlation with proving the robustness query, between a mandatory neuron and a non-mandatory one. Note that $Pr(P_{n}\in\{\mathbf{1},\mathbf{0}\})=\theta$ for any $n\in N$. Furthermore, since $n^{-}$ is non-mandatory, we have $Pr(\mathcal{V}(P)=1\mid P_{n^{-}}\in\{\mathbf{1},\mathbf{0}\})=Pr(\mathcal{V}(P)=1)$. Combining these observations, we can write:
(21) $\epsilon=\theta\left(Pr(\mathcal{V}(P)=1\mid P_{n^{+}}\in\{\mathbf{1},\mathbf{0}\})-Pr(\mathcal{V}(P)=1)\right)$
Let $C$ denote the collection of minimal NAP specifications. We can treat $C$ as a set of clauses in a Disjunctive Normal Form (DNF) formula: $\mathcal{V}(P;C)=\neg\bigvee_{c\in C}\bigwedge_{n\in c}P_{n}$, where we explicitly specify the dependence of $\mathcal{V}$ on the clauses $C$. For example, if $C=\{\{1,2\},\{3\}\}$, it corresponds to $\mathcal{V}(P)=\neg[(P_{1}\land P_{2})\lor P_{3}]$. Now, let $C_{n}=\{c\in C:n\in c\}$ denote the clauses containing $n$. We reformulate $Pr(\mathcal{V}(P)=1)$ as the sum of two components: one involving the clauses containing the mandatory neuron $n^{+}$ and one involving the remaining clauses:
(22) $Pr(\mathcal{V}(P)=1)=Pr(\mathcal{V}(P;C_{n^{+}})=1,\ \mathcal{V}(P;C\setminus C_{n^{+}})=0)$
(23) $\qquad\qquad\qquad\quad+\ Pr(\mathcal{V}(P;C\setminus C_{n^{+}})=1)$
Calculating $Pr(\mathcal{V}(P)=1\mid P_{n^{+}}\in\{\mathbf{1},\mathbf{0}\})$ follows a similar process. The only distinction arises from conditioning on $P_{n^{+}}\in\{\mathbf{1},\mathbf{0}\}$, which introduces an extra factor of $\frac{1}{\theta}$ in the first term, because conditioning divides by $Pr(P_{n^{+}}\in\{\mathbf{1},\mathbf{0}\})=\theta$. The second term remains unchanged, since no clause in $C\setminus C_{n^{+}}$ contains $n^{+}$. Substituting these two results back into (21) yields:
(24) $\epsilon=(1-\theta)\,Pr(\mathcal{V}(P;C_{n^{+}})=1,\ \mathcal{V}(P;C\setminus C_{n^{+}})=0)$
Now, our objective is to establish a lower bound for (24) over all possible $\mathcal{V}$ (equivalently, all possible $C$) in which $n^{+}$ is mandatory. Interestingly, the worst possible $C$ is obtained either with $|M|$ disjoint singleton clauses ($C=\{\{n\}:n\in M\}$) or with a single clause ($C=\{M\}$, when $s=|M|$). The intuition is that when $C$ consists of $|M|$ clauses, there are many ways ($|M|-1$ of them) for some $c\notin C_{n^{+}}$ to be true, making it hard to determine that $n^{+}$ is a mandatory neuron; in this case, $\epsilon=(1-\theta)\theta(1-\theta)^{|M|-1}$. Conversely, if $C$ comprises a single clause, then making that clause true becomes exceedingly hard; in this case, $\epsilon=(1-\theta)\theta^{|M|}$.
Consider the scenario where $C$ comprises $|M|$ clauses. We maximize $\epsilon$ with respect to $\theta$ by setting the derivative $\frac{d\epsilon}{d\theta}=0$ and solving for $\theta$, which yields the optimal value $\theta=\frac{1}{|M|+1}$. Substituting this value into the formula for $\epsilon$, we obtain $\epsilon=\frac{1}{|M|+1}\left(\frac{|M|}{|M|+1}\right)^{|M|}$. Note that $\left(\frac{|M|}{|M|+1}\right)^{|M|}$ is lower bounded by $e^{-1}$, implying that $\epsilon^{-2}=\mathcal{O}(|M|^{2})$. Substituting this result into (20) concludes the proof. ∎
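One way to realize the selection step analyzed above is sketched below. It follows the intuition that mandatory neurons correlate with successful proofs; the scoring rule (crediting every constrained neuron of a verified sample) and the interfaces are assumptions for illustration, not the exact procedure.

```python
import random

def select_mandatory_neuron(neurons, verify, theta, num_samples):
    score = {n: 0 for n in neurons}
    for _ in range(num_samples):
        # Constrain each neuron independently with probability theta.
        candidate = {n for n in neurons if random.random() < theta}
        if verify(candidate):            # proof succeeded
            for n in candidate:          # credit every constrained neuron
                score[n] += 1
    return max(score, key=score.get)     # highest score -> likely mandatory
```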
With probability $\theta=e^{-\frac{1}{s}}$, Sample_Coarsen learns a minimal NAP specification with $\mathcal{O}(s\log|N|)$ calls to $\mathcal{V}$.
Let us first estimate the number of calls that Sample_Coarsen makes to $\mathcal{V}$. We denote this number by $\mathcal{C}(P^{L})$, where $P^{L}$ is the most refined NAP. Then $\mathcal{C}(P^{L})$ can be computed recursively using the following rule:
(25) $\mathcal{C}(P^{L})=\begin{cases}|P^{L}| & \text{if }|P^{L}|\leq s+1\\ 1+\mathbb{E}\left[(1-\mathcal{V}(P))\,\mathcal{C}(P)+\mathcal{V}(P)\,\mathcal{C}(P^{L})\right] & \text{otherwise}\end{cases}$
where $P$ is the sampled NAP. By assumption, there exists a NAP $P^{S}$ of size $s$ that passes verification. Define $G(P)=\neg(P^{S}\preccurlyeq P)$, which is $0$ when $P^{S}$ subsumes $P$, i.e., when all mandatory neurons in $P^{S}$ appear in the sampled $P$. It follows that $Pr(G(P)=0)=Pr(P^{S}\preccurlyeq P)=\theta^{s}$. Note that $G(P)\geq\mathcal{V}(P)$, as the NAP $P^{S}$ suffices to prove the robustness query. We can upper bound $\mathcal{C}(P^{L})$ by replacing $\mathcal{V}$ with $G$:
(26) $\mathcal{C}(P^{L})\leq 1+\mathbb{E}\left[(1-G(P))\,\mathcal{C}(P)+G(P)\,\mathcal{C}(P^{L})\right]$
(27) $\leq 1+\theta^{s}\,\mathbb{E}[\mathcal{C}(P)\mid P^{S}\preccurlyeq P]+(1-\theta^{s})\,\mathcal{C}(P^{L})$
(28) $\leq\mathbb{E}[\mathcal{C}(P)\mid P^{S}\preccurlyeq P]+\theta^{-s}$
We now denote by $\mathcal{C}(n)=\max_{|P|=n}\mathcal{C}(P)$ the maximum over NAPs of size $n$. Note that, given $P^{S}\preccurlyeq P$, we have $|P|=s+N$, where $N$ is a binomial random variable with $\mathbb{E}[N]=\theta(n-s)$.
Using the bound $\mathcal{C}(n)\leq(1-\theta^{n})\,\mathcal{C}(n-1)+\theta^{n}\,\mathcal{C}(n)\,\theta^{-s}$, we observe that $\mathcal{C}(n)\leq\frac{\theta^{-s}}{1-\theta^{n}}\cdot n$. In addition, when $n$ is large enough, $\mathcal{C}(n)$ is concave, so by Jensen's inequality:
(29) $\mathcal{C}(n)\leq\mathcal{C}(\mathbb{E}[s+N])+\theta^{-s}=\mathcal{C}(s+\theta(n-s))+\theta^{-s}$
Solving this recurrence gives:
(30) $\mathcal{C}(n)\leq\frac{\theta^{-s}\log n}{\log\theta^{-1}}+s+1$
The bound above reflects a tradeoff between reducing the number of iterations (by increasing $\log\theta^{-1}$) and reducing the number of samples (by decreasing $\theta^{-s}$). To minimize $\mathcal{C}(n)$, we set $\theta$ so that the gradient of $\mathcal{C}(n)$ with respect to $\theta^{-1}$ is $0$. Writing $x=\theta^{-1}$, this gives $\frac{sx^{s-1}}{\log x}-\frac{x^{s-1}}{\log^{2}x}=0$; solving yields $\theta=e^{-\frac{1}{s}}$. Consequently, the upper bound becomes $\mathcal{C}(n)=es\log n+s+1=O(s\log n)$. ∎
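The sampling scheme analyzed above can be sketched as follows, with $s$ treated as known and the Coarsen helper from the earlier sketch used on the small remainder; both are assumptions for illustration.

```python
import math
import random

def sample_coarsen(most_refined_nap, verify, s):
    theta = math.exp(-1.0 / s)            # keep-probability theta = e^(-1/s)
    nap = set(most_refined_nap)
    while len(nap) > s + 1:
        candidate = {n for n in nap if random.random() < theta}
        if verify(candidate):             # the sub-NAP still proves robustness
            nap = candidate               # shrink and continue
    return coarsen(nap, verify)           # deterministic cleanup on the remainder
```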