Appendix

Appendix to "A coherent interpretation of AUC as a measure of aggregated classification performance"

As we argue in section 4.1, considering the $n$ examples plus an extra default classifier in both directions is equivalent to considering the thresholds placed between examples. So we can just define a discrete distribution on thresholds between examples as follows: $q = \langle q_0, q_1, \ldots, q_n \rangle$, with $n$ being the size of the dataset. Since it is a discrete distribution over examples, we have $\sum_{i=0}^{n} q_i = 1$.

Note that for the moment we assume no ties in the scores.

One way to model this instance-uniform threshold choice method is as a non-deterministic function:

\[
T^{i}(c) \triangleq T^{i}(z) \triangleq \{ t_i \text{ with prob. } q_i \} \tag{1}
\]

And now we can plug $T^{i}$, Eq. (1), into the general formula of the expected loss for a range of skews, Eq. (??):

\begin{align*}
L^{i}_{z}(t) &\triangleq \int_0^1 Q_z(T^{i}(z); z)\, w_z(z)\, dz \tag{2} \\
&= \int_0^1 \sum_{i=0}^{n} q_i\, Q_z(t_i; z)\, w_z(z)\, dz \\
&= \sum_{i=0}^{n} q_i \int_0^1 Q_z(t_i; z)\, w_z(z)\, dz \tag{3}
\end{align*}

We can see that we have two aggregations, one over the possible skews and another one over the instances. We use the discrete distribution (or weight vector) $q$ for instances and the continuous distribution $w$ for skews. Note that we evaluate assuming these two distributions are independent. In order to introduce a dependence from cost proportions (or skews) to operating points, we would need to use a weight function of $z$ (a parametrised discrete distribution) instead of $q$. In fact, $T^{i}$ is a very particular case of this.

And now we have to consider a reasonable distribution for $q$. One easy choice is to set $q$ as the discrete uniform distribution, i.e., $q = \langle 1/(n+1), 1/(n+1), \ldots, 1/(n+1) \rangle$. Of course there are other possibilities for $q$. For instance, we could give higher probabilities to centre points (with a Beta or a binomial distribution). All these options are plausible. Nonetheless, we give a justification below on why the uniform distribution for examples is a very good choice in terms of the simplicity with which $L^{i}_{z}$ can be computed and, more importantly, in terms of its interpretation. We denote with $T^{U(i)}$ the threshold choice method for $q$ being the discrete uniform distribution.

If we, as usual, average that along a uniform distribution of skews, we get the following definition:

\[
L^{U(i)}_{U(z)} \triangleq \int_0^1 Q_z(T^{U(i)}(z); z)\, U(z)\, dz \tag{4}
\]

And now we can obtain the following result:

Theorem 1. (Theorem 5 in the paper)

\[
L^{U(i)}_{U(z)} = \frac{1}{n+1} \sum_{i=0}^{n} \left(1 - MAcc(t_i)\right)
\]

Proof. By using Eq. (3) we have:

\begin{align*}
L^{U(i)}_{U(z)} &= \int_0^1 Q_z(T^{U(i)}(z); z)\, U(z)\, dz \\
&= \sum_{i=0}^{n} U(i) \int_0^1 Q_z(t_i; z)\, U(z)\, dz \\
&= \sum_{i=0}^{n} \frac{1}{n+1} \int_0^1 Q_z(t_i; z)\, U(z)\, dz \\
&= \frac{1}{n+1} \sum_{i=0}^{n} \int_0^1 \left\{ z\,(1 - F_0(t_i)) + (1 - z)\,F_1(t_i) \right\} U(z)\, dz
\end{align*}

Since $t_i$ is constant for each integral, $F_0(t_i)$ and $F_1(t_i)$ are also constant, and using the uniform distribution we have:

\begin{align*}
L^{U(i)}_{U(z)} &= \frac{1}{n+1} \sum_{i=0}^{n} \left\{ (1 - F_0(t_i)) \int_0^1 z\, dz + F_1(t_i) \int_0^1 (1 - z)\, dz \right\} \\
&= \frac{1}{n+1} \sum_{i=0}^{n} \left\{ (1 - F_0(t_i))(1/2) + F_1(t_i)(1/2) \right\} \tag{5} \\
&= \frac{1}{n+1} \sum_{i=0}^{n} \left(1 - MAcc(t_i)\right) \tag{6}
\end{align*}

But we can go further than that.

Lemma 2. Assuming no ties:

\[
\sum_{i=0}^{n} F_0(t_i) = \frac{n_0 + 1}{2} + n_1\, AUC
\]
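Lemma 2 can be sanity-checked with exact arithmetic on a small tie-free dataset. The sketch below is illustrative only: the function names (`cdfs_at_thresholds`, `auc_pairs`) and the toy scores are hypothetical, and it assumes the convention suggested by the discrete sum $\sum_i F_0(t_i) f_1(t_i)$, namely that AUC is the fraction of (class 0, class 1) pairs in which the class 0 example gets the lower score. Threshold $t_0$ sits below every example and $t_i$ sits just above the $i$-th example in score order.

```python
from fractions import Fraction


def cdfs_at_thresholds(scores, labels):
    """Return F0 and F1 evaluated at t_0..t_n (assumes no tied scores)."""
    n0, n1 = labels.count(0), labels.count(1)
    ranked = [lab for _, lab in sorted(zip(scores, labels))]
    F0, F1 = [Fraction(0)], [Fraction(0)]  # t_0 is below every example
    c0 = c1 = 0
    for lab in ranked:
        if lab == 0:
            c0 += 1
        else:
            c1 += 1
        F0.append(Fraction(c0, n0))
        F1.append(Fraction(c1, n1))
    return F0, F1


def auc_pairs(scores, labels):
    """AUC as the fraction of (class 0, class 1) pairs ranked correctly."""
    s0 = [s for s, lab in zip(scores, labels) if lab == 0]
    s1 = [s for s, lab in zip(scores, labels) if lab == 1]
    correct = sum(1 for a in s0 for b in s1 if a < b)
    return Fraction(correct, len(s0) * len(s1))


# Hypothetical toy data: six examples, three per class, no ties.
scores = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

F0, _ = cdfs_at_thresholds(scores, labels)
auc_value = auc_pairs(scores, labels)
n0, n1 = labels.count(0), labels.count(1)

lhs = sum(F0)                                # sum_{i=0}^{n} F0(t_i)
rhs = Fraction(n0 + 1, 2) + n1 * auc_value   # (n0 + 1)/2 + n1 * AUC
print(lhs, rhs)  # 14/3 14/3
```

Using `Fraction` rather than floats keeps the comparison exact, so any mismatch would point to a real discrepancy and not to rounding.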
Proof. Since $F_0(t_0) = 0$, we have:

\begin{align*}
\sum_{i=0}^{n} F_0(t_i) &= F_0(t_0) + \sum_{i=1}^{n} F_0(t_i) = \sum_{i=1}^{n} F_0(t_i) \\
&= \sum_{i=1}^{n} \left\{ F_0(t_i) - n_1 F_0(t_i) f_1(t_i) + n_1 F_0(t_i) f_1(t_i) \right\} \\
&= \sum_{i=1}^{n} n_1 \left\{ F_0(t_i) \left[ \frac{1}{n_1} - f_1(t_i) \right] + F_0(t_i) f_1(t_i) \right\}
\end{align*}

If we assume no ties, for each example $i$ either $f_0(t_i) = 1/n_0$ (and $f_1(t_i) = 0$) or $f_1(t_i) = 1/n_1$ (and $f_0(t_i) = 0$). In this situation, the first part represents the $n_0$ cases where $f_1$ is 0, i.e., where $f_0(t_j) = 1/n_0$, since the other $n_1$ cases (those for which $f_1 = 1/n_1$) are cancelled. So,

\[
\sum_{i=1}^{n} F_0(t_i) = \sum_{j=1}^{n_0} n_1 F_0(t_j) \frac{1}{n_1} + \sum_{i=1}^{n} n_1 F_0(t_i) f_1(t_i)
\]

But it is precisely in the $n_0$ cases for which $f_0(t_j) = 1/n_0$ that $F_0$ is incremented (ranging from $1/n_0$ to 1). So,

\begin{align*}
\sum_{i=1}^{n} F_0(t_i) &= \sum_{j=1}^{n_0} n_1 \frac{j}{n_0} \frac{1}{n_1} + \sum_{i=1}^{n} n_1 F_0(t_i) f_1(t_i) \\
&= \frac{1}{n_0} \sum_{j=1}^{n_0} j + n_1 \sum_{i=1}^{n} F_0(t_i) f_1(t_i) \\
&= \frac{n_0 + 1}{2} + n_1 \sum_{i=1}^{n} F_0(t_i) f_1(t_i)
\end{align*}

The term on the right is just a discrete version (with sums instead of integrals) of Eq. (??), giving

\[
\sum_{i=1}^{n} F_0(t_i) = \frac{n_0 + 1}{2} + n_1\, AUC
\]

This generalises to ties by considering all the possible orderings of the tied examples and averaging over all of them. Since this can have a high cost in practice, a better option is to note that, since we are putting the thresholds between examples, this turns out to be equivalent to considering fewer segments but using proportionally higher weights for them. For instance, given one example, the segment on the left and the segment on the right each have weight $1/(n+1)$. Given two examples with a tie, the segment on the left has weight $\frac{1}{n+1} + \frac{1}{2(n+1)}$ and the segment on the right gets $\frac{1}{n+1} + \frac{1}{2(n+1)}$. In other words, the weight of any segment which is removed because of a tie reappears, half on the left and half on the right.

And now we can give a similar result for $F_1$:

Lemma 3. Assuming no ties:

\[
\sum_{i=0}^{n} (1 - F_1(t_i)) = \frac{n_1 + 1}{2} + n_0\, AUC
\]

Proof. The proof is symmetrical to that of Lemma 2.

And now we can put everything together:

Theorem 4. (Theorem 4 in the paper)

\[
L^{U(i)}_{U(z)} = \frac{n}{n+1} \cdot \frac{1 - AUC}{2} + \frac{n+2}{n+1} \cdot \frac{1}{4} \tag{7}
\]

Proof. Plugging Lemmata 2 and 3 into the definition of $L^{U(i)}_{U(z)}$, Eq. (5), we have:

\begin{align*}
L^{U(i)}_{U(z)} &= \frac{1}{n+1} \sum_{i=0}^{n} \left\{ (1 - F_0(t_i))(1/2) + F_1(t_i)(1/2) \right\} \\
&= \frac{1}{2(n+1)} \left( 2(n+1) - \sum_{i=0}^{n} F_0(t_i) - \sum_{i=0}^{n} (1 - F_1(t_i)) \right) \\
&= 1 - \frac{1}{2(n+1)} \left( \frac{n_0 + 1}{2} + n_1\, AUC + \frac{n_1 + 1}{2} + n_0\, AUC \right) \\
&= 1 - \frac{1}{2(n+1)} \left( \frac{n+2}{2} + n\, AUC \right) \\
&= \frac{3}{4} - \frac{n}{n+1} \cdot \frac{AUC}{2} - \frac{1}{4(n+1)}
\end{align*}

The issue with ties can be addressed in the same way we mentioned above.

Corollary 5. When the number of examples goes to infinity (and assuming no ties), we have:

\[
L^{U(i)}_{U(z)} = \frac{3}{4} - \frac{AUC}{2} \tag{8}
\]

This gives the same result as Corollary 3 in the paper.
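The closed form in Eq. (7) and the limit in Eq. (8) can be checked on a toy dataset by averaging $1 - MAcc(t_i)$ over the $n + 1$ thresholds directly, as in Eq. (6). This is a minimal sketch under stated assumptions, not the authors' code: the function names and the data are hypothetical, scores are assumed to be tie-free, and AUC is taken as the fraction of (class 0, class 1) pairs in which the class 0 example scores lower.

```python
from fractions import Fraction


def uniform_instance_loss(scores, labels):
    """Average of 1 - MAcc(t_i) over thresholds t_0..t_n (Eq. 6, no ties)."""
    n = len(scores)
    n0, n1 = labels.count(0), labels.count(1)
    ranked = [lab for _, lab in sorted(zip(scores, labels))]
    total = Fraction(1, 2)  # at t_0, F0 = F1 = 0, so 1 - MAcc = 1/2
    c0 = c1 = 0
    for lab in ranked:
        if lab == 0:
            c0 += 1
        else:
            c1 += 1
        F0, F1 = Fraction(c0, n0), Fraction(c1, n1)
        total += (1 - F0 + F1) / 2  # 1 - MAcc(t_i)
    return total / (n + 1)


def auc_pairs(scores, labels):
    """AUC as the fraction of (class 0, class 1) pairs ranked correctly."""
    s0 = [s for s, lab in zip(scores, labels) if lab == 0]
    s1 = [s for s, lab in zip(scores, labels) if lab == 1]
    correct = sum(1 for a in s0 for b in s1 if a < b)
    return Fraction(correct, len(s0) * len(s1))


def closed_form(n, auc):
    """Right-hand side of Eq. (7)."""
    return Fraction(n, n + 1) * (1 - auc) / 2 + Fraction(n + 2, n + 1) / 4


# Hypothetical toy data: six examples, three per class, no ties.
scores = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

auc_value = auc_pairs(scores, labels)
direct = uniform_instance_loss(scores, labels)
formula = closed_form(len(scores), auc_value)
limit = Fraction(3, 4) - auc_value / 2  # Eq. (8), the n -> infinity value

print(direct, formula)  # 1/3 1/3
print(limit)            # 11/36
```

For finite $n$ the two expressions differ by $(2\,AUC - 1)/(4(n+1))$, which follows from subtracting Eq. (8) from the last line of the proof of Theorem 4; on this toy data that gap is $1/36$.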
