Programming Assignment 3
Sergei Bugrov
2. $\nabla W_o = \frac{\partial \hat{y}}{\partial W_o}\,\nabla \hat{y}$, $\quad \nabla b_o = \frac{\partial \hat{y}}{\partial b_o}\,\nabla \hat{y}$, $\quad \nabla \mathrm{FC} = \frac{\partial \hat{y}}{\partial \mathrm{FC}}\,\nabla \hat{y}$
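As an illustration only, step 2 could be implemented roughly as follows in NumPy, assuming a linear output layer $\hat{y} = W_o \cdot \mathrm{FC} + b_o$ and an upstream gradient `grad_yhat` (the function and variable names are mine, not taken from the submitted code):

```python
import numpy as np

def output_layer_backward(W_o, FC, grad_yhat):
    """Backprop through a linear output layer y_hat = W_o @ FC + b_o.

    FC        : (n_fc,)        activations of the fully connected layer
    W_o       : (n_out, n_fc)  output weights
    grad_yhat : (n_out,)       upstream gradient dL/dy_hat
    """
    grad_W_o = np.outer(grad_yhat, FC)   # dL/dW_o[r, c] = grad_yhat[r] * FC[c]
    grad_b_o = grad_yhat.copy()          # dy_hat/db_o is the identity
    grad_FC  = W_o.T @ grad_yhat         # gradient pushed back to FC
    return grad_W_o, grad_b_o, grad_FC
```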
4. $\nabla A = \frac{\partial P}{\partial A}\,\nabla P = \sum_{r=1}^{N_r^P} \frac{\partial P[r]}{\partial A}\,\nabla P[r] = \sum_{r=1}^{N_r^P}\sum_{c=1}^{N_c^P} \frac{\partial P[r][c]}{\partial A}\,\nabla P[r][c] = \sum_{r=1}^{N_r^P}\sum_{c=1}^{N_c^P} \begin{bmatrix} \frac{\partial P[r][c]}{\partial A[1][1]} & \cdots & \frac{\partial P[r][c]}{\partial A[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \frac{\partial P[r][c]}{\partial A[N_r^A][1]} & \cdots & \frac{\partial P[r][c]}{\partial A[N_r^A][N_c^A]} \end{bmatrix} \nabla P[r][c]$,
   where $\frac{\partial P[r][c]}{\partial A[k][l]} = \begin{cases} 1 & \text{if } k = i^* \text{ and } l = j^* \\ 0 & \text{otherwise} \end{cases}$
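A minimal sketch of step 4 for max pooling, assuming a square window of size `k` with stride equal to the window size (names are illustrative): the upstream gradient is routed to the argmax position $(i^*, j^*)$ of each window, and every other entry of $\nabla A$ stays zero.

```python
import numpy as np

def maxpool_backward(A, grad_P, k=2):
    """Route grad_P back to the argmax (i*, j*) of each pooling window.

    A      : (H, W)        pre-pooling feature map
    grad_P : (H//k, W//k)  upstream gradient dL/dP
    """
    grad_A = np.zeros_like(A)
    for r in range(grad_P.shape[0]):
        for c in range(grad_P.shape[1]):
            window = A[r*k:(r+1)*k, c*k:(c+1)*k]
            i_star, j_star = np.unravel_index(np.argmax(window), window.shape)
            grad_A[r*k + i_star, c*k + j_star] += grad_P[r, c]
    return grad_A
```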
5. $\nabla C = \frac{\partial A}{\partial C}\,\nabla A = \sum_{r=1}^{N_r^A}\sum_{c=1}^{N_c^A} \frac{\partial A[r][c]}{\partial C}\,\nabla A[r][c] = \sum_{r=1}^{N_r^A}\sum_{c=1}^{N_c^A} \begin{bmatrix} \frac{\partial A[r][c]}{\partial C[1][1]} & \cdots & \frac{\partial A[r][c]}{\partial C[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \frac{\partial A[r][c]}{\partial C[N_r^A][1]} & \cdots & \frac{\partial A[r][c]}{\partial C[N_r^A][N_c^A]} \end{bmatrix} \nabla A[r][c]$
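If the activation is elementwise (for example $A = \mathrm{ReLU}(C)$, an assumption on my part since the write-up does not name it here), the matrix in step 5 is diagonal and the double sum collapses to an elementwise product:

```python
import numpy as np

def relu_backward(C, grad_A):
    """Step 5 specialised to A = ReLU(C): dA[r][c]/dC[k][l] is nonzero only
    when (k, l) == (r, c), so grad_C is just grad_A masked by C > 0."""
    return grad_A * (C > 0)
```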
6. $\nabla W_x = \frac{\partial C}{\partial W_x}\,\nabla C = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial W_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial W_x[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial W_x[1][K]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial W_x[K][1]} & \cdots & \frac{\partial C[r][c]}{\partial W_x[K][K]} \end{bmatrix} \nabla C[r][c]$,
   where $\frac{\partial C[r][c]}{\partial W_x[i][j]} = \begin{bmatrix} \partial C[r][c]/\partial W_x[i][j][1] \\ \partial C[r][c]/\partial W_x[i][j][2] \\ \vdots \\ \partial C[r][c]/\partial W_x[i][j][D] \end{bmatrix}$ and $\frac{\partial C[r][c]}{\partial W_x[i][j][l]} = X[r+i-1][c+j-1][l]$
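A minimal sketch of the weight gradient in step 6 for a single output channel, stride 1 and no padding (function and variable names are illustrative): each upstream value $\nabla C[r][c]$ is multiplied by the $K \times K \times D$ patch of $X$ it was computed from, and the results are accumulated.

```python
import numpy as np

def conv_weight_grad(X, grad_C, K):
    """Step 6: dL/dW_x[i, j, l] = sum_{r, c} X[r+i-1, c+j-1, l] * grad_C[r, c]
    (1-based indices in the derivation, 0-based slicing here).

    X      : (H, W, D)          input volume
    grad_C : (H-K+1, W-K+1)     upstream gradient dL/dC for one channel
    """
    grad_W = np.zeros((K, K, X.shape[2]))
    for r in range(grad_C.shape[0]):
        for c in range(grad_C.shape[1]):
            grad_W += X[r:r+K, c:c+K, :] * grad_C[r, c]
    return grad_W
```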
$\nabla b_x = \frac{\partial C}{\partial b_x}\,\nabla C = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial b_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial b_x[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial b_x[1][K]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial b_x[K][1]} & \cdots & \frac{\partial C[r][c]}{\partial b_x[K][K]} \end{bmatrix} \nabla C[r][c]$,
where $\frac{\partial C[r][c]}{\partial b_x[i][j]} = \begin{cases} 1 & \text{if } i = r,\ j = c \\ 0 & \text{otherwise} \end{cases}$
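Since the indicator above picks out exactly one entry per output position, the bias gradient reduces to the upstream gradient itself under the per-position bias used in this derivation; a sketch (my own wording, not the submitted code):

```python
import numpy as np

def conv_bias_grad(grad_C):
    """dC[r][c]/db_x[i][j] = 1 only when (i, j) == (r, c), so grad_b_x equals
    grad_C entry for entry.  (With a single scalar bias per channel it would
    instead be grad_C.sum().)"""
    return grad_C.copy()
```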
7. $\nabla X = \frac{\partial C}{\partial X}\,\nabla C = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial X}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial X[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial X[1][N_c^X]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial X[N_r^X][1]} & \cdots & \frac{\partial C[r][c]}{\partial X[N_r^X][N_c^X]} \end{bmatrix} \nabla C[r][c]$,
   where $\frac{\partial C[r][c]}{\partial X[i][j]} = \begin{bmatrix} \partial C[r][c]/\partial X[i][j][1] \\ \partial C[r][c]/\partial X[i][j][2] \\ \vdots \\ \partial C[r][c]/\partial X[i][j][D] \end{bmatrix}$ and $\frac{\partial C[r][c]}{\partial X[i][j][l]} = \begin{cases} W_x[i-r+1][j-c+1][l] & \text{if } r \le i \le r+K-1 \text{ and } c \le j \le c+K-1 \\ 0 & \text{otherwise} \end{cases}$, when stride = 1.
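A minimal sketch of step 7 for one output channel with stride 1 (names are illustrative): rather than building the large Jacobian, each $\nabla C[r][c]$ is scattered back over the $K \times K$ window of $X$ it came from, weighted by the kernel entries, which is exactly the case split above.

```python
import numpy as np

def conv_input_grad(W_x, grad_C, X_shape, K):
    """Step 7: accumulate W_x[i-r, j-c, :] * grad_C[r, c] into grad_X over the
    KxK window starting at (r, c); positions outside any window stay zero.

    W_x     : (K, K, D)        convolution kernel
    grad_C  : (H-K+1, W-K+1)   upstream gradient for one output channel
    X_shape : (H, W, D)        shape of the input volume
    """
    grad_X = np.zeros(X_shape)
    for r in range(grad_C.shape[0]):
        for c in range(grad_C.shape[1]):
            grad_X[r:r+K, c:c+K, :] += W_x * grad_C[r, c]
    return grad_X
```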
Model architecture.
Hyperparameters.
Batch size = 250
Number of epochs = 200
Discussion. Using a stride greater than 1 and/or a pooling kernel larger than 2 would speed up training while keeping performance roughly the same. A deeper model and/or a Dropout layer would clearly improve generalization. Training took over 2 hours, and I ran only 200 epochs instead of the 6000 specified in the instructions; hence the relative underperformance.
[Figure: Loss vs. epoch (0–242), showing Train_Loss and Val_Loss.]
The model overfits badly. I would blame the gigantic fully connected layer; a well-tuned Dropout layer would improve the situation.
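A minimal sketch of the kind of (inverted) dropout layer that could help here, added for illustration and not part of the submitted model:

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True):
    """Inverted dropout: zero activations with probability p_drop during
    training and rescale the survivors, so no change is needed at test time."""
    if not train:
        return x, None
    mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask, mask

def dropout_backward(grad_out, mask):
    """Backward pass: gradients flow only through the units that were kept."""
    return grad_out * mask
```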
[Figure: Accuracy (%) vs. epoch (0–240), showing Train_Accu and Val_Accu.]
Class #3 is far below the average accuracy (only 0.386) (average train_accu: 99.50%, average