
ECSE 6965

Programming Assignment 3
Sergei Bugrov

Backpropagation in convolutional neural networks

1. $\nabla \hat{y} = \hat{y} - y$, when the loss function is cross entropy
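Step 1 can be verified numerically. The following is my own numpy sketch (not part of the assignment code) checking by finite differences that the gradient of the softmax/cross-entropy loss with respect to the logits is exactly $\hat{y} - y$:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    # cross-entropy loss of softmax(z) against one-hot label y
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.0, -0.5, 2.0])   # logits
y = np.array([0.0, 0.0, 1.0])    # one-hot label

analytic = softmax(z) - y        # the formula in step 1

# central finite differences on each logit
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```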

2. $\nabla W_o = \frac{\partial \hat{y}}{\partial W_o}\,\nabla \hat{y}$, $\quad\nabla b_o = \frac{\partial \hat{y}}{\partial b_o}\,\nabla \hat{y}$, $\quad\nabla \mathrm{FC} = \frac{\partial \hat{y}}{\partial \mathrm{FC}}\,\nabla \hat{y}$

3. $\nabla P[r][c] = \nabla \mathrm{FC}[(r-1)\,N_c^{P} + c]$
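Assuming the usual row-major flattening, the indexing in step 3 corresponds to numpy's `ravel()` in C order (with 0-based rather than 1-based indices). A small sketch of my own:

```python
import numpy as np

# P plays the role of the pooled feature map; fc is its flattened copy
P = np.arange(12).reshape(3, 4)
fc = P.ravel()                  # row-major (C-order) flattening

# 0-based equivalent of FC[(r - 1) * Nc + c] from step 3
r, c = 1, 2
assert fc[r * P.shape[1] + c] == P[r, c]
```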

4. $\nabla A = \frac{\partial P}{\partial A}\,\nabla P = \sum_{r=1}^{N_r^P} \frac{\partial P[r]}{\partial A}\,\nabla P[r] = \sum_{r=1}^{N_r^P} \sum_{c=1}^{N_c^P} \frac{\partial P[r][c]}{\partial A}\,\nabla P[r][c]$, where

$\frac{\partial P[r][c]}{\partial A} = \begin{bmatrix} \frac{\partial P[r][c]}{\partial A[1][1]} & \cdots & \frac{\partial P[r][c]}{\partial A[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \frac{\partial P[r][c]}{\partial A[N_r^A][1]} & \cdots & \frac{\partial P[r][c]}{\partial A[N_r^A][N_c^A]} \end{bmatrix}$, $\qquad \frac{\partial P[r][c]}{\partial A[k][l]} = \begin{cases} 1 & \text{if } k = i^* \text{ and } l = j^* \\ 0 & \text{otherwise} \end{cases}$,

where $(i^*, j^*) = \operatorname*{argmax}_{\substack{r \le k \le r+d-1 \\ c \le l \le c+d-1}} A[k][l]$ and stride $= 1$
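Step 4 routes each pooled gradient back to the argmax of its window. A minimal numpy sketch of my own (single channel, $d \times d$ window, stride 1, 0-based indices):

```python
import numpy as np

def maxpool_forward(A, d):
    # stride-1 d x d max pooling, as in step 4
    Nr, Nc = A.shape[0] - d + 1, A.shape[1] - d + 1
    P = np.empty((Nr, Nc))
    for r in range(Nr):
        for c in range(Nc):
            P[r, c] = A[r:r+d, c:c+d].max()
    return P

def maxpool_backward(A, dP, d):
    # each dP[r, c] is routed to the argmax (i*, j*) of its window
    dA = np.zeros_like(A)
    for r in range(dP.shape[0]):
        for c in range(dP.shape[1]):
            win = A[r:r+d, c:c+d]
            i, j = np.unravel_index(win.argmax(), win.shape)
            dA[r + i, c + j] += dP[r, c]
    return dA

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P = maxpool_forward(A, 2)                        # [[4.]]
dA = maxpool_backward(A, np.array([[1.0]]), 2)   # all gradient lands on A[1][1]
```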

5. $\nabla C = \frac{\partial A}{\partial C}\,\nabla A = \sum_{r=1}^{N_r^A} \sum_{c=1}^{N_c^A} \frac{\partial A[r][c]}{\partial C}\,\nabla A[r][c] = \sum_{r=1}^{N_r^A} \sum_{c=1}^{N_c^A} \begin{bmatrix} \frac{\partial A[r][c]}{\partial C[1][1]} & \cdots & \frac{\partial A[r][c]}{\partial C[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \frac{\partial A[r][c]}{\partial C[N_r^A][1]} & \cdots & \frac{\partial A[r][c]}{\partial C[N_r^A][N_c^A]} \end{bmatrix} \nabla A[r][c]$,

where $\frac{\partial A[r][c]}{\partial C[i][j]} = \begin{cases} 1 & \text{if } i = r,\ j = c, \text{ and } C[i][j] > 0 \\ 0 & \text{otherwise} \end{cases}$
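Step 5 is the ReLU backward pass; since the Jacobian is diagonal, it reduces to an elementwise mask. A small numpy sketch of my own:

```python
import numpy as np

def relu_backward(C, dA):
    # step 5 in vectorized form: the gradient passes through
    # only where the pre-activation C is positive
    return dA * (C > 0)

C = np.array([[1.0, -2.0],
              [0.0,  3.0]])
dA = np.ones_like(C)
dC = relu_backward(C, dA)   # gradient zeroed where C <= 0
```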

6. $\nabla W_x = \frac{\partial C}{\partial W_x}\,\nabla C = \sum_{r=1}^{N_r^C} \sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial W_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C} \sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial W_x[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial W_x[1][N_c^C]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial W_x[N_r^C][1]} & \cdots & \frac{\partial C[r][c]}{\partial W_x[N_r^C][N_c^C]} \end{bmatrix} \nabla C[r][c]$,

where $\frac{\partial C[r][c]}{\partial W_x[i][j]} = \begin{bmatrix} \frac{\partial C[r][c]}{\partial W_x[i][j][1]} \\ \frac{\partial C[r][c]}{\partial W_x[i][j][2]} \\ \vdots \\ \frac{\partial C[r][c]}{\partial W_x[i][j][D]} \end{bmatrix}$, $\qquad \frac{\partial C[r][c]}{\partial W_x[i][j][l]} = X[r+i-1][c+j-1][l]$

$\nabla b_x = \frac{\partial C}{\partial b_x}\,\nabla C = \sum_{r=1}^{N_r^C} \sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial b_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C} \sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial b_x[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial b_x[1][K]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial b_x[K][1]} & \cdots & \frac{\partial C[r][c]}{\partial b_x[K][K]} \end{bmatrix} \nabla C[r][c]$,

where $\frac{\partial C[r][c]}{\partial b_x[i][j]} = \begin{cases} 1 & \text{if } i = r,\ j = c \\ 0 & \text{otherwise} \end{cases}$
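Step 6 can be sketched in numpy for the single-channel, single-kernel case (my own illustration with 0-based indices; the assignment's tensors carry extra depth and kernel axes, and the bias reduces to a scalar here):

```python
import numpy as np

def conv_forward(X, W, b):
    # valid cross-correlation, stride 1, single channel/kernel
    K = W.shape[0]
    Nr, Nc = X.shape[0] - K + 1, X.shape[1] - K + 1
    C = np.empty((Nr, Nc))
    for r in range(Nr):
        for c in range(Nc):
            C[r, c] = np.sum(X[r:r+K, c:c+K] * W) + b
    return C

def conv_weight_grads(X, dC, K):
    # step 6: dW[i, j] = sum_{r, c} X[r+i][c+j] * dC[r][c];  db = sum dC
    dW = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            dW[i, j] = np.sum(X[i:i+dC.shape[0], j:j+dC.shape[1]] * dC)
    return dW, dC.sum()

X = np.arange(1.0, 10.0).reshape(3, 3)
dC = np.ones((2, 2))                  # pretend upstream gradient
dW, db = conv_weight_grads(X, dC, 2)
```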

7. $\nabla X = \frac{\partial C}{\partial X}\,\nabla C = \sum_{r=1}^{N_r^C} \sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial X}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C} \sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial X[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial X[1][N_c^X]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial X[N_r^X][1]} & \cdots & \frac{\partial C[r][c]}{\partial X[N_r^X][N_c^X]} \end{bmatrix} \nabla C[r][c]$, where

$\frac{\partial C[r][c]}{\partial X[i][j]} = \begin{bmatrix} \frac{\partial C[r][c]}{\partial X[i][j][1]} \\ \frac{\partial C[r][c]}{\partial X[i][j][2]} \\ \vdots \\ \frac{\partial C[r][c]}{\partial X[i][j][D]} \end{bmatrix}$, $\qquad \frac{\partial C[r][c]}{\partial X[i][j][l]} = \begin{cases} W_x[i-r+1][j-c+1][l] & \text{if } r \le i \le r+K-1 \text{ and } c \le j \le c+K-1 \\ 0 & \text{otherwise} \end{cases}$

when stride $= 1$
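Step 7 is equivalent to scattering each $\nabla C[r][c]$ back over its $K \times K$ input window, weighted by the kernel. A numpy sketch of my own (single channel, stride 1, 0-based indices):

```python
import numpy as np

def conv_input_grad(dC, W, X_shape):
    # step 7: each output gradient dC[r, c] contributes W * dC[r, c]
    # to the K x K input window it was computed from (stride 1)
    dX = np.zeros(X_shape)
    K = W.shape[0]
    for r in range(dC.shape[0]):
        for c in range(dC.shape[1]):
            dX[r:r+K, c:c+K] += W * dC[r, c]
    return dX

W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# single output pixel: its gradient is spread over one window as W itself
dX = conv_input_grad(np.array([[1.0]]), W, (2, 2))
```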

Model architecture.

• Input: tensor X ∈ ℝ^(32×32×3);
• W1, tensor of weights, shape [5, 5, 3, 32]; vector of biases b1 ∈ ℝ^32;
• 1st convolutional layer with 32 kernels of size 5×5, stride = 1;
• 1st pooling layer with a kernel of size 2×2, stride = 1;
• W2, tensor of weights, shape [5, 5, 32, 32]; vector of biases b2 ∈ ℝ^32;
• 2nd convolutional layer with 32 kernels of size 5×5, stride = 1;
• 2nd pooling layer with a kernel of size 2×2, stride = 1;
• W3, tensor of weights, shape [3, 3, 32, 64]; vector of biases b3 ∈ ℝ^64;
• 3rd convolutional layer with 64 kernels of size 3×3, stride = 1;
• W4, matrix of weights, shape [192, 65536]; vector of biases b4 ∈ ℝ^192;
• FC, fully connected layer ∈ ℝ^192;
• W5, matrix of weights, shape [10, 192]; vector of biases b5 ∈ ℝ^10;
• Output layer ∈ ℝ^10;
• Loss function, cross-entropy error: loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y))
• Optimizer: tf.train.RMSPropOptimizer(learning_rate=1e-3).minimize(loss)
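As a back-of-the-envelope check of my own on the [192, 65536] shape of W4 (assuming SAME padding, so the 32×32 spatial size is preserved through every conv and pooling layer):

```python
# Flattened feature size feeding the first fully connected layer,
# assuming SAME padding keeps the spatial size at 32 x 32 throughout
h = w = 32
depth = 64                 # kernels in the 3rd convolutional layer
flat = h * w * depth       # features flattened into the FC input
assert flat == 65536       # consistent with W4's shape [192, 65536]
```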

Hyperparameters.
Batch size = 250
Number of epochs = 200

Discussion. Stride > 1 and/or a pooling kernel larger than 2×2 would speed up training while keeping performance roughly the same. A deeper model and/or a dropout layer would clearly improve generalization. Training took over 2 hours, so I ran only 200 epochs versus the 6000 epochs in the instructions; hence the relative underperformance.

1st Convolution Layer Filters:

[Figure: Loss over ~240 epochs, series Train_Loss and Val_Loss]

Terrible overfitting. I would blame the gigantic fully connected layer; good dropout would improve the situation.
[Figure: Accuracy over ~240 epochs, series Train_Accu and Val_Accu]

Test accuracy (per class)


Class Accuracy
0 0.6782786885245902
1 0.7465346534653465
2 0.51953125
3 0.386317907444668
4 0.5641025641025641
5 0.5594262295081968
6 0.7535641547861507
7 0.6565656565656566
8 0.7857142857142857
9 0.7309941520467836

Class 3 is far below the average accuracy (only 0.386; average train_accu: 99.50%, average valid_accu: 63.80%), and classes 2, 4, and 5 are below average as well.
