Programming Assignment 3
Sergei Bugrov
2. $\nabla W_o = \frac{\partial \hat{y}}{\partial W_o}\,\nabla \hat{y}$, $\quad \nabla b_o = \frac{\partial \hat{y}}{\partial b_o}\,\nabla \hat{y}$, $\quad \nabla \mathrm{FC} = \frac{\partial \hat{y}}{\partial \mathrm{FC}}\,\nabla \hat{y}$
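As an illustration only, step 2 could be implemented roughly as follows in NumPy, assuming a linear output layer $\hat{y} = W_o \cdot \mathrm{FC} + b_o$ and an upstream gradient `grad_yhat` (the function and variable names are mine, not taken from the submitted code):

```python
import numpy as np

def output_layer_backward(W_o, FC, grad_yhat):
    """Backprop through a linear output layer y_hat = W_o @ FC + b_o.

    FC        : (n_fc,)        activations of the fully connected layer
    W_o       : (n_out, n_fc)  output weights
    grad_yhat : (n_out,)       upstream gradient dL/dy_hat
    """
    grad_W_o = np.outer(grad_yhat, FC)   # dL/dW_o[r, c] = grad_yhat[r] * FC[c]
    grad_b_o = grad_yhat.copy()          # dy_hat/db_o is the identity
    grad_FC  = W_o.T @ grad_yhat         # gradient pushed back to FC
    return grad_W_o, grad_b_o, grad_FC
```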
4. $\nabla A = \frac{\partial P}{\partial A}\,\nabla P = \sum_{r=1}^{N_r^P} \frac{\partial P[r]}{\partial A}\,\nabla P[r] = \sum_{r=1}^{N_r^P}\sum_{c=1}^{N_c^P} \frac{\partial P[r][c]}{\partial A}\,\nabla P[r][c] = \sum_{r=1}^{N_r^P}\sum_{c=1}^{N_c^P} \begin{bmatrix} \frac{\partial P[r][c]}{\partial A[1][1]} & \cdots & \frac{\partial P[r][c]}{\partial A[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \frac{\partial P[r][c]}{\partial A[N_r^A][1]} & \cdots & \frac{\partial P[r][c]}{\partial A[N_r^A][N_c^A]} \end{bmatrix} \nabla P[r][c]$,
   where $\frac{\partial P[r][c]}{\partial A[k][l]} = \begin{cases} 1 & \text{if } k = i^* \text{ and } l = j^* \\ 0 & \text{otherwise} \end{cases}$
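A minimal sketch of step 4 for max pooling, assuming a square window of size `k` with stride equal to the window size (names are illustrative): the upstream gradient is routed to the argmax position $(i^*, j^*)$ of each window, and every other entry of $\nabla A$ stays zero.

```python
import numpy as np

def maxpool_backward(A, grad_P, k=2):
    """Route grad_P back to the argmax (i*, j*) of each pooling window.

    A      : (H, W)        pre-pooling feature map
    grad_P : (H//k, W//k)  upstream gradient dL/dP
    """
    grad_A = np.zeros_like(A)
    for r in range(grad_P.shape[0]):
        for c in range(grad_P.shape[1]):
            window = A[r*k:(r+1)*k, c*k:(c+1)*k]
            i_star, j_star = np.unravel_index(np.argmax(window), window.shape)
            grad_A[r*k + i_star, c*k + j_star] += grad_P[r, c]
    return grad_A
```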
5. $\nabla C = \frac{\partial A}{\partial C}\,\nabla A = \sum_{r=1}^{N_r^A}\sum_{c=1}^{N_c^A} \frac{\partial A[r][c]}{\partial C}\,\nabla A[r][c] = \sum_{r=1}^{N_r^A}\sum_{c=1}^{N_c^A} \begin{bmatrix} \frac{\partial A[r][c]}{\partial C[1][1]} & \cdots & \frac{\partial A[r][c]}{\partial C[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \frac{\partial A[r][c]}{\partial C[N_r^A][1]} & \cdots & \frac{\partial A[r][c]}{\partial C[N_r^A][N_c^A]} \end{bmatrix} \nabla A[r][c]$
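If the activation is elementwise (for example $A = \mathrm{ReLU}(C)$, an assumption on my part since the write-up does not name it here), the matrix in step 5 is diagonal and the double sum collapses to an elementwise product:

```python
import numpy as np

def relu_backward(C, grad_A):
    """Step 5 specialised to A = ReLU(C): dA[r][c]/dC[k][l] is nonzero only
    when (k, l) == (r, c), so grad_C is just grad_A masked by C > 0."""
    return grad_A * (C > 0)
```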
6. $\nabla W_x = \frac{\partial C}{\partial W_x}\,\nabla C = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial W_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial W_x[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial W_x[1][K]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial W_x[K][1]} & \cdots & \frac{\partial C[r][c]}{\partial W_x[K][K]} \end{bmatrix} \nabla C[r][c]$,
   where $\frac{\partial C[r][c]}{\partial W_x[i][j]} = \begin{bmatrix} \partial C[r][c]/\partial W_x[i][j][1] \\ \partial C[r][c]/\partial W_x[i][j][2] \\ \vdots \\ \partial C[r][c]/\partial W_x[i][j][D] \end{bmatrix}$ and $\frac{\partial C[r][c]}{\partial W_x[i][j][l]} = X[r+i-1][c+j-1][l]$
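A minimal sketch of the weight gradient in step 6 for a single output channel, stride 1 and no padding (function and variable names are illustrative): each upstream value $\nabla C[r][c]$ is multiplied by the $K \times K \times D$ patch of $X$ it was computed from, and the results are accumulated.

```python
import numpy as np

def conv_weight_grad(X, grad_C, K):
    """Step 6: dL/dW_x[i, j, l] = sum_{r, c} X[r+i-1, c+j-1, l] * grad_C[r, c]
    (1-based indices in the derivation, 0-based slicing here).

    X      : (H, W, D)          input volume
    grad_C : (H-K+1, W-K+1)     upstream gradient dL/dC for one channel
    """
    grad_W = np.zeros((K, K, X.shape[2]))
    for r in range(grad_C.shape[0]):
        for c in range(grad_C.shape[1]):
            grad_W += X[r:r+K, c:c+K, :] * grad_C[r, c]
    return grad_W
```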
$\nabla b_x = \frac{\partial C}{\partial b_x}\,\nabla C = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial b_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial b_x[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial b_x[1][K]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial b_x[K][1]} & \cdots & \frac{\partial C[r][c]}{\partial b_x[K][K]} \end{bmatrix} \nabla C[r][c]$,
where $\frac{\partial C[r][c]}{\partial b_x[i][j]} = \begin{cases} 1 & \text{if } i = r,\ j = c \\ 0 & \text{otherwise} \end{cases}$
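Since the indicator above picks out exactly one entry per output position, the bias gradient reduces to the upstream gradient itself under the per-position bias used in this derivation; a sketch (my own wording, not the submitted code):

```python
import numpy as np

def conv_bias_grad(grad_C):
    """dC[r][c]/db_x[i][j] = 1 only when (i, j) == (r, c), so grad_b_x equals
    grad_C entry for entry.  (With a single scalar bias per channel it would
    instead be grad_C.sum().)"""
    return grad_C.copy()
```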
7. $\nabla X = \frac{\partial C}{\partial X}\,\nabla C = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \frac{\partial C[r][c]}{\partial X}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \frac{\partial C[r][c]}{\partial X[1][1]} & \cdots & \frac{\partial C[r][c]}{\partial X[1][N_c^X]} \\ \vdots & \ddots & \vdots \\ \frac{\partial C[r][c]}{\partial X[N_r^X][1]} & \cdots & \frac{\partial C[r][c]}{\partial X[N_r^X][N_c^X]} \end{bmatrix} \nabla C[r][c]$,
   where $\frac{\partial C[r][c]}{\partial X[i][j]} = \begin{bmatrix} \partial C[r][c]/\partial X[i][j][1] \\ \partial C[r][c]/\partial X[i][j][2] \\ \vdots \\ \partial C[r][c]/\partial X[i][j][D] \end{bmatrix}$ and $\frac{\partial C[r][c]}{\partial X[i][j][l]} = \begin{cases} W_x[i-r+1][j-c+1][l] & \text{if } r \le i \le r+K-1 \text{ and } c \le j \le c+K-1 \\ 0 & \text{otherwise} \end{cases}$, when stride = 1.
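A minimal sketch of step 7 for one output channel with stride 1 (names are illustrative): rather than building the large Jacobian, each $\nabla C[r][c]$ is scattered back over the $K \times K$ window of $X$ it came from, weighted by the kernel entries, which is exactly the case split above.

```python
import numpy as np

def conv_input_grad(W_x, grad_C, X_shape, K):
    """Step 7: accumulate W_x[i-r, j-c, :] * grad_C[r, c] into grad_X over the
    KxK window starting at (r, c); positions outside any window stay zero.

    W_x     : (K, K, D)        convolution kernel
    grad_C  : (H-K+1, W-K+1)   upstream gradient for one output channel
    X_shape : (H, W, D)        shape of the input volume
    """
    grad_X = np.zeros(X_shape)
    for r in range(grad_C.shape[0]):
        for c in range(grad_C.shape[1]):
            grad_X[r:r+K, c:c+K, :] += W_x * grad_C[r, c]
    return grad_X
```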
Model architecture.
Hyperparameters.
Batch size = 250
Number of epochs = 200
Discussion. Using a stride greater than 1 and/or a pooling kernel larger than 2 would speed up training while keeping performance roughly the same. A deeper model and/or a Dropout layer would clearly improve generalization. Training took over 2 hours, and I ran only 200 epochs instead of the 6000 specified in the instructions; hence the relative underperformance.
[Figure: Loss vs. epoch (0–242), showing Train_Loss and Val_Loss.]
The model overfits badly. I would blame the gigantic fully connected layer; a well-tuned Dropout layer would improve the situation.
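A minimal sketch of the kind of (inverted) dropout layer that could help here, added for illustration and not part of the submitted model:

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True):
    """Inverted dropout: zero activations with probability p_drop during
    training and rescale the survivors, so no change is needed at test time."""
    if not train:
        return x, None
    mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask, mask

def dropout_backward(grad_out, mask):
    """Backward pass: gradients flow only through the units that were kept."""
    return grad_out * mask
```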
[Figure: Accuracy (%) vs. epoch (0–240), showing Train_Accu and Val_Accu.]
Class #3 is far below the average accuracy (only 0.386) (average train_accu: 99.50%, average