Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research
So why not just train more layers (if you've got the hardware)?
This is not overfitting: the deeper networks also do worse on the training data.
[Figure: training and test error (%) vs. iter. (1e4) for 20-layer and 56-layer plain networks — the 56-layer network has higher error than the 20-layer one in both curves.]
But in practice, these identity mappings (or better) are not learned.
E.g., for an identity mapping H(x) = x, learning the residual F(x) = 0 is easier than learning H(x) = x directly.
[Figure: a residual block — input x passes through weight layer → relu → weight layer to produce F(x); a shortcut connection carries x unchanged (identity), the sum F(x) + x goes through a final relu.]
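The block above can be sketched in plain numpy. This is a simplified fully-connected version for illustration (the paper's blocks use 3x3 convolutions and batch normalization); it also shows why F(x) = 0 is the easy default: with zero weights the block collapses to the identity (up to the final relu).

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, W1, W2):
    """One residual block: output = relu(F(x) + x),
    where F(x) = W2 @ relu(W1 @ x) is the residual branch."""
    f = W2 @ relu(W1 @ x)   # residual function F(x)
    return relu(f + x)      # identity shortcut, then relu

# With zero weights, F(x) = 0 and the block reduces to relu(x):
x = np.array([1.0, -2.0, 3.0, 0.5])
W1 = np.zeros((4, 4))
W2 = np.zeros((4, 4))
y = residual_block(x, W1, W2)
# y == relu(x) == [1.0, 0.0, 3.0, 0.5]
```

So a stack of such blocks starts out close to the identity, and each block only has to learn a small correction F(x).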
[Figure: architecture comparison — VGG-19, 34-layer plain, and 34-layer residual networks; stacks of 3x3 conv layers (64, 128, … filters) separated by pool/stride-2 downsampling, with output sizes 224 → 112 → 56 → 28 → 14.]
Related Ideas
LSTM
LSTMs unrolled in time also have the same information flow (with gates).
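The analogy can be made explicit with the standard LSTM cell-state update (these are the textbook LSTM equations, not from the slides): both are additive updates around a shortcut.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
  \qquad \text{(LSTM: gated additive update of the cell state)}
x_{l+1} = \mathrm{relu}\big(x_l + F(x_l)\big)
  \qquad \text{(ResNet: identity additive update across layers)}
```

When the forget gate f_t is near 1, the cell state flows through time nearly unchanged, just as the identity shortcut lets x flow across layers; the LSTM simply gates the shortcut instead of keeping it fixed.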
Results (ImageNet)
[Figure: ImageNet training curves, error (%) vs. iter. (1e4) — left: plain-18 vs. plain-34, where the 34-layer plain network is worse than the 18-layer; right: ResNet-18 vs. ResNet-34, where the 34-layer ResNet is better.]
Shortcut Parameters
A. zero-padding for increasing dimensions
B. projection shortcuts for increasing dimensions
y = F(x, {W_i}) + W_s x
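A minimal numpy sketch of the projection shortcut, again with dense layers standing in for the paper's convolutions: W_s is applied only when the shortcut must change dimensions; otherwise the parameter-free identity is used.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block_proj(x, W1, W2, Ws=None):
    """Residual block y = relu(F(x, {Wi}) + Ws @ x).
    Ws is only needed when input and output dimensions differ;
    with Ws=None the shortcut is the identity (no extra parameters)."""
    f = W2 @ relu(W1 @ x)                  # residual branch F(x, {Wi})
    shortcut = x if Ws is None else Ws @ x # projection when dims change
    return relu(f + shortcut)

# Increasing the width from 4 to 8: Ws projects x into the new space.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((8, 8))
Ws = rng.standard_normal((8, 4))
y = residual_block_proj(x, W1, W2, Ws)
# y has shape (8,)
```

Option A instead zero-pads x to the larger dimension, which keeps the shortcut parameter-free even when dimensions increase.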
[Figure: CIFAR-10 training curves, error (%) vs. iter. (1e4) — plain networks (plain-20/32/44/56) get worse as depth increases, residual networks (ResNet-20/32/44/56/110) keep improving with depth, and residual-110 vs. residual-1202 are compared in a final panel.]
Layer Responses
[Figure: standard deviation (std) of layer responses, plotted per layer index, for plain-20, plain-56, ResNet-20, ResNet-56, and ResNet-110 — the residual networks' layers have generally smaller responses.]
Conclusion