Deep Learning (CNNs)
Lecture 21
Matt Gormley
April 05, 2017
Deep Learning Readings: Murphy Ch. 28; Bishop: none; HTF: none; Mitchell: none
1
Reminders
• Homework 5 (Part II): Peer Review
– Release: Wed, Mar. 29
– Due: Wed, Apr. 05 at 11:59pm
– Expectation: you should spend at most 1 hour on your reviews
• Peer Tutoring
• Homework 7: Deep Learning
– Release: Wed, Apr. 05
– Watch for multiple due dates!
2
BACKPROPAGATION
3
A Recipe for Machine Learning (Background)
1. Given training data: …    3. Define goal: …
4
Training Backpropagation
Whiteboard
– Example: Backpropagation for Calculus Quiz #1
5
Training Backpropagation
Automatic Differentiation – Reverse Mode (a.k.a. Backpropagation)
Forward Computation
1. Write an algorithm for evaluating the function y = f(x). The
algorithm defines a directed acyclic graph, where each variable is a
node (i.e. the “computation graph”)
2. Visit each node in topological order.
For variable ui with inputs v1,…, vN
a. Compute ui = gi(v1,…, vN)
b. Store the result at the node
Backward Computation
1. Initialize all partial derivatives dy/duj to 0 and dy/dy = 1.
2. Visit each node in reverse topological order.
For variable ui = gi(v1,…, vN)
a. We already know dy/dui
b. Increment dy/dvj by (dy/dui)(dui/dvj)
(Choice of algorithm ensures computing (dui/dvj) is easy)
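The forward/backward procedure above can be sketched as a tiny tape-based implementation (a sketch only; the `Tape` class and its method names are my own, not from the lecture):

```python
import math

class Tape:
    """Records (node_id, [(parent_id, local_partial), ...]) in topological order."""
    def __init__(self):
        self.entries = []
        self.n = 0

    def new_node(self, parents):
        # parents: list of (parent_id, local partial du_i/dv_j), recorded on the forward pass
        node_id = self.n
        self.entries.append((node_id, parents))
        self.n += 1
        return node_id

    def gradient(self, y_id):
        # 1. Initialize all partial derivatives dy/du_j to 0 and dy/dy = 1.
        adj = [0.0] * self.n
        adj[y_id] = 1.0
        # 2. Visit nodes in reverse topological order (reverse creation order);
        #    increment dy/dv_j by (dy/du_i)(du_i/dv_j).
        for node_id, parents in reversed(self.entries):
            for parent_id, local in parents:
                adj[parent_id] += adj[node_id] * local
        return adj

# Example: y = sin(x1) * x2 at (x1, x2) = (0.5, 2.0)
tape = Tape()
x1 = tape.new_node([])
x2 = tape.new_node([])
v_x1, v_x2 = 0.5, 2.0
u = tape.new_node([(x1, math.cos(v_x1))])   # u = sin(x1), du/dx1 = cos(x1)
v_u = math.sin(v_x1)
y = tape.new_node([(u, v_x2), (x2, v_u)])   # y = u * x2
grad = tape.gradient(y)                     # grad[x1] = cos(0.5)*2, grad[x2] = sin(0.5)
```

Because the tape stores only local partials, each backward step is a cheap multiply-accumulate, which is exactly why the algorithm's choice "ensures computing du_i/dv_j is easy."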
Forward            Backward
J = cos(u)         dJ/du = -sin(u)
u = u1 + u2        dJ/du1 = dJ/du * du/du1,   du/du1 = 1
                   dJ/du2 = dJ/du * du/du2,   du/du2 = 1
u1 = sin(t)        dJ/dt += dJ/du1 * du1/dt,   du1/dt = cos(t)
u2 = 3t            dJ/dt += dJ/du2 * du2/dt,   du2/dt = 3
t = x^2            dJ/dx = dJ/dt * dt/dx,   dt/dx = 2x
7
Training Backpropagation
Simple Example: The goal is to compute J = cos(sin(x^2) + 3x^2)
on the forward pass and the derivative dJ/dx on the backward pass.
8
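The simple example above translates almost line-by-line into code (a sketch; the function name is my own, and the gradient is checked against a finite-difference estimate):

```python
import math

def forward_backward(x):
    # Forward pass: evaluate each node in topological order.
    t = x ** 2
    u1 = math.sin(t)
    u2 = 3 * t
    u = u1 + u2
    J = math.cos(u)
    # Backward pass: visit nodes in reverse topological order.
    dJ_du = -math.sin(u)
    dJ_du1 = dJ_du * 1.0
    dJ_du2 = dJ_du * 1.0
    dJ_dt = dJ_du1 * math.cos(t) + dJ_du2 * 3.0  # dJ/dt accumulates from both paths
    dJ_dx = dJ_dt * 2 * x
    return J, dJ_dx
```

Note how dJ/dt receives two increments, one through u1 = sin(t) and one through u2 = 3t, just as in the table.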
Training Backpropagation
Case 1: Logistic Regression
Output
θ1 θ2 θ3 … θM
Input
Forward                                Backward
J = y* log(y) + (1 - y*) log(1 - y)    dJ/dy = y*/y + (1 - y*)/(y - 1)
y = 1 / (1 + exp(-a))                  dJ/da = dJ/dy * dy/da,   dy/da = exp(-a) / (exp(-a) + 1)^2
a = Σ_{j=0}^D θ_j x_j                  dJ/dθ_j = dJ/da * da/dθ_j,   da/dθ_j = x_j
                                       dJ/dx_j = dJ/da * da/dx_j,   da/dx_j = θ_j
9
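The logistic regression forward/backward computations above can be written out directly (a sketch; the function name and the finite-difference check are my own, and y* is passed as `y_star`):

```python
import math

def logreg_forward_backward(x, y_star, theta):
    # x includes x_0 = 1 for the bias term; theta has the same length.
    # Forward pass
    a = sum(th * xj for th, xj in zip(theta, x))
    y = 1.0 / (1.0 + math.exp(-a))
    J = y_star * math.log(y) + (1 - y_star) * math.log(1 - y)
    # Backward pass
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dy_da = math.exp(-a) / (math.exp(-a) + 1) ** 2
    dJ_da = dJ_dy * dy_da
    dJ_dtheta = [dJ_da * xj for xj in x]
    dJ_dx = [dJ_da * th for th in theta]
    return J, dJ_dtheta, dJ_dx
```

Multiplying the two local derivatives shows dJ/da simplifies to y* - y, the familiar logistic regression gradient.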
Training Backpropagation
(F) Loss
J = (1/2) (y - y^(d))^2
(A) Input
Given x_i, ∀i
10
Training Backpropagation
(F) Loss
J = (1/2) (y - y*)^2
(A) Input
Given x_i, ∀i
11
Training Backpropagation
Case 2: Neural Network
Forward                                Backward
J = y* log(y) + (1 - y*) log(1 - y)    dJ/dy = y*/y + (1 - y*)/(y - 1)
y = 1 / (1 + exp(-b))                  dJ/db = dJ/dy * dy/db,   dy/db = exp(-b) / (exp(-b) + 1)^2
b = Σ_{j=0}^D β_j z_j                  dJ/dβ_j = dJ/db * db/dβ_j,   db/dβ_j = z_j
                                       dJ/dz_j = dJ/db * db/dz_j,   db/dz_j = β_j
z_j = 1 / (1 + exp(-a_j))              dJ/da_j = dJ/dz_j * dz_j/da_j,   dz_j/da_j = exp(-a_j) / (exp(-a_j) + 1)^2
a_j = Σ_{i=0}^M α_ji x_i               dJ/dα_ji = dJ/da_j * da_j/dα_ji,   da_j/dα_ji = x_i
                                       dJ/dx_i = Σ_{j=0}^D dJ/da_j * da_j/dx_i,   da_j/dx_i = α_ji
12
Training Backpropagation
Case 2: Neural Network
           Forward                                Backward
Loss       J = y* log(y) + (1 - y*) log(1 - y)    dJ/dy = y*/y + (1 - y*)/(y - 1)
Sigmoid    y = 1 / (1 + exp(-b))                  dJ/db = dJ/dy * dy/db,   dy/db = exp(-b) / (exp(-b) + 1)^2
Linear     b = Σ_{j=0}^D β_j z_j                  dJ/dβ_j = dJ/db * db/dβ_j,   db/dβ_j = z_j
                                                  dJ/dz_j = dJ/db * db/dz_j,   db/dz_j = β_j
Sigmoid    z_j = 1 / (1 + exp(-a_j))              dJ/da_j = dJ/dz_j * dz_j/da_j,   dz_j/da_j = exp(-a_j) / (exp(-a_j) + 1)^2
Linear     a_j = Σ_{i=0}^M α_ji x_i               dJ/dα_ji = dJ/da_j * da_j/dα_ji,   da_j/dα_ji = x_i
                                                  dJ/dx_i = Σ_{j=0}^D dJ/da_j * da_j/dx_i,   da_j/dx_i = α_ji
13
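The same pattern extends to a one-hidden-layer network; below is a vectorized NumPy sketch (the weight names `alpha` and `beta` and the function name are my own; the local derivative of the sigmoid is written in the algebraically equal form σ'(v) = σ(v)(1 - σ(v)) rather than exp(-v)/(exp(-v)+1)^2):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def nn_forward_backward(x, y_star, alpha, beta):
    # alpha: (D, M) hidden-layer weights, beta: (D,) output weights
    # Forward pass
    a = alpha @ x                 # a_j = sum_i alpha_ji x_i
    z = sigmoid(a)                # z_j = sigmoid(a_j)
    b = beta @ z                  # b = sum_j beta_j z_j
    y = sigmoid(b)
    J = y_star * np.log(y) + (1 - y_star) * np.log(1 - y)
    # Backward pass (reverse order, chain rule at every step)
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dJ_db = dJ_dy * y * (1 - y)   # dy/db = y(1 - y)
    dJ_dbeta = dJ_db * z
    dJ_dz = dJ_db * beta
    dJ_da = dJ_dz * z * (1 - z)   # dz_j/da_j = z_j(1 - z_j)
    dJ_dalpha = np.outer(dJ_da, x)
    return J, dJ_dalpha, dJ_dbeta
```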
Training Backpropagation
Whiteboard
– SGD for Neural Network
– Example: Backpropagation for Neural Network
14
Training Backpropagation
Backpropagation (Automatic Differentiation – Reverse Mode)
Forward Computation
1. Write an algorithm for evaluating the function y = f(x). The
algorithm defines a directed acyclic graph, where each variable is a
node (i.e. the “computation graph”)
2. Visit each node in topological order.
a. Compute the corresponding variable’s value
b. Store the result at the node
Backward Computation
1. Initialize all partial derivatives dy/duj to 0 and dy/dy = 1.
2. Visit each node in reverse topological order.
For variable ui = gi(v1,…, vN)
a. We already know dy/dui
b. Increment dy/dvj by (dy/dui)(dui/dvj)
(Choice of algorithm ensures computing (dui/dvj) is easy)
Return partial derivatives dy/dui for all variables
15
A Recipe for Machine Learning (Background: Gradients)
1. Given training data: …    3. Define goal: …
2. Choose each of these:
– Decision function
– Loss function
4. Train with SGD: (take small steps opposite the gradient)
Backpropagation can compute this gradient! And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!
16
Summary
1. Neural Networks…
– provide a way of learning features
– are highly nonlinear prediction functions
– (can be) a highly parallel network of logistic
regression classifiers
– discover useful hidden representations of the
input
2. Backpropagation…
– provides an efficient way to compute gradients
– is a special case of reverse-mode automatic
differentiation
17
DEEP LEARNING
18
Deep Learning Outline
• Background: Computer Vision
– Image Classification
– ILSVRC 2010–2016
– Traditional Feature Extraction Methods
– Convolution as Feature Extraction
• Convolutional Neural Networks (CNNs)
– Learning Feature Abstractions
– Common CNN Layers:
• Convolutional Layer
• Max-Pooling Layer
• Fully-connected Layer (w/tensor input)
• Softmax Layer
• ReLU Layer
– Background: Subgradient
– Architecture: LeNet
– Architecture: AlexNet
• Training a CNN
– SGD for CNNs
– Backpropagation for CNNs
19
Motivation: Why is everyone talking about Deep Learning?
• Because a lot of money is invested in it…
– DeepMind: acquired by Google for $400 million
– DNNResearch: three-person startup (including Geoff Hinton) acquired by Google for an unknown price tag
– Enlitic, Ersatz, MetaMind, Nervana, Skylab: deep learning startups commanding millions of VC dollars
• Because it made the front page of the New York Times
20
Motivation: Why is everyone talking about Deep Learning?
Timeline: 1960s, 1980s, 1990s, 2006, 2016
Deep learning:
– has won numerous pattern recognition competitions
– does so with minimal feature engineering
This wasn't always the case! Since the 1980s, the form of the models hasn't changed much, but there are lots of new tricks…
– More hidden units
– Better (online) optimization
– New nonlinear functions (ReLUs)
– Faster computers (CPUs and GPUs)
21
BACKGROUND: COMPUTER VISION
22
Example: Image Classification
• ImageNet LSVRC-2011 contest:
– Dataset: 1.2 million labeled images, 1000 classes
– Task: Given a new image, label it with the correct class
– Multiclass classification problem
• Examples from http://image-net.org/
23
Example: Image Classification
Traditional Feature Extraction for Images:
– SIFT
– HOG
27
Example: Image Classification
CNN for Image Classification
(Krizhevsky, Sutskever & Hinton, 2012)
15.3% error on ImageNet LSVRC-2012 contest
Input image (pixels) → five convolutional layers (w/max-pooling) → three fully connected layers → 1000-way softmax
28
Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities
CNNs for Image Recognition
30
What’s a convolution?
• Basic idea:
– Pick a 3x3 matrix F of weights
– Slide this over an image and compute the “inner product”
(similarity) of F and the corresponding field of the image, and
replace the pixel in the center of the field with the output of the
inner product operation
• Key point:
– Different convolutions extract different types of low-level
“features” from an image
– All that we need to vary to generate these different features is the
weights of F
Input Image (zero-padded)    Convolution    Convolved Image
0 0 0 0 0 0 0
0 1 1 1 1 1 0
0 1 0 0 1 0 0
0 1 0 1 0 0 0
0 1 1 0 0 0 0
0 1 0 0 0 0 0
0 0 0 0 0 0 0
(A 3x3 kernel is slid over the zero-padded image to produce a 5x5 convolved image.)
32
Background: Image Processing
A convolution matrix is used in image processing for
tasks such as edge detection, blurring, sharpening, etc.
Input Image (zero-padded)    Convolution    Convolved Image
0 0 0 0 0 0 0
0 1 1 1 1 1 0    0 0 0    3 2 2 3 1
0 1 0 0 1 0 0    0 1 1    2 0 2 1 0
0 1 0 1 0 0 0    0 1 0    2 2 1 0 0
0 1 1 0 0 0 0             3 1 0 0 0
0 1 0 0 0 0 0             1 0 0 0 0
0 0 0 0 0 0 0
33
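The worked convolution example can be reproduced with a short sliding-window routine (a sketch; note that the slides' "inner product" convolution is what many libraries call cross-correlation, i.e. the kernel is not flipped):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside
    # the image, taking the inner product of kernel and patch at each stop.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

padded = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0]])
kernel = np.array([[0, 0, 0],
                   [0, 1, 1],
                   [0, 1, 0]])
# convolve2d_valid(padded, kernel) reproduces the 5x5 convolved image shown above
```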
(The same convolution is then animated over several slides, filling in the convolved image one output pixel at a time.)
Background: Image Processing
A convolution matrix is used in image processing for
tasks such as edge detection, blurring, sharpening, etc.
Input Image (zero-padded)    Identity Convolution    Convolved Image
0 0 0 0 0 0 0
0 1 1 1 1 1 0    0 0 0    1 1 1 1 1
0 1 0 0 1 0 0    0 1 0    1 0 0 1 0
0 1 0 1 0 0 0    0 0 0    1 0 1 0 0
0 1 1 0 0 0 0             1 1 0 0 0
0 1 0 0 0 0 0             1 0 0 0 0
0 0 0 0 0 0 0
45
Background: Image Processing
A convolution matrix is used in image processing for
tasks such as edge detection, blurring, sharpening, etc.
Input Image (zero-padded)    Blurring Convolution    Convolved Image
0 0 0 0 0 0 0
0 1 1 1 1 1 0    .1 .1 .1    .4 .5 .5 .5 .4
0 1 0 0 1 0 0    .1 .2 .1    .4 .2 .3 .6 .3
0 1 0 1 0 0 0    .1 .1 .1    .5 .4 .4 .2 .1
0 1 1 0 0 0 0                .5 .6 .2 .1 0
0 1 0 0 0 0 0                .4 .3 .1 0 0
0 0 0 0 0 0 0
46
What’s a convolution?
http://matlabtricks.com/post-‐5/3x3-‐convolution-‐kernels-‐with-‐online-‐demo
1 1 1 1 1 0 Convolved Image
Convolution
1 0 0 1 0 0
1 0 1 0 0 0 1 1
1 1 0 0 0 0 1 1
1 0 0 0 0 0
0 0 0 0 0 0
54
Downsampling
• Suppose we use a convolution with stride 2
• Only 9 patches visited in input, so only 9 pixels in output
Input Image      Convolution    Convolved Image
1 1 1 1 1 0
1 0 0 1 0 0      1 1            3 3 1
1 0 1 0 0 0      1 1            3 1 0
1 1 0 0 0 0                     1 0 0
1 0 0 0 0 0
0 0 0 0 0 0
(Animated over several slides, one output pixel at a time.)
63
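The stride-2 downsampling example can be reproduced by moving the window two pixels at a time (a sketch; function and variable names are my own):

```python
import numpy as np

def conv_stride(image, kernel, stride):
    # Same inner-product convolution, but the window jumps by `stride`,
    # so fewer patches are visited and the output is smaller.
    H, W = image.shape
    kH, kW = kernel.shape
    oH = (H - kH) // stride + 1
    oW = (W - kW) // stride + 1
    out = np.zeros((oH, oW))
    for i in range(oH):
        for j in range(oW):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kH, c:c + kW] * kernel)
    return out

img = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0]])
ones2 = np.ones((2, 2))
# With stride 2 only 9 patches are visited, so the output is 3x3
```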
CONVOLUTIONAL NEURAL NETS
64
Deep Learning Outline
• Background: Computer Vision
– Image Classification
– ILSVRC 2010–2016
– Traditional Feature Extraction Methods
– Convolution as Feature Extraction
• Convolutional Neural Networks (CNNs)
– Learning Feature Abstractions
– Common CNN Layers:
• Convolutional Layer
• Max-Pooling Layer
• Fully-connected Layer (w/tensor input)
• Softmax Layer
• ReLU Layer
– Background: Subgradient
– Architecture: LeNet
– Architecture: AlexNet
• Training a CNN
– SGD for CNNs
– Backpropagation for CNNs
65
Convolutional Neural Network (CNN)
• Typical layers include:
– Convolutional layer
– Max-pooling layer
– Fully-connected (Linear) layer
– ReLU layer (or some other nonlinear activation function)
– Softmax
• These can be arranged into arbitrarily deep topologies
66
Convolutional Layer
CNN key idea:
Treat the convolution matrix as
parameters and learn them!
Input Image (zero-padded)    Learned Convolution    Convolved Image
0 0 0 0 0 0 0
0 1 1 1 1 1 0    θ11 θ12 θ13    .4 .5 .5 .5 .4
0 1 0 0 1 0 0    θ21 θ22 θ23    .4 .2 .3 .6 .3
0 1 0 1 0 0 0    θ31 θ32 θ33    .5 .4 .4 .2 .1
0 1 1 0 0 0 0                   .5 .6 .2 .1 0
0 1 0 0 0 0 0                   .4 .3 .1 0 0
0 0 0 0 0 0 0
67
Downsampling by Averaging
• Downsampling by averaging used to be a common approach
• This is a special case of convolution where the weights are fixed to a
uniform distribution
• The example below uses a stride of 2
Input Image      Convolution    Convolved Image
1 1 1 1 1 0
1 0 0 1 0 0      1/4 1/4        3/4 3/4 1/4
1 0 1 0 0 0      1/4 1/4        3/4 1/4 0
1 1 0 0 0 0                     1/4 0 0
1 0 0 0 0 0
0 0 0 0 0 0
68
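Averaging with stride 2 is exactly a strided convolution with uniform weights; a quick sketch (names are my own):

```python
import numpy as np

def avg_pool(image, size=2, stride=2):
    # Equivalent to a strided convolution whose weights are all 1/(size*size).
    oH = (image.shape[0] - size) // stride + 1
    oW = (image.shape[1] - size) // stride + 1
    out = np.zeros((oH, oW))
    for i in range(oH):
        for j in range(oW):
            patch = image[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = patch.mean()
    return out

img = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0]])
# avg_pool(img) reproduces the 3x3 of quarters shown above
```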
Max-Pooling
• Max-pooling is another (common) form of downsampling
• Instead of averaging, we take the max value within the same range as
the equivalently-sized convolution
• The example below uses a stride of 2
Input Image      Max-pooling                  Max-Pooled Image
1 1 1 1 1 0
1 0 0 1 0 0      max(x_{i,j}, x_{i,j+1},      1 1 1
1 0 1 0 0 0          x_{i+1,j}, x_{i+1,j+1})  1 1 0
1 1 0 0 0 0                                   1 0 0
1 0 0 0 0 0
0 0 0 0 0 0
69
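Max-pooling swaps the weighted sum for a max over each window (a sketch; names are my own):

```python
import numpy as np

def max_pool(image, size=2, stride=2):
    # Take the max over each size x size window, moving by `stride`.
    oH = (image.shape[0] - size) // stride + 1
    oW = (image.shape[1] - size) // stride + 1
    out = np.zeros((oH, oW))
    for i in range(oH):
        for j in range(oW):
            out[i, j] = image[i * stride:i * stride + size,
                              j * stride:j * stride + size].max()
    return out

img = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0]])
# max_pool(img) reproduces the max-pooled image shown above
```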
Multi-Class Output
Output …
Hidden Layer …
Input …
71
Multi-Class Output
Softmax Layer:
(F) Loss: J = Σ_{k=1}^K y*_k log(y_k)
Softmax: y_k = exp(b_k) / Σ_{l=1}^K exp(b_l)
(D) Output (linear): b_k = Σ_{j=0}^D β_kj z_j, ∀k
(C) Hidden (nonlinear): z_j = σ(a_j), ∀j
(B) Hidden (linear): a_j = Σ_{i=0}^M α_ji x_i, ∀j
(A) Input: Given x_i, ∀i
72
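The layer stack from (A) input up through the softmax can be sketched as a single forward pass (names are my own; the max-subtraction before exponentiating is a standard numerical-stability trick, not something from the slide; the loss follows the slides' unnegated log-likelihood convention):

```python
import numpy as np

def softmax_net_forward(x, alpha, beta, y_star):
    # (A) input -> (B) hidden linear -> (C) sigmoid -> (D) output linear -> softmax -> (F) loss
    a = alpha @ x                    # a_j = sum_i alpha_ji x_i
    z = 1.0 / (1.0 + np.exp(-a))     # z_j = sigma(a_j)
    b = beta @ z                     # one linear score b_k per class
    e = np.exp(b - b.max())          # shift by max(b) for numerical stability
    y = e / e.sum()                  # y_k = exp(b_k) / sum_l exp(b_l)
    J = np.sum(y_star * np.log(y))   # J = sum_k y*_k log(y_k)
    return y, J
```

With a one-hot y*, J reduces to the log-probability assigned to the true class.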
Training a CNN
Whiteboard
– SGD for CNNs
– Backpropagation for CNNs
73
Common CNN Layers
Whiteboard
– ReLU Layer
– Background: Subgradient
– Fully-connected Layer (w/tensor input)
– Softmax Layer
– Convolutional Layer
– Max-Pooling Layer
74
Convolutional Layer
75
Convolutional Layer
76
Max-Pooling Layer
77
Max-Pooling Layer
78
Convolutional Neural Network (CNN)
• Typical layers include:
– Convolutional layer
– Max-pooling layer
– Fully-connected (Linear) layer
– ReLU layer (or some other nonlinear activation function)
– Softmax
• These can be arranged into arbitrarily deep topologies
79
Architecture #2: AlexNet
CNN for Image Classification
(Krizhevsky, Sutskever & Hinton, 2012)
15.3% error on ImageNet LSVRC-2012 contest
Input image (pixels) → five convolutional layers (w/max-pooling) → three fully connected layers → 1000-way softmax
80
Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities
CNNs for Image Recognition
83
3D Visualization of CNN
http://scs.ryerson.ca/~aharley/vis/conv/
Convolution of a Color Image
• Color images consist of 3 floats per pixel for
RGB (red, green, blue) color values
• Convolution must also be 3-dimensional
A closer look at spatial dimensions: convolving a 32x32x3 image with a 5x5x3 filter produces a 28x28x1 activation map.
85
Figure from Fei-‐Fei Li & Andrej Karpathy & Justin Johnson (CS231N)
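The 32 → 28 spatial-size arithmetic follows the standard formula (W - F + 2P)/S + 1; a quick helper (the function name is my own):

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    # Standard spatial-size formula: (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * padding) // stride + 1

# 32x32x3 image, 5x5x3 filter, no padding, stride 1 -> 28x28 activation map
# 6x6 image, 2x2 window, stride 2 -> 3x3 output (the downsampling example)
```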
Animation of 3D Convolution
http://cs231n.github.io/convolutional-networks/
86
Figure from Fei-‐Fei Li & Andrej Karpathy & Justin Johnson (CS231N)
MNIST Digit Recognition with CNNs
(in your browser)
https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
87
Figure from Andrej Karpathy
CNN Summary
CNNs
– Are used for all aspects of computer vision, and
have won numerous pattern recognition
competitions
– Are able to learn interpretable features at different
levels of abstraction
– Typically consist of convolution layers, pooling
layers, nonlinearities, and fully connected layers
Other Resources:
– Readings on course website
– Andrej Karpathy, CS231n Notes
http://cs231n.github.io/convolutional-networks/
88