Institute of Computing
University of Campinas
1. LeCun et al., Deep learning, 2015
Traditional Machine Learning Pipeline
[Figure: multilayer neural network with inputs x1, x2, x3, weight matrices W(1), W(2), W(3), and hidden-layer activations h(1)(x), h(2)(x)]
Activation function

Sigmoid activation function
• Puts pre-activations between 0 and 1
• Always positive
• Bounded
• Strictly increasing

$g(x) = \frac{1}{1 + \exp(-x)}$

[Plot: sigmoid curve over x ∈ [−4, 4], values in [0, 1]]
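As a minimal sketch (our illustration, not code from the lecture), the sigmoid is a one-liner in NumPy:

```python
import numpy as np

def sigmoid(x):
    # g(x) = 1 / (1 + exp(-x)): squashes any real pre-activation into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~[0.018, 0.5, 0.982]
```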
Activation function

Rectified linear activation function
• Puts pre-activations between 0 and +∞
• Always non-negative
• Not upper bounded
• Monotonically increasing
• Tends to give neurons with sparse activities

$g(x) = \max(0, x)$

[Plot: ReLU over x ∈ [−4, 4], values in [0, 5]]
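The rectified linear unit is equally short; a sketch under the same assumptions:

```python
import numpy as np

def relu(x):
    # g(x) = max(0, x): negative pre-activations are zeroed out,
    # which is what produces sparse activities
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.5, 3.0])))  # [0.  0.5 3. ]
```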
Activation function

Softmax activation function

$\mathrm{softmax}(a)_c = \frac{\exp(a_c)}{\sum_{i=1}^{C} \exp(a_i)}$

• Strictly positive
• Sums to one
• The predicted class is the one with the highest estimated probability.
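A short NumPy sketch of the softmax (ours; the max-subtraction is a standard numerical-stability trick, not something stated on the slide):

```python
import numpy as np

def softmax(a):
    # Shifting by max(a) leaves the result unchanged but prevents overflow in exp
    e = np.exp(a - np.max(a))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p, p.sum())    # strictly positive entries that sum to one
print(np.argmax(p))  # predicted class: the highest estimated probability
```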
Forward propagation
• Can be represented as an acyclic flow graph.
• The output of each box can be computed given its parents.

2. Hugo Larochelle, Neural Networks, 2017
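As an illustrative sketch (our own, with arbitrary layer sizes), a forward pass that evaluates the graph in topological order, computing each quantity from its parents only:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    a1 = W1 @ x + b1              # pre-activation; parents: x, W1, b1
    h1 = np.maximum(0.0, a1)      # h(1)(x), ReLU; parent: a1
    a2 = W2 @ h1 + b2             # pre-activation; parents: h1, W2, b2
    e = np.exp(a2 - np.max(a2))   # softmax output f(x); parent: a2
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
print(forward(x, W1, b1, W2, b2))  # two class probabilities summing to 1
```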
Optimization
procedure StochasticGradientDescent
    require: network parameters θ = (W(1), b(1), ..., W(L+1), b(L+1))
    initialize θ
    for N epochs do
        for each training sample (x(t), y(t)) do
            ∆ = −∇θ L(f(x(t); θ), y(t)) − λ ∇θ Ω(θ)
            θ = θ + α∆
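A minimal Python sketch of this procedure (ours; `grad_loss` and `grad_reg` are assumed user-supplied callables returning per-parameter gradients):

```python
import numpy as np

def sgd(theta, data, grad_loss, grad_reg, alpha=0.01, lam=1e-4, epochs=10):
    """theta: list of parameter arrays [W1, b1, ..., W(L+1), b(L+1)]."""
    for _ in range(epochs):
        for x_t, y_t in data:                    # one training sample at a time
            g_loss = grad_loss(theta, x_t, y_t)  # ∇θ L(f(x(t); θ), y(t))
            g_reg = grad_reg(theta)              # ∇θ Ω(θ)
            for p, gl, gr in zip(theta, g_loss, g_reg):
                p -= alpha * (gl + lam * gr)     # θ ← θ + α∆
    return theta
```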
Sigmoid activation function
• Partial derivative: $g'(a) = g(a)(1 - g(a))$, where $g(x) = \frac{1}{1 + \exp(-x)}$

[Plot: sigmoid curve over x ∈ [−4, 4], values in [0, 1]]
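A quick numerical check of this identity against a central finite difference (our own sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a = np.linspace(-4.0, 4.0, 9)
analytic = sigmoid(a) * (1.0 - sigmoid(a))                   # g'(a) = g(a)(1 - g(a))
eps = 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)  # central difference
print(np.allclose(analytic, numeric, atol=1e-9))             # True
```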
Regularization
• L2 regularization:

$\Omega(\theta) = \sum_k \sum_i \sum_j \left( W_{i,j}^{(k)} \right)^2$

• Gradient:

$\nabla_{W^{(k)}} \Omega(\theta) = 2\, W^{(k)}$
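A small sketch (ours) of the penalty and its gradient over a list of weight matrices, with biases excluded as on the slide:

```python
import numpy as np

def l2_penalty(weights):
    # Ω(θ) = Σ_k Σ_i Σ_j (W_ij^(k))², summed over all weight matrices
    return sum(np.sum(W ** 2) for W in weights)

def l2_grad(weights):
    # ∇_{W(k)} Ω(θ) = 2 W(k), one gradient per weight matrix
    return [2.0 * W for W in weights]
```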
Applications