• Academic documents
• Professional documents
• Cultural documents
Evaluation metric:
"least squares error"
Linear regression
$g(x) = w_1 x + w_0$

The gradient of the error is 0 at the optimal weights; setting it to 0 yields the normal equations for $w_1$ and $w_0$.
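A minimal sketch of this in NumPy, assuming the squared-error objective $J(\mathbf{w}) = \frac{1}{2}\sum_i (y_i - g(x_i))^2$: setting its gradient to zero gives the normal equations, which the snippet solves in closed form (the data arrays are made-up examples).

```python
# Least-squares fit of g(x) = w1*x + w0 via the normal equations.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix with a bias column: g(x) = w1*x + w0
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: (X^T X) w = X^T y
w1, w0 = np.linalg.solve(X.T @ X, X.T @ y)
print(f"g(x) = {w1:.3f}*x + {w0:.3f}")
```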
Regression variants

– MLE → Bayes
– k nearest neighbours (see the sketch after this list):
  • mean or
  • distance-weighted average
– Decision tree
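A minimal sketch of k-nearest-neighbour regression with both prediction rules listed above; the function name, the data, and k = 3 are illustrative assumptions.

```python
# k-NN regression: plain mean vs. distance-weighted average of neighbours.
import numpy as np

def knn_regress(x_query, X, y, k=3, weighted=False):
    """Predict a target for x_query from the k nearest training points."""
    dists = np.abs(X - x_query)              # 1-D inputs for simplicity
    idx = np.argsort(dists)[:k]              # indices of the k nearest
    if not weighted:
        return y[idx].mean()                 # plain mean of neighbours
    w = 1.0 / (dists[idx] + 1e-12)           # closer points weigh more
    return np.sum(w * y[idx]) / np.sum(w)    # distance-weighted average

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 3.9, 9.1, 16.2])
print(knn_regress(2.5, X, y, k=3))
print(knn_regress(2.5, X, y, k=3, weighted=True))
```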
[Plot: activation as a function of net_j, with threshold T_j]
Differentiable activation functions

• Enables gradient descent-based learning
• The sigmoid function:

$f(net_j) = \frac{1}{1 + e^{-(net_j - T_j)}}$
[Plot: sigmoid f(net_j) rising from 0 to 1, centred at T_j]
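A minimal sketch of the sigmoid above and its derivative; the identity $f'(net) = f(net)\,(1 - f(net))$ is the standard sigmoid derivative used by the backpropagation formulas that follow.

```python
# Sigmoid activation with threshold T, and its derivative.
import numpy as np

def sigmoid(net, T=0.0):
    return 1.0 / (1.0 + np.exp(-(net - T)))

def sigmoid_deriv(net, T=0.0):
    s = sigmoid(net, T)
    return s * (1.0 - s)       # f'(net) = f(net)(1 - f(net))

print(sigmoid(0.0), sigmoid_deriv(0.0))   # 0.5, 0.25 at the threshold
```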
Output layer

$net_k = \sum_{j=1}^{n_H} y_j w_{kj} + w_{k0} = \sum_{j=0}^{n_H} y_j w_{kj} = \mathbf{w}_k^t \mathbf{y}$
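A minimal sketch of this computation: the bias $w_{k0}$ is absorbed by prepending a constant $y_0 = 1$ to the hidden activations, so each $net_k$ is a plain dot product; the shapes are made-up examples.

```python
# Output-layer nets as a dot product with the bias folded in.
import numpy as np

n_H, c = 3, 2                         # hidden units, output units
y_hidden = np.array([0.2, 0.7, 0.5])  # hidden activations y_1..y_nH
W = np.random.randn(c, n_H + 1)       # rows w_k, column 0 holds the bias w_k0

y = np.concatenate([[1.0], y_hidden]) # y_0 = 1 for the bias term
net = W @ y                           # net_k = w_k^t y for every output k
print(net)
```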
Gradient descent weight update:

$\Delta \mathbf{w} = -\eta \frac{\partial J}{\partial \mathbf{w}}$
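A minimal sketch of one such update step; the learning rate $\eta = 0.1$ and the gradient values are illustrative.

```python
# One gradient-descent step: w <- w - eta * dJ/dw.
import numpy as np

def gradient_step(w, grad_J, eta=0.1):
    return w - eta * grad_J

w = np.array([0.5, -0.3])
grad_J = np.array([0.2, -0.1])        # pretend gradient of J at w
print(gradient_step(w, grad_J))
```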
Backpropagation
The error of the weights between the hidden and
output layers:

$\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k} \cdot \frac{\partial net_k}{\partial w_{kj}} = -\delta_k \frac{\partial net_k}{\partial w_{kj}}$

where the error signal of output unit $k$ is

$\delta_k = -\frac{\partial J}{\partial net_k}$

and, because $net_k = \mathbf{w}_k^t \mathbf{y}$:

$\frac{\partial net_k}{\partial w_{kj}} = y_j$

and, with $z_k = f(net_k)$:

$\delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k} \cdot \frac{\partial z_k}{\partial net_k} = (t_k - z_k)\, f'(net_k)$
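A minimal vectorised sketch of the output-layer error signal just derived, together with the resulting update $\Delta w_{kj} = \eta\,\delta_k y_j$ (combining the gradient above with the gradient-descent rule); the sigmoid derivative is computed as $z_k(1 - z_k)$, and all values are made up.

```python
# Output-layer deltas and the hidden-to-output weight update.
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

net_k = np.array([0.4, -0.2])          # output-layer nets
z = sigmoid(net_k)                     # outputs z_k = f(net_k)
t = np.array([1.0, 0.0])               # targets t_k
y = np.array([1.0, 0.2, 0.7, 0.5])     # hidden activations (y_0 = 1 bias)

delta_k = (t - z) * z * (1.0 - z)      # delta_k = (t_k - z_k) f'(net_k)
eta = 0.5
dW = eta * np.outer(delta_k, y)        # Delta w_kj = eta * delta_k * y_j
print(dW.shape)                        # (c, n_H + 1)
```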
$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}}$

$\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\,\frac{\partial z_k}{\partial y_j}$

$= -\sum_{k=1}^{c}(t_k - z_k)\,\frac{\partial z_k}{\partial net_k}\cdot\frac{\partial net_k}{\partial y_j} = -\sum_{k=1}^{c}(t_k - z_k)\, f'(net_k)\, w_{kj} = -\sum_{k=1}^{c}\delta_k w_{kj}$
The error signal of the hidden units:

$\delta_j = f'(net_j)\sum_{k=1}^{c} w_{kj}\,\delta_k$
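A minimal sketch of the hidden error signal: the output deltas are propagated back through the hidden-to-output weights and scaled by $f'(net_j)$; shapes follow the earlier snippets and the values are illustrative.

```python
# Hidden-unit deltas: delta_j = f'(net_j) * sum_k w_kj * delta_k.
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

net_j = np.array([0.1, -0.3, 0.6])     # hidden-layer nets (n_H = 3)
delta_k = np.array([0.08, -0.05])      # output error signals (c = 2)
W = np.random.randn(2, 4)              # w_kj incl. bias column 0

s = sigmoid(net_j)
back = W[:, 1:].T @ delta_k            # sum_k w_kj * delta_k (skip bias col)
delta_j = s * (1.0 - s) * back         # scale by f'(net_j)
print(delta_j)
```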
Backpropagation
Calculate the error signal for the hidden
neurons:

$\delta_j = f'(net_j)\sum_{k=1}^{c} w_{kj}\,\delta_k$

[Network diagram: input, hidden, and output layers]
Backpropagation
Update the weights between the input
and hidden neurons, i.e. the weights leading into each hidden unit $j$:

$\Delta w_{ji} = \eta\,\delta_j x_i$

[Network diagram: input, hidden, and output layers]
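A minimal sketch of this update as a single outer product over the whole layer; the vectors and $\eta$ are illustrative, and $x_0 = 1$ carries the bias.

```python
# Input-to-hidden update: Delta w_ji = eta * delta_j * x_i.
import numpy as np

delta_j = np.array([0.02, -0.01, 0.03])  # hidden error signals (n_H = 3)
x = np.array([1.0, 0.5, -1.2])           # input vector with x_0 = 1 bias
eta = 0.5

dW = eta * np.outer(delta_j, x)          # Delta w_ji = eta * delta_j * x_i
print(dW)
```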
Training of neural networks
• $\mathbf{w}$ is initialised randomly
• The total error over the $n$ training instances:

$J = \sum_{p=1}^{n} J_p$

• Stopping based on the performance
on a validation dataset
  – Held-out instances, unseen during training, are used to estimate the performance of the supervised learner (to avoid overfitting)
• Learning rate!? (see the training-loop sketch below)
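A minimal end-to-end sketch tying these pieces together: random initialisation, the backpropagation updates derived above, the total error $J = \sum_p J_p$, and early stopping on a validation set. The XOR data, the network size, $\eta = 0.5$, and the epoch budget are all illustrative assumptions, not part of the original slides.

```python
# Train a tiny MLP with backpropagation and validation-based early stopping.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Toy data: XOR; the "validation" set is a toy-sized stand-in for a real split.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
X_val, T_val = X, T

n_in, n_H, c = 2, 3, 1
W1 = rng.normal(0, 0.5, (n_H, n_in + 1))  # input -> hidden (bias in col 0)
W2 = rng.normal(0, 0.5, (c, n_H + 1))     # hidden -> output (bias in col 0)
eta = 0.5

def forward(x, W1, W2):
    x = np.concatenate([[1.0], x])        # x_0 = 1
    y = sigmoid(W1 @ x)                   # hidden activations
    y = np.concatenate([[1.0], y])        # y_0 = 1
    z = sigmoid(W2 @ y)                   # outputs z_k
    return x, y, z

def total_error(X, T, W1, W2):
    return sum(0.5 * np.sum((t - forward(x, W1, W2)[2]) ** 2)
               for x, t in zip(X, T))     # J = sum_p J_p

best_val, best = np.inf, (W1.copy(), W2.copy())
for epoch in range(5000):
    for x, t in zip(X, T):
        xa, y, z = forward(x, W1, W2)
        delta_k = (t - z) * z * (1 - z)               # output error signals
        back = W2[:, 1:].T @ delta_k                  # sum_k w_kj delta_k
        delta_j = y[1:] * (1 - y[1:]) * back          # hidden error signals
        W2 += eta * np.outer(delta_k, y)              # Delta w_kj = eta d_k y_j
        W1 += eta * np.outer(delta_j, xa)             # Delta w_ji = eta d_j x_i
    J_val = total_error(X_val, T_val, W1, W2)
    if J_val < best_val:                              # early stopping: keep the
        best_val, best = J_val, (W1.copy(), W2.copy())  # best validation weights

W1, W2 = best
print([float(forward(x, W1, W2)[2][0]) for x in X])   # should approach 0,1,1,0
```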
Outlook
History of neural networks
• Perceptron: one of the first
machine learning algorithms, ~1950
• Backpropagation: multilayer
perceptrons, 1975–
• Deep learning:
popular again since 2006
Deep learning
(auto-encoder pretraining)
Recurrent neural networks
– short-term memory
http://www.youtube.com/watch?v=vmDByFN6eig