BITS F464/F312
Machine Learning / Neural Networks and Fuzzy Logic Optimization BITS F464/F312 1 / 22
Optimization
Descent methods – General idea I
Descent methods – General idea II
Slope and Gradient
To find a local minimum of a function using gradient descent, one takes
steps proportional to the negative of the gradient (or of the approximate
gradient) of the function at the current point.
So the update is x = x − η (grad x): when the gradient is positive, x decreases (a step to the left); when the gradient is negative, x increases (a step to the right). In both cases the step moves downhill.
Figure 1: From Google Images, with edits (for demonstration purposes only)
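The one-dimensional update above can be sketched in Python; the function f(x) = x², the starting point, and the step size η below are assumptions chosen only to illustrate the sign behaviour:

```python
# Minimal 1-D gradient descent sketch (f(x) = x**2 and eta are illustrative).

def gradient_descent_1d(grad, x0, eta=0.1, steps=50):
    """Repeatedly step against the gradient: x <- x - eta * grad(x)."""
    x = x0
    for _ in range(steps):
        # positive gradient -> x decreases; negative gradient -> x increases
        x = x - eta * grad(x)
    return x

# f(x) = x**2 has gradient 2x and its minimum at x = 0.
x_min = gradient_descent_1d(lambda x: 2 * x, x0=3.0)
```

Starting from x = 3 (where the gradient is positive), the iterates move left toward the minimum at 0.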
Gradient-based methods I
Gradient-based methods II
Figure 2: Feasible descent directions. Directions from the starting point θnow in the shaded area are possible descent-vector candidates. When d = −g, d is the steepest descent direction at the local point θnow. (Figure from the book Neuro-Fuzzy and Soft Computing, J.-S. R. Jang et al.)
Gradient-based methods III
Since J is multi-dimensional, so is its gradient ∇J. For a given gradient g = ∇J, a downhill direction d must satisfy the following condition for a feasible descent direction:

φ′(0) = dJ(θnow + ηd)/dη |η=0 = g^T d = ||g|| ||d|| cos(ψ(θnow)) < 0,   (5)

where ψ denotes the angle between g and d, so ψ(θnow) is the angle between gnow and d. This can be verified by the Taylor series expansion of J:
H.O.T. denotes the higher-order terms in η. The H.O.T., together with the second-order term O(η²), is dominated by g^T d as η → 0.
It should be noted that a descent direction alone does not guarantee convergence of the algorithm.
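The feasibility condition g^T d < 0 from Eq. (5) is a one-line check; the gradient vector below is a made-up example:

```python
import numpy as np

def is_descent_direction(g, d):
    """d is a feasible descent direction at theta_now iff g^T d < 0 (Eq. 5)."""
    return float(np.dot(g, d)) < 0.0

g = np.array([2.0, -1.0])           # hypothetical gradient at theta_now
assert is_descent_direction(g, -g)  # steepest descent d = -g always qualifies
assert not is_descent_direction(g, g)   # moving along the gradient goes uphill
```

Any d whose angle with g exceeds 90° (cos ψ < 0) passes the check, matching the shaded half-space in Figure 2.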
Gradient-based methods IV
A class of gradient-based descent methods has the following fundamental form, in which feasible descent directions are determined by deflecting the gradient through multiplication by a matrix G (i.e., deflected gradients):
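A minimal sketch of the deflected-gradient update θnext = θnow − η G g; the quadratic objective and the specific matrices below are assumptions for illustration. With G = I this reduces to steepest descent, and with G equal to the inverse Hessian it becomes a Newton-like step:

```python
import numpy as np

def deflected_gradient_step(theta, g, G, eta):
    """One update theta_next = theta - eta * (G @ g)."""
    return theta - eta * (G @ g)

# Assumed quadratic objective J(theta) = 0.5 * theta^T A theta, gradient A @ theta.
A = np.array([[2.0, 0.0],
              [0.0, 10.0]])
theta = np.array([1.0, 1.0])
g = A @ theta

# G = inv(A) with eta = 1 deflects the gradient into an exact Newton step,
# which lands on the minimum of a quadratic in a single update.
theta_next = deflected_gradient_step(theta, g, np.linalg.inv(A), eta=1.0)
```

With G = np.eye(2) instead, the same call performs a plain steepest-descent step.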
Gradient-based methods V
g(θnow) = ∂J(θ)/∂θ |θ=θnow = 0   (8)
Gradient-based methods VI
Effect of η:
Convergence is very slow for small η
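The effect of a small η can be demonstrated by counting iterations on a toy problem; the quadratic f(x) = x², tolerance, and the two η values below are assumptions for illustration:

```python
def steps_to_converge(eta, x0=1.0, tol=1e-6, max_steps=100000):
    """Count iterations of x <- x - eta * 2x (gradient of x^2) until |x| < tol."""
    x, n = x0, 0
    while abs(x) >= tol and n < max_steps:
        x -= eta * 2 * x
        n += 1
    return n

# A much smaller eta needs far more iterations for the same accuracy.
slow = steps_to_converge(eta=0.01)
fast = steps_to_converge(eta=0.4)
```

Here each step scales x by (1 − 2η), so η = 0.01 shrinks the error by only 2% per iteration while η = 0.4 shrinks it by 80%.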
Gradient-based methods VII
Gradient-based methods VIII
Newton’s method for finding the root of a real-valued function (revision) I
θ(t + 1) = θ(t) − J(θ(t)) / J′(θ(t))   (9)
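The iteration in Eq. (9) translates directly into code; the target function f(x) = x² − 2 and its derivative below are assumed for demonstration:

```python
def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton iteration theta(t+1) = theta(t) - f(theta(t)) / f'(theta(t))."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# The positive root of f(x) = x^2 - 2 is sqrt(2).
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

Starting from x0 = 1, the iterates converge to √2 in a handful of steps, reflecting Newton's quadratic convergence near a simple root.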
Newton’s method for finding the root of a real-valued function (revision) II
Newton’s method for minimization of function II
quadratic form assuming that the higher order terms of ||θ − θnow || are
very small:
For the expansion of the function in Eq. (10), we can find its minimum point θ̂ by differentiating Eq. (10) w.r.t. θ and setting the result to zero. This leads to a set of linear equations:
Newton’s method for minimization of function IV
θ̂ = θnow − H −1 g (13)
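The Newton step of Eq. (13) can be sketched as below; the Hessian H and vector b are assumed values defining a toy quadratic objective, and `np.linalg.solve` is used rather than forming H⁻¹ explicitly:

```python
import numpy as np

def newton_minimization_step(theta_now, g, H):
    """theta_hat = theta_now - H^{-1} g (Eq. 13), via a linear solve."""
    return theta_now - np.linalg.solve(H, g)

# Assumed quadratic J(theta) = 0.5 theta^T H theta - b^T theta,
# whose gradient is g = H @ theta - b and whose minimizer solves H theta = b.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # positive definite
b = np.array([1.0, 1.0])
theta = np.zeros(2)
g = H @ theta - b
theta_hat = newton_minimization_step(theta, g, H)
```

For a quadratic J this single step lands exactly on the minimizer; for a general J it must be iterated.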
Newton’s method for minimization of function V
Note: A matrix A is called positive definite if all its eigenvalues are positive. Equivalently, x^T A x > 0 for all x ∈ R^n with x ≠ 0.
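The eigenvalue criterion gives a direct numerical check; a sketch assuming a symmetric matrix (for which `np.linalg.eigvalsh` applies):

```python
import numpy as np

def is_positive_definite(A):
    """True iff all eigenvalues of the symmetric matrix A are strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

assert is_positive_definite(np.array([[2.0, 0.0],
                                      [0.0, 3.0]]))      # eigenvalues 2, 3
assert not is_positive_definite(np.array([[1.0, 0.0],
                                          [0.0, -1.0]]))  # eigenvalue -1
```

This matters for Newton's method: the step of Eq. (13) points downhill only when H is positive definite.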
Neural Networks I