
MACHINE LEARNING
An introduction

APPLICATIONS
WHY MACHINE LEARNING?
An agent is learning if it improves its performance on future tasks after making observations about the world.

We can't anticipate all possible situations.
We can't anticipate all changes over time.
Sometimes, we don't know how to program a solution.
TYPES OF MACHINE LEARNING
Supervised
Unsupervised
Reinforcement
SUPERVISED LEARNING
From a collection of input-output pairs, learn a function that predicts the
output for new inputs.
Requires a collection of labelled data.

UNSUPERVISED LEARNING
Agent learns patterns in data, even though no explicit feedback is given.
Clustering is an example of unsupervised learning.

REINFORCEMENT LEARNING
Learns from a series of reinforcements: rewards and punishments.
Rewards and punishments are designed to help the agent achieve its goal.
Reinforcements need not be given at the end of each action.
This raises the credit assignment problem: deciding which earlier actions were responsible for a reward or punishment.

SUPERVISED LEARNING
Given a training set of N samples
$(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$
where each $y_i$ was generated by an unknown function $y = f(x)$,
discover a function $h$ that approximates the true function $f$.
$x$ and $y$ need not be scalars; they can also be vectors.

HYPOTHESIS
The hypothesis $h$ must be able to generalise well on data unseen during training.
It is said to generalise well if it correctly predicts the value of $y$ for novel $x$.

Moreover, $y$ may be a stochastic function of $x$.

TYPES
Regression
Classification

REGRESSION

Housing price prediction

Stock value prediction

CLASSIFICATION

FEATURES
The inputs $x$ in the training data are called features.
As mentioned, they need not be scalars; they can also be vectors.
How do we choose features?
Example: oranges vs. apples.

LINEAR REGRESSION
One solution to regression could be
$h_{\mathbf{w}}(x) = w_0 + w_1 x$
But then the output is a linear function of the input.

Most processes are non-linear in nature, and hence this is not very useful.

EXTENSION
The previous case can be extended to
$h_{\mathbf{w}}(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$
This is a non-linear function of $x$.

It is still called linear regression because the equation is linear in terms of the weights.

BASIS FUNCTION
A basis function is a non-linear function of the input variable.
It is represented by $\phi(x)$.
In the previous case, the basis functions were polynomial functions, $\phi_j(x) = x^j$.
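In general (a standard formulation; the original slide's notation is not preserved), the hypothesis is a weighted sum of basis functions:

$$h_{\mathbf{w}}(x) = \sum_{j=0}^{M} w_j \, \phi_j(x), \qquad \phi_0(x) = 1$$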

MULTIVARIATE LINEAR REGRESSION
The output might depend on multiple inputs.
All the inputs are captured in a vector. Vectors will be represented by boldface letters, e.g. $\mathbf{x}$.

SOLUTIONS
Now that the model of the system is in place, how do we find the weight vector $\mathbf{w}$?

Normal equation

Gradient descent

NORMAL EQUATION
Stack the training inputs as rows of a design matrix $\mathbf{X}$ and the outputs in a vector $\mathbf{y}$.
Then,
$\mathbf{w}^{*} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$
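A minimal NumPy sketch of this closed-form solution (function and variable names are illustrative, not from the original slides):

```python
import numpy as np

def fit_normal_equation(X, y):
    """Least-squares weights via the normal equation.

    X: (N, D) feature matrix; y: (N,) target vector.
    A column of ones is prepended so w[0] is the intercept.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    # Solve (Xb^T Xb) w = Xb^T y rather than forming the inverse explicitly
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
```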

ERROR FUNCTION
Least-squares error:
$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big( h_{\mathbf{w}}(\mathbf{x}_n) - y_n \big)^2$

GRADIENT DESCENT

Differentiation of the error function with respect to the weights is required.
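Written out (using the least-squares error above and a learning rate $\alpha$; the slide's own notation is not preserved), each weight is updated by stepping against the gradient:

$$w_j \leftarrow w_j - \alpha \frac{\partial E(\mathbf{w})}{\partial w_j} = w_j - \alpha \sum_{n=1}^{N} \big( h_{\mathbf{w}}(\mathbf{x}_n) - y_n \big)\, \phi_j(\mathbf{x}_n)$$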

ERROR GRAPH

MINIMIZATION

VISUALIZATION

ITERATION VS ERROR

CODE
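The code on the original slide is not preserved. Below is a minimal NumPy sketch of batch gradient descent for linear regression, consistent with the error function and update rule above (all names are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, epochs=1000):
    """Batch gradient descent for linear regression.

    X: (N, D) feature matrix; y: (N,) targets.
    alpha: learning rate; epochs: passes over the data.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        residual = Xb @ w - y        # h_w(x_n) - y_n for every sample
        grad = Xb.T @ residual       # gradient of the least-squares error
        w -= alpha * grad / len(y)   # step downhill
    return w
```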

OVERFITTING
A learning algorithm must perform well on novel data.
Hence it must generalise well.
Failure to do so will lead to large errors on the test data.

OVERFITTING

ERROR DURING OVERFITTING

WEIGHTS DURING OVERFITTING

WITH LARGE DATA SET

REGULARIZATION
We need to keep the magnitude of the weights small, hence we add a regularization term to the error function.
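A common choice is the L2 (ridge) penalty; the exact form used on the original slide is not preserved, but the standard regularized error is:

$$\tilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\big(h_{\mathbf{w}}(\mathbf{x}_n) - y_n\big)^2 + \frac{\lambda}{2}\,\lVert \mathbf{w} \rVert^2$$

where $\lambda$ controls how strongly large weights are penalized.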

CURVE FITTING WITH REGULARIZATION

BIAS VS VARIANCE
High bias is when the training has underfit the data.
High variance is when the training has overfit the data.

If the training error and the cross-validation error are both high, the algorithm suffers from high bias (underfitting).
If the training error is low and the cross-validation error is high, the algorithm suffers from high variance (overfitting).

BIAS AND VARIANCE

HIGH BIAS (UNDERFIT)
Getting more data won't be useful.

HIGH VARIANCE (OVERFIT)
Getting more data can improve the performance of the learning algorithm.

LOGISTIC REGRESSION
A small change to linear regression converts it into logistic regression, which is used for binary classification.

The sigmoid is the function given by
$\sigma(z) = \dfrac{1}{1 + e^{-z}}$
Its output lies in $(0, 1)$ and is interpreted as a probability.
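A minimal sketch of logistic regression trained by gradient descent, assuming NumPy and labels in {0, 1} (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(X, y, alpha=0.1, epochs=1000):
    """Binary logistic regression via batch gradient descent.

    X: (N, D) features; y: (N,) labels in {0, 1}.
    Returns w such that sigmoid(Xb @ w) estimates P(y = 1 | x).
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                    # predicted probabilities
        w -= alpha * Xb.T @ (p - y) / len(y)   # gradient of the log loss
    return w
```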

WHY SIGMOID?

K-NEAREST NEIGHBORS
One of the simplest machine learning algorithms for classification.
An input is classified based on a majority vote of its neighbors. The number of neighbors is decided by k (a small positive integer).
If k = 1, the object is simply assigned to the class of its single nearest neighbor.
The neighbors are taken from the training set, but no explicit training happens.
Distance is described by a norm function.
Generally, it is assumed to be Euclidean distance.
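A minimal sketch of the voting rule just described, assuming NumPy (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify point x by majority vote of its k nearest
    training samples under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every sample
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority class
```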

NN VS 5-NN CLASSIFICATION

SENSITIVE TO LOCAL STRUCTURE

SKEWED CLASS DISTRIBUTION
If the class distribution is skewed, majority voting will lead to wrong results: samples of a frequent class tend to dominate the vote.
Instead, the vote from each neighbor can be weighted based on its distance from the unlabelled point.
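A sketch of distance-weighted voting; weighting each vote by 1/distance is one common choice, assumed here rather than taken from the slide:

```python
import numpy as np
from collections import defaultdict

def knn_weighted_predict(X_train, y_train, x, k=5, eps=1e-9):
    """Like knn_predict above, but a closer neighbor casts a stronger vote."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / (dists[i] + eps)  # inverse-distance weight
    return max(scores, key=scores.get)
```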

SELECTION OF K
The larger the value of k, the smaller the effect of noise on the output.
In binary classification, it is helpful to choose k to be an odd number to avoid ties.

Choosing k is a hyper-parameter tuning problem.

ADVANTAGES OF KNN

It is very simple to implement and understand.
There is no training time (data structures such as k-d trees can be built in advance to speed up the neighbor search).

DISADVANTAGES
Computational time for classification increases with the size of the training data.
The training data must be kept available during classification.
A pixel-wise distance metric leads to classification based on color distribution rather than perceptual or semantic similarity.

DISADVANTAGES

EVALUATION METRIC FOR CLASSIFICATION
Precision and recall are useful when the data set is skewed.
Precision is the fraction of all retrieved elements that are relevant.
Precision => how useful the results are.
Recall is the fraction of all relevant instances that are retrieved.
Recall => how complete the results are.
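In terms of true positives (TP), false positives (FP), and false negatives (FN), the standard definitions are:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$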

PRECISION AND RECALL

F-SCORE
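The formula on the original slide is not preserved; the usual F1 score is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$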

