Machine Learning Unit1

Machine Learning
Introduction
MR. U. A. NULI
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
TEXTILE AND ENGINEERING INSTITUTE, ICHALKARANJI
Mr. U.A. Nuli, Asst. Professor, CSE Dept., Textile and Engineering Institute, Ichalkaranji
History
Alan Turing, in his 1950 paper “Computing Machinery and Intelligence,” asked, “Can machines think?”
In 1959, Arthur Samuel defined machine learning as the “field of study that gives computers the ability to learn without being explicitly programmed.”
Samuel is credited with creating one of the first self-learning computer programs during his work at IBM.
He focused on games as a way of getting the computer to learn things.
Definitions
"Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as from sensor data or databases."
- Wikipedia
"Machine learning is the training of a model from data that generalizes a decision against a performance measure."
- Jason Brownlee
Definitions
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
- Tom Mitchell
For example, a computer program that learns to play checkers might improve its performance, as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself.
Learning Problems
1. Checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won against opponents
Training experience E: playing practice games against itself
2. Handwriting recognition learning problem:
Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training experience E: a database of handwritten words with given classifications
When Should You Use Machine Learning?
1. Hand-written rules and equations are too complex, as in face recognition and speech recognition.
2. The rules of a task are constantly changing, as in fraud detection from transaction records.
3. The nature of the data keeps changing, and the program needs to adapt, as in automated trading, energy demand forecasting, and predicting shopping trends.
Machine Learning Applications
• Identification of unwanted spam messages in e-mail

• Segmentation of customer behaviour for targeted advertising
• Forecasts of weather behaviour and long-term climate changes
• Reduction of fraudulent credit card transactions
• Actuarial estimates of financial damage of storms and natural disasters
• Prediction of popular election outcomes
• Development of algorithms for auto-piloting drones and self-driving cars
• Optimization of energy use in homes and office buildings
• Projection of areas where criminal activity is most likely
• Discovery of genetic sequences linked to diseases
What is learning?
Learning is the process of acquiring new or modifying existing knowledge, behaviour, skills, values, or preferences.
Machine Learning - Architecture
Data -> Model (algorithms, parameters) -> Output
[Figure: detailed architecture diagram]
What is learning?
The following are some considerations for defining a learning problem:
1. Provide a definition of what the learner should learn and of the need for learning.
2. Define the data requirements and the sources of the data.
3. Decide whether the learner should operate on the entire dataset or whether a subset will do.
As a first step, the given data is segregated into three datasets: training, validation, and testing. There is no hard rule on what percentage of the data should go into each of them. The split can be 70-10-20, 60-30-10, 50-25-25, or any other values.
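The segregation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_dataset`, the 70-10-20 default, and the fixed shuffle seed are choices made here, not part of the slides.

```python
import random

def split_dataset(rows, train=0.7, validation=0.1, seed=0):
    """Shuffle rows and split them into training, validation, and testing lists.

    Whatever is left after the training and validation shares becomes the
    testing set (20% with the default arguments).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_val]
    testing = shuffled[n_train + n_val:]
    return training, validating, testing

# Hypothetical dataset of 100 records, identified here only by their index.
rows = list(range(100))
training, validating, testing = split_dataset(rows)
print(len(training), len(validating), len(testing))
```

Shuffling before splitting avoids carrying any ordering of the original records into one of the three sets.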
Components of Machine Learning System - Data
• Data forms the main source of learning in machine learning.
• Data is a representation of human experience in a machine learning system.
• Data can be in any format: structured, semi-structured, or unstructured.
• Data can be received at any frequency and can be static or dynamic.
• Data can be of any size.
• Data can have any number of dimensions (features or attributes).
Terms related to data
Term: Purpose or meaning in the context of machine learning
Feature, attribute, field, or variable: A single column of data referenced by the learning algorithms. Some features can be inputs to the learning algorithm, and some can be outputs.
Instance: A single row of data in the dataset.
Feature vector or tuple: A list of features.
Dimension: A subset of attributes used to describe a property of data. For example, a date dimension consists of three attributes: day, month, and year.
Dataset: A collection of rows or instances.
Machine learning uses different types of datasets for different purposes: training, testing, and evaluation datasets.
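To make the vocabulary concrete, here is a small Python sketch. The two records and the field names (`color`, `height_inch`, `legs`, `label`) are hypothetical and chosen only for illustration.

```python
# Hypothetical toy dataset: the list is the dataset, each dict is one instance (row).
dataset = [
    {"color": "brown", "height_inch": 24, "legs": 4, "label": "dog"},
    {"color": "white", "height_inch": 10, "legs": 4, "label": "cat"},
]

# Each key is a feature (column); some are inputs and one is the output.
input_features = ["color", "height_inch", "legs"]
output_feature = "label"

def feature_vector(instance, features):
    """Return the values of the chosen features for one instance, as a tuple."""
    return tuple(instance[name] for name in features)

vectors = [feature_vector(row, input_features) for row in dataset]
outputs = [row[output_feature] for row in dataset]
print(vectors, outputs)
```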
Terms related to data
Term: Purpose or meaning in the context of machine learning
Training dataset: The base dataset against which the model is built or trained.
Testing dataset: The dataset used to validate the model built. It is also referred to as a validating dataset.
Evaluation dataset: The dataset used for final verification of the model (it can be treated more as user acceptance testing).
[Figure: Dataset]
Model
"A simplified description, especially a mathematical one, of a system or process, to assist calculations and predictions."
- Oxford Dictionary
Example: ‘a statistical model used for predicting the survival rates of endangered species’
Mathematical model: a representation in mathematical terms of the behaviour of real devices and objects.
Machine Learning Model:
Input Data -> Feature Extraction -> Features -> Model (classification algorithms, parameters, etc.; the knowledge representation) -> Output Prediction
Categories of models:
• Logical models
• Geometric models
• Probabilistic models
[Figure: House price]
[Figure: Logical Model]
[Figure: Customer feedback for Shoes]
[Figure: Car classification]
[Figure: Types of learning problems]
Machine Learning Techniques:
A technique is a way of solving a problem.
For example, classification is a technique for assigning things to groups of similar items.
To actually perform classification on some data, a data scientist has to employ a specific algorithm such as Decision Trees (though there are many other classification algorithms to choose from).
Machine Learning Techniques:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning:
Supervised learning is similar to human learning in the presence of a supervisor or teacher.
The supervisor's or teacher's role is to provide correct feedback to the learner.
Example: A teacher shows a set of images of dogs and tells the student that these are dogs. From the images, the student learns to recognize the animal called a dog.
What the student learns are the properties that identify a dog, such as its face, color, and voice.
Similarly, parents show a child animals such as dogs and cats and help the child to recognize them.
Supervised Learning:
In machine learning, a model learns from given examples presented in the form of data.
The input to the model is data, and the data's attributes are the properties through which the model learns.
Like the teacher in human learning, the correct output is provided along with the data, which helps the model to learn.
Supervised Learning: Examples
Example 1
Attribute (property)   Value
Color                  Brown
Height                 24 inch
Legs                   4
Output: Dog (the model learns this data as "Dog")
Example 2
Attribute (property)   Value
No. of wheels          4
Gears                  6
Max speed              200
Weight                 800 kg
Output: Vehicle - Car (the model learns this data as "Vehicle - Car")
Supervised Learning
Supervised learning occurs when an algorithm learns from example data and
associated target responses that can consist of numeric values or string labels, such as
classes or tags, in order to later predict the correct response when posed with new
examples.
The aim of supervised machine learning is to build a model that makes predictions based on
evidence in the presence of uncertainty.
A supervised learning algorithm takes a known set of input data and known responses to the
data (output) and trains a model to generate reasonable predictions for the response to new
data.
Supervised Learning - Algorithms
List of Common Algorithms in Supervised Learning:

• Nearest Neighbour Classifier
• Naive Bayes
• Decision Trees
• Linear Regression
• Support Vector Machines (SVM)
• Neural Networks
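As an illustration of the first algorithm in the list, the following is a minimal sketch of a one-nearest-neighbour classifier. The training examples (number of wheels, weight in kg) and the function name `nearest_neighbour_predict` are hypothetical; a practical version would usually scale the features first so that weight does not dominate the distance.

```python
import math

def nearest_neighbour_predict(training_data, query):
    """Return the label of the training example closest to `query`."""
    best_label, best_distance = None, math.inf
    for features, label in training_data:
        distance = math.dist(features, query)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

# Hypothetical labelled examples: (number of wheels, weight in kg) -> label.
training_data = [
    ((4, 800), "car"),
    ((4, 1200), "car"),
    ((2, 150), "motorcycle"),
    ((2, 200), "motorcycle"),
]

prediction = nearest_neighbour_predict(training_data, (2, 180))
print(prediction)
```

The labels supplied with the training data play the role of the teacher: the prediction for a new example is copied from the most similar labelled example.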
Unsupervised Learning:
This is learning without a teacher.
It is learning a new concept by comparing it with other concepts.
It corresponds to the human ability to group similar elements.
Examples:
Humans group banana, apple, orange, etc. as fruits because they grow on trees and are eaten without cooking or any other processing (the common attributes are "grown on a tree" and "eaten without cooking").
Humans group notebooks, pens, books, and pencils as school stationery because these are useful in school (the common attribute is "useful in school").
It resembles the methods humans use to figure out that certain objects or events are from the same class, such as by observing the degree of similarity between objects.
An important characteristic of unsupervised learning is finding the similarity between two events or objects.
Unsupervised learning finds hidden patterns or intrinsic structures in data.
It is used to draw inferences from datasets consisting of input data without labelled responses (unlabelled data).
Unsupervised learning occurs when an algorithm learns from plain examples without any associated response, leaving it to the algorithm to determine the data patterns on its own.
Grouping of six fruits given below:
Fruit    Common attribute (color)   Category (group)
Mango    Yellow                     Ripe fruit
Banana   Yellow                     Ripe fruit
Guava    Yellow                     Ripe fruit
Mango    Green                      Raw fruit
Banana   Green                      Raw fruit
Guava    Green                      Raw fruit
Unsupervised Learning Examples:
Finding high-crime areas for deploying patrol vans
Finding the most accident-prone areas for setting up emergency care wards of a hospital
Grouping documents into categories or topics based on the words used in the documents
Unsupervised Learning
Unsupervised learning is where you only have input data (X) and no corresponding output variables.
The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
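One common way to let an algorithm discover groups on its own is k-means clustering, which the slides do not name; it is used here only as an illustrative technique. The sketch below works on one-dimensional data, and the ages and the starting centroids are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids (plain k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put every value into the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabelled data: customer ages with two visible groups.
ages = [21, 23, 25, 61, 64, 66]
centroids, clusters = k_means_1d(ages, centroids=[21, 66])
print(centroids, clusters)
```

No output labels are supplied; the two groups emerge only from the similarity (here, closeness in age) between the examples.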
Model Performance:
Is the solution created good?
- The model may or may not give accurate results.
- If the model is not giving accurate results, how do we measure the error?
Measuring Prediction Performance
If a machine learning model is predicting house prices in a city, how accurately is it predicting them?
Measuring error (error metrics):
For prediction type models
1. Mean Square Error (MSE)
\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (P_i - A_i)^2 \]
P_i - Predicted value of the ith record
A_i - Actual value of the ith record
n - Total records
If MSE is zero or close to zero, the model is predicting the value accurately.
A larger MSE value indicates poor model performance and the need for further training.
It is also common to use the square root of this quantity, called the root mean square error (RMSE):
\[ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \]
For prediction type models
Mean Absolute Error (MAE):
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |P_i - A_i| \]
P_i - Predicted value of the ith record
A_i - Actual value of the ith record
n - Total records
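The error metrics for prediction type models (MSE, its square root RMSE, and MAE) can be computed as in the following sketch. The house prices are hypothetical values made up for illustration, and the function names are choices made here.

```python
import math

def mean_square_error(predicted, actual):
    """Average of the squared differences between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical house prices (same currency unit for both lists).
predicted = [50, 62, 70, 81]
actual = [52, 60, 75, 80]

mse = mean_square_error(predicted, actual)
rmse = math.sqrt(mse)
mae = mean_absolute_error(predicted, actual)
print(mse, rmse, mae)
```

Because MSE squares each difference, the single error of 5 contributes 25 of the total 34, whereas MAE weights all errors linearly.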
For classification type models
Confusion Matrix (Error Matrix) in Machine Learning:
• A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
• It allows the visualization of the performance of an algorithm.
• It allows easy identification of confusion between classes, e.g. one class being commonly mislabelled as the other.
• Most performance measures are computed from the confusion matrix.
A confusion matrix is a summary of prediction results on a classification problem.
The numbers of correct and incorrect predictions are summarized with count values and broken down by class. This is the key to the confusion matrix.
The confusion matrix shows the ways in which your classification model is confused when it makes predictions.
It gives us insight not only into the errors being made by a classifier but, more importantly, into the types of errors that are being made.
                     Actual class
                     Cat                   Non-cat
Predicted  Cat       5 (True Positives)    2 (False Positives)
class      Non-cat   3 (False Negatives)   17 (True Negatives)
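The four cells of such a table can be counted directly from the actual and predicted labels. The sketch below is illustrative: the label lists are constructed by hand so that they reproduce the counts in the cat / non-cat table, and the function name `confusion_counts` is a choice made here.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, and TN, treating `positive` as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Labels constructed to match the table: 5 TP, 2 FP, 3 FN, 17 TN.
actual = ["cat"] * 5 + ["non-cat"] * 2 + ["cat"] * 3 + ["non-cat"] * 17
predicted = ["cat"] * 7 + ["non-cat"] * 20

counts = confusion_counts(actual, predicted, positive="cat")
print(counts)
```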
[Figure: Predicting Behavior of 10000 customers]
Assuming a sample of 27 animals (8 cats, 6 dogs, and 13 rabbits), the resulting confusion matrix could look like the table below:
                     Actual class
                     Cat   Dog   Rabbit
Predicted  Cat       5     2     0
class      Dog       3     3     2
           Rabbit    0     1     11
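For a matrix with more than two classes, each class can be treated in turn as the positive class. The following sketch derives the per-class counts from the 3 x 3 table; the helper name `per_class_counts` is hypothetical.

```python
classes = ["cat", "dog", "rabbit"]

# Rows are predicted classes and columns are actual classes, as in the table.
matrix = [
    [5, 2, 0],
    [3, 3, 2],
    [0, 1, 11],
]

def per_class_counts(matrix, k):
    """Return (TP, FP, FN, TN) when class index k is treated as positive."""
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                  # predicted k, actually another class
    fn = sum(row[k] for row in matrix) - tp   # actually k, predicted as another class
    tn = sum(map(sum, matrix)) - tp - fp - fn
    return tp, fp, fn, tn

for index, name in enumerate(classes):
    print(name, per_class_counts(matrix, index))
```

Treating "cat" as the positive class collapses this matrix into a two-class cat / non-cat table.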
Terms in error measurement
Definition of the terms:
• Positive (P) : Observation is positive (for example: is an apple).
• Negative (N) : Observation is not positive (for example: is not an apple).
• True Positive (TP) : Observation is positive, and is predicted to be positive.
• False Negative (FN) : Observation is positive, but is predicted negative.
• True Negative (TN) : Observation is negative, and is predicted to be negative.
• False Positive (FP) : Observation is negative, but is predicted positive.
Error measurement metrics
Classification Rate/Accuracy:
Classification rate, or accuracy, is given by the relation:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
Recall:
Recall is the ratio of the number of correctly classified positive examples to the total number of positive examples.
High recall indicates that the class is correctly recognized (a small number of FN).
Recall is given by the relation:
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]
Precision:
Precision is the number of correctly classified positive examples divided by the total number of predicted positive examples.
High precision indicates that an example labelled as positive is indeed positive (a small number of FP).
Precision is given by the relation:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
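Accuracy, recall, and precision follow directly from the four confusion-matrix counts. This sketch evaluates them for the cat / non-cat example (TP = 5, FP = 2, FN = 3, TN = 17); the function names are choices made here.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

# Counts taken from the cat / non-cat confusion matrix.
tp, fp, fn, tn = 5, 2, 3, 17

print(f"accuracy  = {accuracy(tp, tn, fp, fn):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
```

Here accuracy is 22/27, recall is 5/8, and precision is 5/7, so the classifier misses cats (FN) more often than it raises false alarms (FP).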
[Figure: Metrics used with confusion matrix]
Feature engineering:
Feature: A feature is an attribute of data that is meaningful to the machine learning process.
Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.
The performance of machine learning algorithms depends on the quality of the input data, and hence on the features.
Data scientists spend 50% of their time on features.
Why Feature Engineering?
- Features may come with many problems, such as missing values, outliers, differing types, and errors in data collection.
Before using features to train a machine learning model, it is necessary to clean, transform, and select the right set of features.
Feature engineering:
Feature engineering is the process of transforming data into features that better represent the underlying problem, resulting in improved machine learning performance.
Benefits of spending time on feature engineering:
1. The model becomes simpler because it uses a limited, selected set of features.
2. It runs faster than a complex model with a large number of features.
3. It reduces model selection time, since a limited set of features gives better insight into the relationships in the data.
4. It reduces training time.
Feature engineering: - Feature Extraction
Feature extraction is the process of deriving new features from existing features or raw data, by carrying out some transformation or extraction procedure, in order to reduce redundancy in the features.
The primary goals of feature extraction are to reduce redundancy in the features and to reduce the dimensions of the feature vector.
Feature engineering: - Preparing Data
Preparing data takes into account capturing data, storing data, cleaning data, organizing data, and so on.
Cleaning refers to the process of transforming data into a format that can be easily interpreted by databases.
Organizing generally refers to a more radical transformation. It tends to involve changing the entire format of the dataset into a much neater format, such as transforming raw chat logs into a tabular row/column structure.
Data Cleaning
Does the data match the column label?
- Are there names in the names column, addresses in the addresses column, and phone numbers in the phone numbers column? Or is there different data in the columns?
Does the data abide by the appropriate rules for its field?
- Are the characters in a name only alphabetical (Brendan), or are there numbers in it (B4rendan)?
- Is the numerical portion of a phone number 10 digits (5558675309) or not (675309)?
How many values are null? Is the number of nulls acceptable? Is there a pattern as to where the null values occur?
Are there duplicates, and is that okay?
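Several of these questions can be turned into simple automated checks. The sketch below is a simplified illustration: the record layout (`name`, `phone`) and the function `check_record` are hypothetical, and the name rule is deliberately strict (it would also reject names containing spaces or hyphens).

```python
def check_record(record):
    """Return a list of rule violations for one record (empty list means clean)."""
    problems = []
    name, phone = record.get("name"), record.get("phone")
    if name is None or phone is None:
        problems.append("null value")
    if name is not None and not name.isalpha():
        problems.append("name is not alphabetical")
    if phone is not None and not (phone.isdigit() and len(phone) == 10):
        problems.append("phone is not 10 digits")
    return problems

# Hypothetical records using the examples from the questions above.
records = [
    {"name": "Brendan", "phone": "5558675309"},
    {"name": "B4rendan", "phone": "675309"},
    {"name": None, "phone": "5558675309"},
]

report = [check_record(record) for record in records]
null_count = sum("null value" in problems for problems in report)
print(report, null_count)
```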
Data cleaning - removing unwanted observations
Unwanted observations: duplicate or irrelevant observations
Duplicate observations/records:
Duplicate observations most frequently arise during data collection, such as when you:
• Combine datasets from multiple places
• Scrape data
• Receive data from clients/other departments
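A minimal sketch of removing duplicate records after combining two sources is shown below. The two source lists and the function name `drop_duplicates` are hypothetical; rows are considered duplicates only when every field matches exactly.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical data combined from two places; the Pune record appears in both.
source_a = [{"city": "Kolhapur", "temperature": 29}, {"city": "Pune", "temperature": 23}]
source_b = [{"city": "Pune", "temperature": 23}, {"city": "Mumbai", "temperature": 26}]

combined = source_a + source_b
cleaned = drop_duplicates(combined)
print(len(combined), len(cleaned))
```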
Data cleaning - Handling Missing Data
City           Temperature   Humidity
Kolhapur       29            80
Ichalkaranji   28            -
Sangli         -             -
Miraj          -             -
Karad          27            -
Pune           23            -
Mumbai         26            90
Satara         -             -
Bangalore      23            -
If a feature has mostly missing values, it can be removed from the dataset. Example: Humidity
If a row has missing values, it can be removed completely. Example: Sangli, Miraj
A missing value can be replaced by the mean/median value of that feature, or by zero.
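The three options can be sketched on the city table as follows. The 50% threshold for "mostly missing" and the function names are assumptions made for this illustration; missing values are represented as `None`.

```python
from statistics import mean

rows = [
    {"city": "Kolhapur", "temperature": 29, "humidity": 80},
    {"city": "Ichalkaranji", "temperature": 28, "humidity": None},
    {"city": "Sangli", "temperature": None, "humidity": None},
    {"city": "Miraj", "temperature": None, "humidity": None},
    {"city": "Karad", "temperature": 27, "humidity": None},
    {"city": "Pune", "temperature": 23, "humidity": None},
    {"city": "Mumbai", "temperature": 26, "humidity": 90},
    {"city": "Satara", "temperature": None, "humidity": None},
    {"city": "Bangalore", "temperature": 23, "humidity": None},
]

def missing_fraction(rows, feature):
    """Fraction of rows in which the feature is missing."""
    return sum(row[feature] is None for row in rows) / len(rows)

def drop_sparse_features(rows, features, max_missing=0.5):
    """Option 1: keep only features whose missing fraction is acceptable."""
    return [f for f in features if missing_fraction(rows, f) <= max_missing]

def drop_incomplete_rows(rows, features):
    """Option 2: keep only rows that have a value for every kept feature."""
    return [row for row in rows if all(row[f] is not None for f in features)]

def impute_mean(rows, feature):
    """Option 3: replace missing values of a feature with its mean."""
    fill = mean(row[feature] for row in rows if row[feature] is not None)
    return [{**row, feature: fill if row[feature] is None else row[feature]} for row in rows]

features = drop_sparse_features(rows, ["temperature", "humidity"])
complete = drop_incomplete_rows(rows, features)
imputed = impute_mean(rows, "temperature")
print(features, len(complete), imputed[2]["temperature"])
```

Humidity is missing in 7 of 9 rows, so it is dropped as a feature; temperature is missing in 3 rows, which can either be removed or filled with the mean of the known temperatures.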
Data organization - Representing Features
Handling Numerical Features:
A numerical feature holds a numerical value.
Example: cost of a product, temperature, size of a house, distance, etc.
Certain numerical features can be accepted without any transformation, whereas a few need transformation.
Rounding: features can be rounded to integers or to a few decimal places.
Example: 0.234561 can be rounded to 0.235
Handling Numerical Features
Min-Max Normalization: this process brings the feature values into the range 0 to 1.
First it calculates the min and max values of the feature, and then it transforms each value using the formula
\[ X_i^{\text{new}} = \frac{X_i - \min(X)}{\max(X) - \min(X)} \]
It is not suitable when the data contains outliers.
Z-Score Normalization:
\[ Z = \frac{X - \operatorname{mean}(X)}{\operatorname{StdDev}(X)} \]
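Both normalizations can be sketched with the standard library. The temperature values are hypothetical, and using the population standard deviation (`pstdev`) is an assumption, since the slides do not say which variant of StdDev is meant. Neither function handles the degenerate case where all values are equal.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical temperature readings.
temperatures = [29, 28, 27, 23, 26, 23]

scaled = min_max_normalize(temperatures)
standardized = z_score_normalize(temperatures)
print(scaled)
print(standardized)
```

A single extreme value would become the new min or max and squeeze all other min-max scaled values into a narrow band, which is why min-max normalization is sensitive to outliers.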
Handling Numerical Features
Binning: a process of converting numerical values into categories or bins.
Example: age - 5, 12, 7, 9, 25, 32, 21, 35, 37, 45, 53, 67, 74, 83, 99, 123, 125
These can be categorized as
Age      Category
0-10     1
11-20    2
21-30    3
...      ...
> 90     10
This method reduces the effect of outliers.
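The binning of the age example can be sketched as below. The table only lists the first three bins and the last one, so the intermediate bins (31-40 as 4, up to 81-90 as 9) are an assumption that simply continues the ten-year pattern; the function name `age_category` is also a choice made here.

```python
import math

def age_category(age):
    """Map an age to bin 1 (0-10), 2 (11-20), ..., 9 (81-90), or 10 (above 90)."""
    if age <= 10:
        return 1
    return min(math.ceil(age / 10), 10)  # every age above 90 falls into bin 10

ages = [5, 12, 7, 9, 25, 32, 21, 35, 37, 45, 53, 67, 74, 83, 99, 123, 125]
categories = [age_category(age) for age in ages]
print(categories)
```

The implausible ages 123 and 125 end up in the same bin as 99, which is how binning limits the influence of outliers.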

Machine Learning Process:
1. Define the problem
2. Collect data
3. Prepare data
4. Split data into training, validation, and testing sets
5. Select an algorithm
6. Train the algorithm
7. Evaluate on test data
8. Tune parameters
9. Start using the model
End
