Introduction
MR. U. A. NULI
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
TEXTILE AND ENGINEERING INSTITUTE, ICHALKARANJI
In 1959, Arthur Samuel defined machine learning as the “field of study
that gives computers the ability to learn without being explicitly
programmed.”
Samuel is credited with creating one of the first self-learning computer
programs during his work at IBM.
He focused on games as a way of getting the computer to learn
things.
Mr. U.A. Nuli, Asst.Professor, CSE dept, Textile and Engineering Institute, Ichalkaranji
Definitions
Learning Problems
1. Hand-written rules and equations are too complex—as in face recognition and
speech recognition
2. The rules of a task are constantly changing—as in fraud detection from transaction
records.
3. The nature of the data keeps changing, and the program needs to adapt—as in
automated trading, energy demand forecasting, and predicting shopping trends.
Machine Learning Applications
What is learning?
Data -> Model (algorithms, parameters) -> Output
Machine Learning - Architecture
1. Provide a definition of what the learner should learn and the need
for learning.
2. Define the data requirements and the sources of the data.
3. Decide whether the learner should operate on the entire dataset or whether a
subset will do.
As a first step, the given data is segregated into three datasets: training, validation,
and testing. There is no hard rule on what percentage of the data should go to the training,
validation, and testing datasets. The split can be 70-10-20, 60-30-10, 50-25-25, or any other
set of values.
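The split described above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the function name `split_dataset`, the default 70-10-20 percentages, and the fixed random seed are assumptions made for the example.

```python
import random


def split_dataset(records, train_pct=70, validation_pct=10, seed=0):
    """Shuffle records and split them into training, validation, and testing lists.

    Whatever is left after the training and validation shares goes to testing.
    The percentages and the seed are illustrative defaults (assumptions).
    """
    if train_pct < 0 or validation_pct < 0 or train_pct + validation_pct >= 100:
        raise ValueError("training and validation shares must leave room for testing")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # shuffle so each set is a random sample
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]
    return training, validation, testing


# 100 hypothetical records split 70-10-20
training_set, validation_set, testing_set = split_dataset(range(100))
print(len(training_set), len(validation_set), len(testing_set))
```

Shuffling before splitting is a common choice so that each subset is a random sample; for time-ordered data a chronological split may be more appropriate.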
Components of Machine Learning System - Data
Terms related to data
Dataset
Model
Machine Learning Model:
Categories of models:
• Logical models
• Geometric models
• Probabilistic models
House price
Logical Model
Customer feedback for Shoes
Car classification
Types of learning problems
Machine Learning Techniques:
For example, classification is a technique for grouping things that are similar.
Machine Learning Techniques:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning:
Example: A teacher shows a student a set of dog images and tells the student that these are dogs.
What the student learns are the properties that identify a dog, such as
its face, color, and voice.
Similarly, parents show a child animals such as dogs and cats and help the child recognize them.
Supervised Learning:
In machine learning, a model learns from examples presented
in the form of data.
The input to the model is data, and the attributes of the data are the properties through
which the model learns.
Like the teacher in human learning, the correct output is provided along with the data,
which helps the model learn.
Supervised Learning: Examples
Supervised learning occurs when an algorithm learns from example data and
associated target responses that can consist of numeric values or string labels, such as
classes or tags, in order to later predict the correct response when posed with new
examples.
The aim of supervised machine learning is to build a model that makes predictions based on
evidence in the presence of uncertainty.
A supervised learning algorithm takes a known set of input data and known responses to the
data (output) and trains a model to generate reasonable predictions for the response to new
data.
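To make the idea of learning from known inputs and known responses concrete, here is a minimal sketch of a 1-nearest-neighbor classifier. The technique is chosen only for brevity and is not named in the slides; the function name and the toy data (two numeric attributes per animal) are hypothetical.

```python
import math


def nearest_neighbor_predict(train_inputs, train_labels, new_input):
    """Predict the label of new_input as the label of its closest training example."""
    best_label = None
    best_distance = math.inf
    for features, label in zip(train_inputs, train_labels):
        distance = math.dist(features, new_input)  # Euclidean distance
        if distance < best_distance:
            best_label = label
            best_distance = distance
    return best_label


# Hypothetical labeled examples: (height in cm, weight in kg) -> known response
train_inputs = [(25, 4), (30, 5), (60, 25), (70, 30)]
train_labels = ["cat", "cat", "dog", "dog"]

# New, unlabeled examples for which the model predicts a response
small_animal = nearest_neighbor_predict(train_inputs, train_labels, (28, 4.5))
large_animal = nearest_neighbor_predict(train_inputs, train_labels, (65, 28))
print(small_animal, large_animal)
```

The labels play the role of the teacher: the algorithm never sees a rule for what makes a cat or a dog, only examples paired with the correct answer.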
Supervised Learning - Algorithms
Unsupervised Learning:
Examples:
Humans group bananas, apples, oranges, etc. as fruits because they grow on trees
and are eaten without cooking or any other processing. (Hence the common attributes
among these are “grown on a tree” and “eaten without cooking”.)
Humans group notebooks, pens, books, and pencils as school stationery because these
are useful in school. (Hence a common attribute among these is “useful in school”.)
Unsupervised Learning:
Unsupervised learning resembles the methods humans use to figure out that certain objects or events are
from the same class, such as by observing the degree of similarity between objects.
Unsupervised Learning:
It is used to draw inferences from datasets consisting of input data without labeled
responses.
Unsupervised learning occurs when an algorithm learns from plain examples without
any associated response, leaving the algorithm to determine the data patterns on
its own.
Unsupervised Learning Examples:
Finding the most accident-prone areas in order to set up emergency care wards of a hospital
Grouping documents into different categories/topics based on the words used in the documents
Unsupervised Learning
Unsupervised learning is where you only have input data (X) and no corresponding
output variables.
The goal of unsupervised learning is to model the underlying structure or distribution in
the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning, there
are no correct answers and there is no teacher. Algorithms are left to their own devices to
discover and present the interesting structure in the data.
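As one possible illustration of discovering structure without labels, the sketch below runs a very small k-means clustering on one-dimensional data. K-means is used here only as a familiar example of an unsupervised algorithm; the function name, the starting centroids, and the data values are assumptions made for the example.

```python
def k_means_1d(values, initial_centroids, iterations=10):
    """Group numbers into clusters by repeatedly assigning each value to its
    nearest centroid and moving each centroid to the mean of its cluster."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # index of the centroid closest to this value
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # an empty cluster keeps its previous centroid
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled input data (X); no output variable is given
data = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(data, initial_centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No correct grouping is supplied anywhere; the two groups emerge only from how similar the values are to each other.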
Model Performance:
Measuring Prediction Performance
Measuring error (error metrics):
For prediction type models
It is also common to use the square root of this quantity, called the root mean square error (RMSE).
Measuring error (error metrics):
For prediction type models
n - total number of records
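The formulas on these slides did not survive, so the following is a sketch under an assumption: that the quantity whose square root is taken is the mean squared error over the n records, which makes the square root the root mean square error. The function names and the sample numbers are hypothetical.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    n = len(actual)  # n - total number of records
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical actual and predicted values for four records
actual_values = [3.0, 5.0, 7.0, 9.0]
predicted_values = [2.0, 5.0, 8.0, 11.0]
mse = mean_squared_error(actual_values, predicted_values)
rmse = root_mean_squared_error(actual_values, predicted_values)
print(mse, rmse)
```

Taking the square root brings the error back to the units of the predicted quantity, which usually makes it easier to interpret.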
Measuring error (error metrics):
For classification type models
The numbers of correct and incorrect predictions are summarized as counts and broken
down by class. This is the key to the confusion matrix.
The confusion matrix shows the ways in which your classification model is confused when it
makes predictions.
It gives insight not only into the errors being made by a classifier but, more importantly, into the
types of errors being made.
[Confusion matrix; columns give the actual class: Cat, Non-cat]
                       Actual class
                       Cat    Dog    Rabbit
Predicted    Cat        5      2      0
class        Dog        3      3      2
             Rabbit     0      1     11
Terms in error measurement
Error measurement metrics
Classification Rate/Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Error measurement metrics
Recall:
Recall is the ratio of the number of correctly classified positive
examples to the total number of positive examples.
High recall indicates that the class is correctly recognized (a small number of FN).
Recall is given by the relation:
Recall = TP / (TP + FN)
where TP is the number of true positives and FN is the number of false negatives.
Error measurement metrics
Precision:
Precision is the number of correctly classified
positive examples divided by the total number of predicted positive examples.
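The definitions of accuracy, recall, and precision can be checked against the three-class Cat/Dog/Rabbit counts shown earlier. The sketch below assumes that the rows of that matrix are the predicted class and that the columns are the actual class in the same Cat, Dog, Rabbit order (the column labels are an inference, since they are missing from the slide text). The function names are hypothetical.

```python
CLASSES = ["Cat", "Dog", "Rabbit"]

# Rows: predicted class, columns: actual class (column order is an assumption)
CONFUSION_MATRIX = [
    [5, 2, 0],   # predicted Cat
    [3, 3, 2],   # predicted Dog
    [0, 1, 11],  # predicted Rabbit
]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, k):
    """Correctly predicted class-k examples divided by all predicted as class k."""
    return matrix[k][k] / sum(matrix[k])


def recall(matrix, k):
    """Correctly predicted class-k examples divided by all actual class-k examples."""
    return matrix[k][k] / sum(row[k] for row in matrix)


print("accuracy:", accuracy(CONFUSION_MATRIX))
for index, name in enumerate(CLASSES):
    print(name, "precision:", precision(CONFUSION_MATRIX, index),
          "recall:", recall(CONFUSION_MATRIX, index))
```

For a multi-class matrix, each class is treated in turn as the positive class, so precision and recall are reported per class.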
Metrics used with confusion matrix
Feature engineering:
Feature: A feature is an attribute of data that is meaningful to the machine learning process.
Feature engineering is the process of using domain knowledge of the data to create features that
make machine learning algorithms work.
The performance of machine learning algorithms depends on the quality of the input data and hence
on the quality of the features.
- Features may come with many problems, such as missing values, outliers, differing data types, and errors
in data collection.
Before using features to train a machine learning model, it is necessary to clean, transform,
and select the right set of features.
Feature engineering:
Feature engineering is the process of transforming data into features that better
represent the underlying problem, resulting in improved machine learning
performance
Feature engineering: - Feature Extraction
Feature extraction is the process of deriving new features from existing features or raw
data by carrying out some transformation or extraction procedure in order to
reduce redundancy in the features.
Feature engineering: - Preparing Data
Preparing data covers capturing, storing, cleaning,
and organizing data, and so on.
Cleaning refers to the process of transforming data into a format that can
be easily interpreted by databases.
** Are there names in the names column, addresses in the addresses column, and phone
numbers in the phone numbers column? Or is there different data in the columns?
** Does the data abide by the appropriate rules for its field?
** Are the characters in a name only alphabetical (Brendan) or are there numbers in it
(B4rendan)?
** Is the numerical portion of a phone number 10 digits (5558675309) or not (675309)?
** How many values are nulls? Is the number of nulls acceptable? Is there a pattern as to where
there are null values?
** Are there duplicates and is that okay?
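The cleaning questions above can be turned into simple checks. The sketch below is one hypothetical way to do so with plain Python dictionaries; the field names `name` and `phone`, the helper names, and the sample records are all assumptions, and real data would usually need looser rules (for example, names with spaces or hyphens).

```python
import re
from collections import Counter


def is_valid_name(name):
    """True if the name is a string made only of alphabetic characters."""
    return isinstance(name, str) and name.isalpha()


def is_valid_phone(phone):
    """True if the phone value is a string of exactly 10 digits."""
    return isinstance(phone, str) and re.fullmatch(r"[0-9]{10}", phone) is not None


def count_nulls(records, field):
    """Number of records whose value for the given field is missing (None)."""
    return sum(1 for record in records if record.get(field) is None)


def find_duplicates(records):
    """Return each record that appears more than once, listed a single time."""
    counts = Counter(tuple(sorted(record.items())) for record in records)
    return [dict(items) for items, count in counts.items() if count > 1]


# Hypothetical records, including an invalid name, a short phone number,
# a null name, and one exact duplicate
records = [
    {"name": "Brendan", "phone": "5558675309"},
    {"name": "B4rendan", "phone": "675309"},
    {"name": None, "phone": "5558675309"},
    {"name": "Brendan", "phone": "5558675309"},
]

invalid_names = [r["name"] for r in records if not is_valid_name(r["name"])]
invalid_phones = [r["phone"] for r in records if not is_valid_phone(r["phone"])]
null_names = count_nulls(records, "name")
duplicates = find_duplicates(records)
print(invalid_names, invalid_phones, null_names, duplicates)
```

Whether a flagged value should be corrected, dropped, or kept is a judgment that depends on the dataset; the checks only surface candidates.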
Duplicate observations/records
Data cleaning - Handling missing data
Data organization – Representing Features
Handling Numerical Features
Min-Max Normalization: This process brings the feature values into the range 0 to 1.
First it calculates the min and max values of the feature and then transforms each value
using the formula
X' = (X - min(X)) / (max(X) - min(X))
Z-Score Normalization:
Z = (X - mean(X)) / StdDev(X)
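Both normalizations can be sketched in a few lines of Python. The function names and the sample values are hypothetical, and the z-score version assumes the population standard deviation, since the slides do not say whether the population or the sample form is intended.

```python
import statistics


def min_max_normalize(values):
    """Rescale values to the range 0 to 1 using the feature's min and max."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("min-max normalization is undefined for a constant feature")
    return [(value - low) / (high - low) for value in values]


def z_score_normalize(values):
    """Rescale values to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption).
    """
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        raise ValueError("z-score normalization is undefined for a constant feature")
    return [(value - mean) / std_dev for value in values]


# Hypothetical numerical feature
feature = [10, 20, 30, 40, 50]
min_max_scaled = min_max_normalize(feature)
z_scaled = z_score_normalize(feature)
print(min_max_scaled)
print(z_scaled)
```

Min-max normalization is sensitive to outliers because a single extreme value sets the range, while z-score normalization does not bound the result to a fixed interval.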
Handling Numerical Features
1. Define Problem
2. Collect data
3. Prepare Data
4. Split data into training, validation, and testing sets
5. Algorithm Selection
6. Training the algorithm
7. Evaluate on test data
8. Parameter Tuning
9. Start Using the model
End