Supervised Learning and k-Nearest Neighbors
Business Intelligence for Managers
Supervised learning and classification

 Given: a dataset of instances with known categories
 Goal: using the “knowledge” in the dataset, classify a given instance
⚫ predict the category of the given instance that is rationally consistent with the dataset
Classifiers

[Diagram: feature values X1, X2, X3, …, Xn feed into a Classifier, which outputs a category Y; the Classifier is built from a DB, a collection of instances with known categories]
Algorithms

 K Nearest Neighbors (kNN)
 Naïve-Bayes
 Decision trees
 Many others (support vector machines, neural networks, genetic algorithms, etc.)
k-Nearest Neighbors

 For a given instance T, get the top k dataset instances that are “nearest” to T
⚫ Select a reasonable distance measure
 Inspect the categories of these k instances and choose the category C that represents the most instances
 Conclude that T belongs to category C
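A minimal sketch of this procedure in Python (the helper names are illustrative; Euclidean distance is assumed as the measure):

import math
from collections import Counter

def euclidean(a, b):
    # square root of the sum of squared feature differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(dataset, target, k=3):
    """dataset: list of (feature_vector, category) pairs; target: a feature vector."""
    # sort known instances by distance to the target
    neighbors = sorted(dataset, key=lambda pair: euclidean(pair[0], target))
    # majority vote over the categories of the k nearest
    votes = Counter(category for _, category in neighbors[:k])
    return votes.most_common(1)[0][0]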
Example 1

 Determining a decision on a scholarship application based on the following features:
⚫ Household income (annual income in millions of pesos)
⚫ Number of siblings in family
⚫ High school grade (on a QPI scale of 1.0 – 4.0)
 Intuition (reflected in the data set): award scholarships to high performers and to those with financial need
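In Python, such a dataset might be represented as (features, decision) pairs; the rows below are purely hypothetical:

# Hypothetical scholarship dataset:
# ((annual income in millions of pesos, siblings, QPI grade), decision)
dataset = [
    ((0.3, 4, 3.8), "award"),  # financial need and strong grades
    ((1.5, 1, 3.9), "award"),  # high performer
    ((1.8, 0, 2.1), "deny"),   # high income, low grades
    ((0.4, 5, 2.0), "deny"),   # financial need but low grades
]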
Distance formula

 Euclidean distance: the square root of the sum of the squares of the differences
⚫ for two features: √((Δx)² + (Δy)²)
 Intuition: similar samples should be close to each other
⚫ May not always apply (example: quota and actual sales)
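As a worked example, the distance between hypothetical two-feature samples (2, 3) and (5, 7) is √((5 − 2)² + (7 − 3)²) = √(9 + 16) = 5. The same computation in Python:

import math

a = (2, 3)
b = (5, 7)
# square root of the sum of squared differences
print(math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))  # 5.0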
Incomparable ranges

 The Euclidean distance formula has the implicit assumption that the different dimensions are comparable
 Features that span wider ranges affect the distance value more than features with limited ranges (see the sketch below)
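A quick illustration with hypothetical values: if income is measured in thousands of pesos while grades stay on a 1.0 – 4.0 scale, the income term dominates the distance:

import math

# (monthly income in thousands of pesos, QPI grade) -- hypothetical samples
a = (12.0, 3.9)
b = (45.0, 1.2)

income_term = (a[0] - b[0]) ** 2  # 1089.0
grade_term = (a[1] - b[1]) ** 2   # 7.29
print(math.sqrt(income_term + grade_term))  # ~33.11, driven almost entirely by income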
Example revisited

 Suppose household income was instead indicated in thousands of pesos per month and that grades were given on a 70 – 100 scale
 Note the different results produced by the kNN algorithm on the same dataset
Non-numeric data

 Feature values are not always numbers
 Examples
⚫ Boolean values: yes or no, presence or absence of an attribute
⚫ Categories: colors, educational attainment, gender
 How do these values factor into the computation of distance?
Dealing with non-numeric data

 Boolean values => convert to 0 or 1
⚫ Applies to yes-no/presence-absence attributes
 Non-binary characterizations
⚫ Use a natural progression when applicable; e.g., educational attainment: GS, HS, College, MS, PhD => 1, 2, 3, 4, 5
⚫ Assign arbitrary numbers, but be careful about distances; e.g., color: red, yellow, blue => 1, 2, 3
 How about unavailable data? (A 0 value is not always the answer; see the sketch below)
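A minimal encoding sketch with hypothetical attribute names; unavailable values are kept as None rather than forced to 0:

# Hypothetical encodings for non-numeric features
BOOLEAN = {"no": 0, "yes": 1}
EDUCATION = {"GS": 1, "HS": 2, "College": 3, "MS": 4, "PhD": 5}  # natural progression
COLOR = {"red": 1, "yellow": 2, "blue": 3}  # arbitrary codes; distances between them are not meaningful

def encode(record):
    """record: dict of raw feature values; returns a numeric feature vector."""
    return (
        BOOLEAN.get(record.get("employed")),      # yes/no -> 0/1
        EDUCATION.get(record.get("education")),   # ordinal scale
        COLOR.get(record.get("favorite_color")),  # arbitrary numbering
    )

print(encode({"employed": "yes", "education": "MS"}))  # (1, 4, None): color unavailable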
Preprocessing your dataset

 The dataset may need to be preprocessed to ensure more reliable data mining results
 Conversion of non-numeric data to numeric data
 Calibration of numeric data to reduce the effects of disparate ranges
⚫ Particularly when using the Euclidean distance metric (see the sketch below)
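One common calibration is min-max scaling, which maps every feature onto a 0-to-1 range; a minimal sketch using hypothetical samples:

def min_max_scale(vectors):
    """Rescale each feature (column) of a list of numeric tuples to [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(vec, lows, highs))
        for vec in vectors
    ]

# Hypothetical (monthly income in thousands of pesos, grade on a 70-100 scale) samples
print(min_max_scale([(12.0, 95), (45.0, 78), (30.0, 88)]))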
k-NN variations

 Value of k
⚫ Larger k increases confidence in the prediction
⚫ Note that if k is too large, the decision may be skewed
 Weighted evaluation of nearest neighbors
⚫ Plain majority may unfairly skew the decision
⚫ Revise the algorithm so that closer neighbors have greater “vote weight” (see the sketch below)
 Other distance measures
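A sketch of distance-weighted voting, assuming each neighbor's vote counts 1 / (distance + ε) so that closer neighbors weigh more (the helper names are illustrative):

import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_knn(dataset, target, k=3, eps=1e-9):
    """Each of the k nearest neighbors votes with weight 1 / (distance + eps)."""
    nearest = sorted(dataset, key=lambda pair: euclidean(pair[0], target))[:k]
    votes = defaultdict(float)
    for features, category in nearest:
        votes[category] += 1.0 / (euclidean(features, target) + eps)
    return max(votes, key=votes.get)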
Other distance measures

 City-block distance (Manhattan distance)
⚫ Add the absolute values of the differences
 Cosine similarity
⚫ Measure the angle formed by the two samples (with the origin)
 Jaccard distance
⚫ Determine the percentage of exact matches between the samples (not including unavailable data)
 Others
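Minimal sketches of these three measures (the Jaccard variant follows the slide's exact-match definition and skips unavailable values):

import math

def manhattan(a, b):
    # city-block distance: sum of absolute feature differences
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # cosine of the angle formed by the two sample vectors at the origin
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def jaccard_match(a, b):
    # percentage of exact matches, ignoring positions where data is unavailable (None)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x == y for x, y in pairs) / len(pairs)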
k-NN Time Complexity

 Suppose there are m instances and n features in the dataset
 The nearest neighbor algorithm requires computing m distances
 Each distance computation involves scanning through each feature value
 Running time complexity is therefore proportional to m × n
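For example, with a hypothetical dataset of m = 10,000 instances and n = 20 features, a single query entails roughly 10,000 × 20 = 200,000 feature comparisons, plus the cost of selecting the k nearest distances.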
