
Data Mining

Spring 2015

Introduction to Data Mining


Dr. Shariq BASHIR
shariq.bashir@bui.edu.pk

Instructor:
Dr. Shariq Bashir
PostDoc: New York University Abu Dhabi
PhD: Vienna University of Technology, Austria

Faculty Room (13 XC Basement)


Tel: 051-9260002 (Ext 411)
shariq.bashir@bui.edu.pk

Student Hours
Between 11:30 AM and 1:30 PM (Monday)

Yahoo Group
DataMining_BU_Spring_2015
https://groups.yahoo.com/neo/groups/DataMining_BU_Spring_2015/info

Grading Scheme

Method                  Weight (%)
Quizzes                 5
Assignments/Projects    25
Midterm                 20
Final                   50

Comparison with Data Structures

Data Mining is not the same as Data Structures
Data Structures is about how to store data efficiently in storage devices (RAM, external memory)
But we will use data structure concepts (especially linked lists, trees, B-trees, and graphs) while exploring Data Mining techniques

Comparison with DBMS


Data Mining is not DBMS
A DBMS is mostly about query processing (SQL)
In a DBMS, your requirements (the query) are mostly precise, and you are mostly interested in extracting a subset of the database
e.g. show the records of all employees whose monthly salary is > 50,000 rupees
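
The salary query above can be sketched with Python's built-in sqlite3 module. This is only an illustration of a precise query: the table name, the column names, and the three sample rows are assumptions and do not come from the course material.

```python
import sqlite3

# Hypothetical in-memory table; names and salaries are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, monthly_salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 45000), ("B", 60000), ("C", 75000)],
)

# A precise requirement: we know exactly which subset of rows we want.
rows = conn.execute(
    "SELECT name, monthly_salary FROM employees "
    "WHERE monthly_salary > 50000 ORDER BY name"
).fetchall()
print(rows)
conn.close()
```

The query states exactly which records are wanted, so the DBMS only has to retrieve them. Nothing new is discovered about the data.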

Definition of Data Mining


Data Mining is about extracting previously unknown and potentially useful information
In DM, you have data but mostly you don't know what you are trying to find
DM is not always related to big data
Queries in DM are not precise
e.g. the Swift is rated 4/5 for Style, so why is its Value for money rated only 3/5?

What is Data Mining?


Also known as Knowledge Discovery in Databases (KDD).
Data mining digs out valuable information from large, multidimensional, apparently unrelated databases (data sets).
It is the integration of business knowledge, statistics, computing technology and algorithms.
Data mining is used to find hidden patterns and relationships in data.

Data Mining Example


Suppose you have data (from the Pakistan Meteorological Department) for all cities over the last 10 years
Is calculating the average temperature of the cities a data mining task?
No, this is not a data mining task
However, if you use this data to forecast the temperature for tomorrow, next week, or a whole month, then this is a data mining task
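
The contrast between the two tasks can be sketched in a few lines of Python. The temperature values are made up, and the linear trend is an assumption chosen only because it is the simplest possible forecasting model. The slides do not name a specific forecasting method.

```python
from statistics import mean

# Made-up daily maximum temperatures (deg C) for one city; illustrative only.
temps = [30.0, 31.0, 32.0, 33.0, 34.0]

# Summarising existing data: an aggregate, not a data mining task.
average = mean(temps)


def fit_line(ys):
    """Least-squares line y = a + b*x through (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


# Forecasting an unseen value: learn a pattern (here a linear trend)
# from past data and extrapolate it one day ahead.
a, b = fit_line(temps)
tomorrow = a + b * len(temps)
print(average, tomorrow)
```

The average only describes data we already have. The forecast uses a learned pattern to say something about a day that is not in the data.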

Data Mining (More Applications)


Data Mining on Weather Data
Data Mining can help forecast natural hazards (such as floods, thunderstorms, hailstorms, and droughts), which can save thousands of lives

Data Mining Example


Road Traffic Data (given the road traffic data of a city)
Calculating the average traffic density of all roads is not a Data Mining task
However, if your task is to find the best route (traffic path) from location A to location B that has low traffic at 4:00 PM, then this is a data mining task

Data Mining Example


Collection of images
Find the top two images in an image database that are most similar to the query image (Q).

[Figure: an image database containing images O0, O1, O2, O3 and O4, a query image Q, and the returned top-2 images]
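
A minimal sketch of top-2 similarity search, assuming each image has already been reduced to a small numeric feature vector. The vectors, the Euclidean distance measure, and the helper name `top_k` are assumptions for illustration. The slide does not say how similarity is measured or which two images are actually returned.

```python
import math

# Hypothetical 3-number feature vectors standing in for images O0..O4.
database = {
    "O0": (0.9, 0.1, 0.3),
    "O1": (0.2, 0.8, 0.5),
    "O2": (0.85, 0.15, 0.35),
    "O3": (0.1, 0.9, 0.9),
    "O4": (0.5, 0.5, 0.5),
}
query = (0.88, 0.12, 0.3)  # made-up feature vector of the query image Q


def top_k(query_vec, db, k=2):
    """Return the k image ids closest to query_vec by Euclidean distance."""
    ranked = sorted(db, key=lambda name: math.dist(query_vec, db[name]))
    return ranked[:k]


top2 = top_k(query, database)
print(top2)
```

With these made-up vectors the two nearest images are O0 and O2. The task is "find what is most similar", not "find what exactly matches", which is why it counts as a mining problem rather than a precise query.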

Data Mining Example


Applications in Biometrics
You can utilize Data Mining techniques for
building efficient Biometrics applications


Data Mining
We will cover the following techniques

Data Cleansing
Prediction/Forecasting Techniques
Clustering (grouping) similar samples
Ranking of Knowledge (Information Retrieval)
Outlier (noise) removal
Frequent Itemsets Mining

Data could be anything


Relational tables, Web (text) documents, images/videos, sensor signals

Data Mining Process


Data mining: the core of the knowledge discovery process.

[Figure: knowledge discovery process. Data Cleaning and Data Integration -> Data Warehouse -> Task-relevant Data -> Data Mining]
Prediction (Classification) Example


Known Data -> Classification Algorithms -> decision tree

Known data:

age      income   student   credit_rating
<=30     high     no        fair
<=30     high     no        excellent
31..40   high     no        fair
>40      medium   no        fair
>40      low      yes       fair
>40      low      yes       excellent
31..40   low      yes       excellent
<=30     medium   no        fair
<=30     low      yes       fair
>40      medium   yes       fair
<=30     medium   yes       excellent
31..40   medium   no        excellent
31..40   high     yes       fair
>40      medium   no        excellent

Decision tree:

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes

Each Leaf node represents a class.
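
The tree can be read as nested if/else rules. The sketch below hand-codes the tree from the slide and applies it to the 14 known-data rows. It does not learn the tree from data, which is what a classification algorithm would do. The function name `classify` is an assumption, the middle age branch is read as 31..40, and the slide does not name the class attribute, so the leaves are kept as plain "yes"/"no".

```python
# The 14 known-data rows from the slide: (age, income, student, credit_rating).
rows = [
    ("<=30", "high", "no", "fair"),
    ("<=30", "high", "no", "excellent"),
    ("31..40", "high", "no", "fair"),
    (">40", "medium", "no", "fair"),
    (">40", "low", "yes", "fair"),
    (">40", "low", "yes", "excellent"),
    ("31..40", "low", "yes", "excellent"),
    ("<=30", "medium", "no", "fair"),
    ("<=30", "low", "yes", "fair"),
    (">40", "medium", "yes", "fair"),
    ("<=30", "medium", "yes", "excellent"),
    ("31..40", "medium", "no", "excellent"),
    ("31..40", "high", "yes", "fair"),
    (">40", "medium", "no", "excellent"),
]


def classify(age, income, student, credit_rating):
    """Walk the tree from the root to a leaf and return the leaf's class."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31..40":
        return "yes"
    # Remaining branch: age > 40, decided by credit rating.
    return "yes" if credit_rating == "fair" else "no"


predictions = [classify(*row) for row in rows]
print(predictions)
```

Note that `income` is never tested: the tree reaches a leaf using only age, student and credit rating.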


Ranking of Knowledge
(Information Retrieval)
Goal: Rank the knowledge most relevant to the user's query
Dealing with notions of:
Collection of information (documents, images, videos, voice, etc.)
Query (the user's information need)

[Figure: the Data collection and a Query String go into the IR System, which returns Ranked Documents: 1. Doc1, 2. Doc2, 3. Doc3, ...]
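
A minimal sketch of the ranking idea, assuming a tiny made-up text collection and a very simple score: the number of times the query terms occur in each document. The document texts and the function names `score` and `rank` are assumptions. Real IR systems use more refined weighting than raw term counts.

```python
from collections import Counter

# Tiny made-up collection; the ids follow the slide's Doc1, Doc2, Doc3.
collection = {
    "Doc1": "data mining finds hidden patterns in data",
    "Doc2": "a database management system answers precise queries",
    "Doc3": "mining frequent itemsets from transaction data",
}


def score(query, text):
    """Sum, over the query terms, of how often each occurs in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rank(query, docs):
    """Return document ids ordered from most to least relevant."""
    return sorted(
        docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True
    )


ranking = rank("data mining", collection)
print(ranking)
```

Unlike the SQL case, every document is returned. The system's job is to order them by estimated relevance, not to select an exact subset.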

Reference Books
1. Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Third Edition, Morgan Kaufmann, 2011.
Chapter 1, Chapter 2, Chapter 3, Chapter 6, Chapter 8, Chapter 10, Chapter 12
2. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press. 200
Chapter 1, Chapter 2, Chapter 3

http://nlp.stanford.edu/IR-book/
