
Data Mining

Data Mining References

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Elsevier, 3rd Edition, 2012.

Margaret H. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education, 2006.

Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Pearson Education, 2006.

Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, Wiley Publication, 2nd Edition, 2000.

Ian H. Witten, Eibe Frank and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, Elsevier, 3rd Edition, 2011.

IEEE Transactions on Knowledge and Data Engineering

ACM Transactions on Information Systems

ACM Transactions on Database Systems

ACM Transactions on Internet Technology

Data Mining Objectives

Data Mining, or Knowledge Discovery from Data

OBJECTIVES

Understanding basic data mining concepts and techniques: uncovering interesting data patterns hidden in large data sets

Development of data mining tools: scalable and efficient

Evolution of Sciences

Before 1600: empirical science

1600-1950s: theoretical science

Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.

1950s-1990s: computational science

Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).

Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.

1990-now: data science

The flood of data from new scientific instruments and simulations

The ability to economically store and manage petabytes of data online

The Internet and computing Grid that make all these archives universally accessible

Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!

Evolution of Database Technology

1960s:

Data creation and collection in electronic mode

IMS: hierarchical database system by IBM

Network DBMS

1970s:

Relational data model

Relational DBMS implementation

1980s:

RDBMS

Advanced data models: extended-relational, OO, deductive, etc.

Application-oriented DBMS: spatial, scientific, engineering, etc.



Evolution of Database Technology (continued)

1990s:

Data mining

Data warehousing

Multimedia databases

Web databases

2000s:

Stream data management and mining

Data mining and its applications

Web technology: XML, data integration, social networks

Global information systems

DM Evolution

Data Mining Importance

The Explosive Growth of Data:

Terabytes (2^40 bytes)

Petabytes

Exabytes

Zettabytes
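To make the scale of these units concrete, here is a minimal sketch that computes each size in bytes. It assumes the binary convention used above (a terabyte taken as 2^40 bytes, each following unit 2^10 times larger); under the decimal SI convention the values would instead be 10^12, 10^15, and so on. The name `unit_sizes` is illustrative only.

```python
# Binary-convention sizes: terabyte = 2**40 bytes, each next unit is 2**10 times larger.
UNIT_NAMES = ["terabyte", "petabyte", "exabyte", "zettabyte"]


def unit_sizes():
    """Return a mapping from unit name to its size in bytes (binary convention)."""
    return {name: 2 ** (40 + 10 * i) for i, name in enumerate(UNIT_NAMES)}


sizes = unit_sizes()
for name, size in sizes.items():
    print(f"1 {name} = {size:,} bytes")
```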

Drowning in DATA, but STARVING for KNOWLEDGE!

Data Tombs to Golden Nuggets

"Necessity is the mother of invention" (Plato, Greek philosopher and mathematician)

Data mining: automated analysis of massive data sets



Data Mining Definition

Data mining definition:

Extraction or mining of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from large amounts of data stored in databases, data warehouses, or other information repositories

Alternative names

knowledge discovery (mining) in databases (KDD)

knowledge extraction

data/pattern analysis

data archeology

data dredging

information harvesting

business intelligence, etc.



Knowledge Discovery (KDD) Process

Data mining is the core of the knowledge discovery process. The steps, from raw data to knowledge:

Databases

Data Cleaning (remove noise and inconsistent data)

Data Integration (combine multiple data sources)

Data Warehouse

Selection (retrieve relevant data)

Task-relevant Data

Transformation (summary, aggregation, etc.)

Data Mining (intelligent methods applied to extract patterns)

Patterns

Pattern Evaluation (identify truly interesting patterns representing knowledge)
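The KDD steps can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two toy "databases", the record fields, and every function name are hypothetical, and the "mining" step is a deliberately trivial stand-in (flagging above-average spenders), not a real mining algorithm.

```python
from statistics import mean

# Hypothetical toy sources: two "databases" holding sales records.
store_a = [
    {"customer": "c1", "item": "milk", "amount": 3.0},
    {"customer": "c2", "item": "bread", "amount": None},  # missing value (noise)
    {"customer": "c1", "item": "bread", "amount": 2.0},
]
store_b = [
    {"customer": "c3", "item": "milk", "amount": 4.0},
    {"customer": "c3", "item": "bread", "amount": 2.5},
    {"customer": "c4", "item": "milk", "amount": -1.0},  # inconsistent value
]


def clean(records):
    """Data cleaning: drop records with missing or negative amounts."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]


def integrate(*sources):
    """Data integration: combine multiple sources into one collection."""
    return [r for source in sources for r in source]


def select(records, item):
    """Selection: retrieve only the task-relevant records."""
    return [r for r in records if r["item"] == item]


def transform(records):
    """Transformation: aggregate amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


def mine(totals):
    """Data mining (stand-in): flag customers spending above the average."""
    avg = mean(totals.values())
    return {c: t for c, t in totals.items() if t > avg}


def evaluate(patterns, min_total):
    """Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return {c: t for c, t in patterns.items() if t >= min_total}


warehouse = integrate(clean(store_a), clean(store_b))
task_data = select(warehouse, "milk")
patterns = mine(transform(task_data))
knowledge = evaluate(patterns, min_total=3.5)
print(knowledge)
```

Keeping each step a separate function mirrors the process: any stage (for example, the cleaning rule or the mining method) can be swapped without touching the others.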

Data Mining TOOLS

Explore!

R

Python

Weka

SPSS

Orange

Clementine

And many more.


References: DM Papers


Data Mining and Business Intelligence

Layers from bottom to top, with increasing potential to support business decisions (typical role in parentheses):

Data Sources: paper, files, Web documents, scientific experiments, database systems (DBA)

Data Preprocessing/Integration, Data Warehouses (DBA)

Data Exploration: statistical summary, querying, and reporting (Data Analyst)

Data Mining: information discovery (Data Analyst)

Data Presentation: visualization techniques (Business Analyst)

Decision Making (End User)


Data Mining: Confluence of Multiple Disciplines

Data mining draws on:

Database Technology

Statistics

Machine Learning

Pattern Recognition

Algorithms

Visualization

Other Disciplines


Why Not Traditional Data Analysis?

Tremendous amount of data: algorithms must be highly scalable to handle, for example, terabytes of data

High dimensionality of data: a micro-array may have tens of thousands of dimensions

High complexity of data:

Data streams and sensor data

Time-series data, temporal data, sequence data

Structured data, graphs, social networks and multi-linked data

Heterogeneous databases and legacy databases

Spatial, spatiotemporal, multimedia, text and Web data

Software programs, scientific simulations

New and sophisticated applications



Data Mining: Classification Schemes

General functionality

Descriptive data mining

Predictive data mining

Different views lead to different classifications

Data view: Kinds of data to be mined

Knowledge view: Kinds of knowledge to be discovered

Method view: Kinds of techniques utilized

Application view: Kinds of applications adapted
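The descriptive/predictive split in the general-functionality view can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy customer records, the field layout, and the choice of a 1-nearest-neighbour rule as the predictive method. Descriptive mining summarises properties of the data already held; predictive mining uses that data to infer a value for an unseen case.

```python
from statistics import mean

# Hypothetical labelled records: (age, income in thousands, bought_product)
customers = [
    (25, 30, False),
    (32, 45, False),
    (41, 80, True),
    (50, 95, True),
]


def describe(records):
    """Descriptive mining: summarise general properties of the existing data."""
    buyers = [r for r in records if r[2]]
    return {
        "buyer_share": len(buyers) / len(records),
        "mean_buyer_income": mean(r[1] for r in buyers),
    }


def predict(records, age, income):
    """Predictive mining: 1-nearest-neighbour guess for an unseen customer."""
    nearest = min(
        records,
        key=lambda r: (r[0] - age) ** 2 + (r[1] - income) ** 2,
    )
    return nearest[2]


summary = describe(customers)
prediction = predict(customers, age=45, income=85)
print(summary)
print(prediction)
```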



Multi-Dimensional View of Data Mining

Data to be mined

Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW

Knowledge to be mined

Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.

Multiple/integrated functions and mining at multiple levels

Techniques utilized

Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.

Applications adapted

Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Data Warehousing

Consolidation of data from several databases, which are in turn maintained by individual business units, along with historical and summary information

Roll-up
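Roll-up aggregates warehouse data to a coarser level of a dimension hierarchy, for example from city to country. The following is a minimal sketch under stated assumptions: the fact table, its column order, and the `roll_up` helper are all hypothetical and are not taken from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table rows: (country, city, quarter, sales)
sales = [
    ("India", "Delhi", "Q1", 100),
    ("India", "Delhi", "Q2", 120),
    ("India", "Mumbai", "Q1", 80),
    ("USA", "Chicago", "Q1", 90),
    ("USA", "New York", "Q1", 150),
]


def roll_up(rows, key):
    """Sum the sales measure (last column) over groups defined by `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[-1]
    return dict(totals)


# City level: quarters are aggregated away.
by_city = roll_up(sales, key=lambda r: (r[0], r[1]))

# Country level: one step further up the location hierarchy.
by_country = roll_up(sales, key=lambda r: r[0])

print(by_city)
print(by_country)
```

Each roll-up discards detail (here, first the quarter and then the city), which is why a warehouse typically stores summary information alongside the detailed records.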


Multi-Tiered Architecture

Data Sources: operational DBs, other sources

Extract, Transform, Load, Refresh (with Monitor & Integrator and Metadata)

Data Storage: Data Warehouse, Data Marts

OLAP Engine: OLAP Server, serving the front-end tools

Front-End Tools: Analysis, Query, Reports, Data mining


Data Mining Research Publications

Tayal, D. K., Jain, A., Arora, S., Agarwal, S., Gupta, T. and Tyagi, N., Crime Detection and Criminal Identification in India Using Data Mining Techniques, Artificial Intelligence & Society (AIS), Springer, vol. 30, no. 1, pp. 117-127, Feb 2015. [Indexed: Scopus, Google Scholar, EBSCO, ACM Digital Library, DBLP]

Jain, A., Yadav, D. and Tayal, D. K., NER for Hindi Language Using Association Rules, International Conference on Data Mining and Intelligent Computing (ICDMIC 2014), IGDTUW Delhi, India, IEEE, 5th-6th Sept 2014. [Indexed: Scopus]

