
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES


Digital
Part A: Content Design
Course Title: DATA MINING
Course No(s): IS ZC415
Credit Units:
Credit Model:
Content Authors: NAVNEET GOYAL

Course Objectives
CO1: To understand the importance of Data Mining in the data-driven world
CO2: To understand the major tasks involved in Data Mining, such as preprocessing, classification, clustering, and association rule mining, and the corresponding algorithms
CO3: To understand how to apply Data Mining to solve real-life problems

Text Book(s)
T1: Tan P. N., Steinbach M. & Kumar V., Introduction to Data Mining, Pearson Education, 2006.

Reference Book(s) & other resources
R1: Han J. & Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Second Edition, 2006.
R2: Dunham M. H. & Sridhar S., Data Mining: Introductory and Advanced Topics, Pearson Education, 2006.
R3: Zaki M. J. & Meira W. Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2014.

Content Structure
1. Data Mining overview & preliminaries
1.1. Data Mining overview & introduction
1.2. Major Data Mining Tasks
1.2.1. Classification
1.2.2. Clustering
1.2.3. Association Rule Mining
1.3. Some interesting case studies
1.4. Mathematical Preliminaries
1.4.1. Linear Algebra
1.4.2. Probability & Statistics
1.4.3. Optimization
2. Data Preprocessing
2.1. Types of Data
2.1.1. Categorical (Nominal & Ordinal)
2.1.2. Numerical (Interval & Ratio)
2.2. Data Quality
2.2.1. Errors
2.2.2. Data Cleansing
2.3. Data Preprocessing
2.3.1. Dimensionality Reduction
2.3.2. Feature Subset Selection
2.3.3. Discretization & Binarization
2.3.4. Variable Transformation
2.3.5. Normalization
2.4. Similarity & Dissimilarity Measures
2.4.1. Similarity & Dissimilarity between Simple Attributes
2.4.2. Dissimilarity between Data Objects
2.4.3. Similarity between Data Objects
3. Classification
3.1. Introduction to Supervised Learning and Classification
3.1.1. Preliminaries
3.1.2. Approaches to Solving the Classification Problem
3.2. Decision Tree Induction
3.2.1. Algorithms for Decision Tree Induction
3.2.2. Characteristics of Decision Tree Induction
3.3. Lazy Learners
3.3.1. Nearest-Neighbour Classification (K-NN)
3.3.2. Case-based Reasoning
3.4. Bayesian Classification
3.4.1. Bayes' Theorem
3.4.2. Naive Bayes Classification
3.4.3. Bayesian Belief Networks
3.5. Support Vector Machines (SVM)
3.5.1. Maximum Margin Hyperplane
3.5.2. Linear SVM
3.5.3. Non-Linear SVM
3.5.4. Characteristics of SVM
3.6. Ensemble Classification
3.6.1. Rationale for Ensemble Methods
3.6.2. Bagging
3.6.3. Boosting
3.6.4. Random Forests
3.7. Evaluating Performance of a Classifier
3.7.1. Holdout Method
3.7.2. Random Subsampling
3.7.3. Cross-Validation
3.7.4. Bootstrapping
3.8. Class Imbalance (Rare Class Classification) & Multiclass Classification
3.8.1. Alternative Metrics
3.8.2. ROC Curves
3.8.3. 1 vs. 1
3.8.4. 1 vs. Rest
4. Clustering
4.1. Introduction to Unsupervised Learning
4.2. Partitioning Algorithms
4.2.1. K-means Algorithm
4.2.2. Issues with K-Means
4.2.3. Bisecting K-Means
4.2.4. Kernel K-Means
4.2.5. Expectation-Maximization Clustering
4.2.6. K-medoids (PAM)
4.3. Agglomerative Hierarchical Clustering (AHC)
4.3.1. Single-link
4.3.2. Complete-link
4.3.3. Average-link
4.3.4. Strengths & Weaknesses
4.4. Density-based Clustering
4.4.1. DBSCAN Algorithm
4.4.2. Strengths & Weaknesses
4.5. Cluster Validation & Assessment
4.5.1. External Measures
4.5.2. Internal Measures
4.5.3. Relative Measures
5. Association Rule Mining
5.1. Market-Basket Analysis
5.2. Frequent Itemset Mining Algorithms
5.2.1. Apriori Algorithm
5.2.2. FP-Growth Algorithm
5.3. Rule Generation
5.4. Compact Representation of Frequent Itemsets
5.4.1. Maximal Frequent Itemsets
5.4.2. Closed Frequent Itemsets
6. Anomaly Detection
6.1. Importance & Challenges
6.2. Statistical Approaches
6.3. Proximity-Based Outlier Detection
6.4. Density-Based Outlier Detection
6.5. Clustering-Based Outlier Detection
6.6. Classification-Based Outlier Detection
7. Big Data Analytics
7.1. Introduction to Big Data Analytics
7.2. Role of Data Mining in Big Data Analytics
7.3. High Performance Computing (HPC)
7.3.1. Cluster Computing
7.3.2. MapReduce/Hadoop
7.4. HPC for Data Mining