FINAL REVIEW
Reza Zafarani (reza@data.syr.edu)
DATA MINING
Given lots of data, the data mining process discovers
patterns and models that are:
THIS CLASS: CIS787
This class overlaps with machine learning, statistics,
artificial intelligence, and databases, but puts more stress on:
• Fundamental Data Mining Algorithms
• Scalability (big data)
• Algorithms
• Hands-on Experience
Go to www.awseducate.com
[Figure: diagram relating Data Mining to Statistics, Machine Learning, and Database systems]
WHAT WE HAVE COVERED
• Data Types/Data Preprocessing
• Sampling/Stratified Sampling
• Curse of Dimensionality
• PCA
• Discretization
• Decision Trees
• Hunt’s/ID3/C4.5/Oblique Trees
• Gini/Entropy
• Evaluation (Accuracy/Recall/F-Measure/AUC)
• KNN/Naïve Bayes
• Support Vectors / SVM
• DTW

• Linear/Multivariate/Logistic/Lasso Regression
• Logistic Regression
• Bagging/Bootstrap Samples/Boosting/AdaBoost
• Partitional Clustering/K-means/Objective Function/Centroids
• Bisecting K-means
• Hierarchical Clustering (Min, Max, group average, centroid, Ward) -> Dendrogram
• Inversion/Globular Clusters/Chain Effect
• Lance-Williams Formula
• Minimum Spanning Tree Divisive Clustering
• SSE/Cohesion/Separation/Silhouette Index
• DBScan/Chameleon
• Frequent Itemsets
• Support/Confidence/Lift
• Apriori
• Maximal/Closed Itemsets

• Rank of a matrix
• SVD/Dimensionality Reduction with SVD
• SVD and Eigen Decomposition
• CUR decomposition
• Power method
• MapReduce/Mapper/Reducer/Combiner
• Sampling a fixed proportion
• Reservoir Sampling
• Bloom Filter
• Flajolet Martin
• AMS method for computing moments
• Shingling/Minhashing/LSH
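As a quick refresher on one of the stream-sampling topics listed above, here is a minimal Python sketch of reservoir sampling (the classic "Algorithm R" formulation). The function name `reservoir_sample` and its parameters are illustrative choices, not taken from the course material; the sketch assumes the goal is a uniform sample of k items from a stream of unknown length.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration: only the reservoir is kept in
    memory, so the stream length does not need to be known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n by drawing a slot
            # uniformly from [0, n) and replacing only if it falls in [0, k).
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 10 elements from a stream of 1000 integers.
sample = reservoir_sample(range(1000), k=10, rng=random.Random(0))
```

Each arriving item replaces a uniformly chosen slot with probability k/n, which is what keeps every item seen so far equally likely to be in the reservoir.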
WHAT WE HAVEN’T COVERED
• Data Preprocessing: Wavelet Transforms (Haar)
• * MDS / Multidimensional Scaling
• * Embedding Techniques: ISOMAP, MLLE
• Similarity-based methods: Similarity between graphs
• Other Association Rule Mining Algorithms: * FP-Tree/FP-Growth

• Other clustering methods: CLARANS/DENCLUE/NMF/BIRCH/CURE/CLIQUE/PROCLUS/ORCLUS
• Classification: Fisher LDA, Rule Induction, Bayesian Networks, Kernel Methods
• Neural Networks: MLP/Perceptron/Hebbian Learning

• Semi-Supervised Learning: Co-training/Self-training
• Active Learning
• Ensembles: Random Forest
• Clustering data streams: STREAM/CluStream
• Text Mining: Co-clustering, PLSA, Rocchio Method, Topic Models
• Discrete Sequences: HMMs, Prob. Suffix Trees

• Spatial Data: Trajectory Mining
• Graph Mining: Graph Isomorphism, MCGs
• Web Mining: Collaborative Filtering, SimRank/TrustRank, PageRank/HITS
• Social Network Analysis: Community Detection, Collective Classification
• Privacy Preserving: K-anonymity / Samarati’s method
WHAT CAN I READ TO KNOW MORE
1. Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. Elsevier, 2011.
2. Aggarwal, Charu C. Data Mining: The Textbook. Springer, 2015.
3. Zaki, Mohammed J., and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
4. Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.
5. Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. Berlin: Springer, 2001.
6. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
7. Schölkopf, Bernhard, and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
8. Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
ENGINEERING AND COMPUTER SCIENCE | SYRACUSE UNIVERSITY
WE DON’T NEED NO BOOKS!
https://www.youtube.com/playlist?list=PLD63A284B7615313A
KDD Cup
http://www.kdd.org/kdd-cup
Participate in KDD Cups!
CIKM Cup
FINALLY!
• Spent time answering questions, proving results, and preparing for quizzes
• Have implemented a number of methods
• And did great on the 3rd exam!