Documente Academic
Documente Profesional
Documente Cultură
Agenda:
1) Go over information on course web page
2) Lecture over chapter 1
3) Discuss necessary software
4) Take pictures
1
Statistics 202: Statistical Aspects of Data Mining
www.stats202.com
www.davemease.com
Chapter 1: Introduction
3
What is Data Mining?
4
In class exercise #1:
Find a different definition of data mining online.
How does it compare to the one in the text on the
previous slide?
5
Data Mining Examples and Non-Examples
8
In class exercise #2:
Give an example of something you did yesterday or
today which resulted in data which could potentially
be mined to discover useful information.
9
Origins of Data Mining (page 6)
zDraws ideas from machine learning, AI, pattern
recognition and statistics
zTraditional techniques
may be unsuitable due to
–Enormity of data AI/Machine
Statistics Learning/
–High dimensionality Pattern
Recognition
of data
–Heterogeneous, Data Mining
distributed nature
of data
10
2 Types of Data Mining Tasks (page 7)
zPrediction Methods:
Use some variables to predict unknown or
future values of other variables.
zDescription Methods:
Find human-interpretable patterns that
describe the data.
11
Examples of Data Mining Tasks
zClassification
[Predictive] (Chapters 4,5)
zRegression [Predictive] (covered in stats classes)
12
Software We Will Use:
zMicrosoft Excel
zR
–Can be downloaded from
http://cran.r-project.org/ for Windows, Mac or
Linux
13
Downloading R for Windows:
14
Downloading R for Windows:
15
Downloading R for Windows:
16
Pictures:
17