Data Mining Summer Course
August 2009
Structure
U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)
[Figure: three scatter plots illustrating Clustering, Classification and Regression, with axes X1 and X2]
Clustering: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA
Classification: Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS
Regression: Classical Linear Regression, Ridge Regression, NN, CART
Given: $(x_1, y_1), \ldots, (x_m, y_m) \in \mathbb{R}^n \times \mathbb{R}^1$
Find: $f: \mathbb{R}^n \to \mathbb{R}^1$
$Y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma)$
$\beta_1 > 0$: Positive Association
$\beta_1 < 0$: Negative Association
$\beta_1 = 0$: No Association
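The sign of the slope can be read off a least-squares fit. The following is a minimal sketch, assuming the usual closed-form estimates for one predictor; the function names and the data are made up for illustration and are not from the course.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


def association(b1, tol=1e-12):
    """Classify the direction of association from the sign of the slope."""
    if b1 > tol:
        return "positive"
    if b1 < -tol:
        return "negative"
    return "none"


# Hypothetical data lying exactly on y = 11 - 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.0, 7.0, 5.0, 3.0, 1.0]

b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1, association(b1))
```

With real data the estimated slope is never exactly zero, so "no association" is normally judged with a test on the slope rather than with a tolerance like the one used here.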
$S_{xx} = \sum (x - \bar{x})^2$
sided)
Confidence Intervals

Coefficients (a)

Model            Unstandardized Coefficients    Standardized Coefficients    t         Sig.
                 B          Std. Error          Beta
1  (Constant)    89.124     7.048                                            12.646    .000
   LSD_CONC      -9.009     1.503               -.937                        -5.994    .002

a. Dependent Variable: SCORE
(x = LSD_CONC, y = SCORE)
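The t column of the table above is each coefficient divided by its standard error, and a confidence interval is the coefficient plus or minus a critical value times the standard error. The sketch below checks this using the table's own numbers. The critical value 2.571 is an assumption: it corresponds to a two-sided 95% interval with 5 degrees of freedom, and the sample size is not given on the slide.

```python
# Values copied from the coefficients table.
coefficients = {
    "(Constant)": {"B": 89.124, "std_error": 7.048, "t_reported": 12.646},
    "LSD_CONC": {"B": -9.009, "std_error": 1.503, "t_reported": -5.994},
}

# ASSUMPTION: two-sided 95% critical value for 5 degrees of freedom.
# The degrees of freedom are hypothetical; replace with the value for your sample.
T_CRIT_ASSUMED = 2.571


def t_statistic(b, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return b / std_error


def confidence_interval(b, std_error, t_crit):
    """Return (lower, upper) bounds b -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return b - half_width, b + half_width


for name, row in coefficients.items():
    t = t_statistic(row["B"], row["std_error"])
    low, high = confidence_interval(row["B"], row["std_error"], T_CRIT_ASSUMED)
    print(f"{name}: t = {t:.3f} (reported {row['t_reported']}), "
          f"interval = ({low:.3f}, {high:.3f})")
```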
$S_{yy} = \sum (y - \bar{y})^2$
$SSE = \sum (y - \hat{y})^2$
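The two sums of squares above can be computed directly. This is a minimal sketch with made-up data; the ratio $1 - SSE / S_{yy}$ (the usual $R^2$) is added here as an assumption about how the two quantities are meant to be combined.

```python
def total_sum_of_squares(ys):
    """S_yy: squared deviations of y from its mean."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def sum_squared_errors(ys, y_hats):
    """SSE: squared deviations of y from the fitted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))


def r_squared(ys, y_hats):
    """Share of S_yy explained by the fit (assumed usual definition)."""
    return 1.0 - sum_squared_errors(ys, y_hats) / total_sum_of_squares(ys)


# Hypothetical observations and fitted values from the line y = 2.2 + 0.6x, x = 1..5.
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]

s_yy = total_sum_of_squares(ys)
sse = sum_squared_errors(ys, y_hats)
print(s_yy, sse, r_squared(ys, y_hats))
```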
$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
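$E(Y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
Least Squares Fitted (predicted) equation, minimizing SSE:
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$, $SSE = \sum (Y - \hat{Y})^2$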
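One way to obtain the least-squares coefficients is to solve the normal equations $(X^T X)\beta = X^T y$, where $X$ is the design matrix with a leading column of ones. The sketch below does this with a small Gaussian elimination routine so that it needs no external library. The helper names and the data are hypothetical, and a numerical library would be the more robust choice in practice.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, ys):
    """Return [b0, b1, ..., bp] minimizing SSE for the given predictor rows."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Fitted value b0 + b1*x1 + ... + bp*xp."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical data generated exactly from y = 1 + 2*x1 - x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0, 4.0, 3.0, 6.0, 5.0]

beta = fit_ols(rows, ys)
sse = sum((y - predict(beta, row)) ** 2 for row, y in zip(rows, ys))
print(beta, sse)
```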
Ridge regression estimation:
$\min \; SSE + \lambda \sum_{j=1}^{p} \beta_j^2$, where $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
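For a single predictor the ridge criterion has a simple closed form, which makes the shrinkage effect easy to see. The sketch below assumes the predictor and response are centered and the intercept is left unpenalized (a common convention, matching a penalty sum that starts at j = 1); the penalty weight is written `lam`, and all names and data are made up for illustration.

```python
def ridge_fit_one_predictor(xs, ys, lam):
    """Return (b0, b1) minimizing sum (y - b0 - b1*x)^2 + lam * b1^2.

    The intercept b0 is not penalized (assumption).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / (s_xx + lam)  # lam = 0 gives the ordinary least-squares slope
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 11 - 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.0, 7.0, 5.0, 3.0, 1.0]

for lam in (0.0, 10.0, 100.0):
    b0, b1 = ridge_fit_one_predictor(xs, ys, lam)
    print(lam, b0, b1)
```

As `lam` grows, the slope is pulled toward zero and the fitted line flattens toward the mean of y.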
[Figure: two panels with horizontal axis sum(|beta|)]
Nonparametric (local) regression estimation:
k-NN, Decision trees, smoothers
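As an illustration of local estimation, k-NN regression predicts at a query point by averaging the responses of the k closest training points. This is a minimal one-dimensional sketch with made-up data; the function name is hypothetical.

```python
def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training points nearest to query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training pairs by distance to the query and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical training data.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]

print(knn_predict(train_x, train_y, query=3.4, k=2))
```

Small k follows the data closely and gives a rough estimate; large k averages over a wider neighbourhood and gives a smoother one.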
How to Choose k or h?
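A common answer, assumed here rather than taken from the slide, is cross-validation: try several values and keep the one with the lowest held-out error. The sketch below uses leave-one-out cross-validation to pick k for one-dimensional k-NN regression. The data and function names are made up, and the same loop would apply to a bandwidth h of a kernel smoother.

```python
def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training points nearest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k


def loocv_mse(xs, ys, k):
    """Leave-one-out mean squared error of k-NN regression."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]  # hold out observation i
        rest_y = ys[:i] + ys[i + 1:]
        prediction = knn_predict(rest_x, rest_y, xs[i], k)
        total += (ys[i] - prediction) ** 2
    return total / len(xs)


def choose_k(xs, ys, candidates):
    """Return the candidate k with the lowest leave-one-out MSE, plus all scores."""
    scores = {k: loocv_mse(xs, ys, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data: a noisy increasing trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1, 8.0, 9.2]

best_k, scores = choose_k(xs, ys, candidates=[1, 2, 3, 5])
print(best_k, scores)
```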
[Figure: three panels of Expenditures against Age, labelled small area, middle-sized area and biggest area, with support vectors marked]
[Figure: Expenditures against Age]
How to choose the parameters, e.g. C and gamma?
RBF kernel: $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$
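The RBF kernel measures similarity between two points through their squared distance. The sketch below assumes the common parametrization $\exp(-\gamma \lVert x - z \rVert^2)$, with `gamma` as the width parameter; the function names and points are made up for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma):
    """Kernel values for every pair of points."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical two-dimensional points.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]

for gamma in (0.01, 0.1, 1.0):
    print(gamma, gram_matrix(points, gamma)[0])
```

A larger `gamma` makes the kernel decay faster with distance, so each training point influences a smaller neighbourhood; this is why `gamma` is usually tuned together with C.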
[Figure: CVMSE plotted against C and gamma]
SVR Study: Model Training, Selection and Prediction
[Figure: two panels. CVMSE (IR*, HR*, CR*); true returns (red) and raw predictions (blue)]
[Figure: four panels showing the effect on SP500 of the 3m treasury bill, the credit spread, vix and vix FUT]
[Figure: Holiday Data, test set. Two panels of Expenditure by Observation: one with MSE = 0.04, and the OLS solution with MSE = 0.23]
Technical Note:
[Figure: min. number of training errors and test errors plotted against model complexity]
R^2-adjusted:
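The usual definition, assumed here, is $R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$, with n observations and p predictors (intercept not counted). A minimal sketch with made-up inputs:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical fit: R^2 = 0.6 from 5 observations and 1 predictor.
print(adjusted_r_squared(0.6, n=5, p=1))
```

Unlike plain R^2, the adjusted version can fall when a predictor is added that explains little, which makes it usable for comparing models of different size.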
LASSO and Ridge Regression (linear and nonlinear): http://www-stat.stanford.edu/~tibs/lasso.html; Bishop, 2006
Nonparametric (local) regression estimation (kNN for regression, Decision trees, Smoothers): Alpaydin, 2004; Hastie et al., 2001
Smola and Schoelkopf, 2003