
LS-SVMlab & Large scale modeling
Kristiaan Pelckmans, ESAT-SCD/SISTA
J.A.K. Suykens, B. De Moor
Content
I. Overview
II. Classification
III. Regression
IV. Unsupervised Learning
V. Time-series
VI. Conclusions and Outlook
People
Contributors to LS-SVMlab:
Kristiaan Pelckmans
Johan Suykens
Tony Van Gestel
Jos De Brabanter
Lukas Lukas
Bart Hamers
Emmanuel Lambert

Supervisors:
Bart De Moor
Johan Suykens
Joos Vandewalle
Acknowledgements
Our research is supported by grants from several funding
agencies and sources: Research Council K.U.Leuven: Concerted
Research Action GOA-Mefisto 666 (Mathematical Engineering),
IDO (IOTA Oncology, Genetic networks), several PhD/postdoc
& fellow grants; Flemish Government: Fund for Scientific
Research FWO Flanders (several PhD/postdoc grants, projects
G.0407.02 (support vector machines), G.0080.01 (collective
intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and
microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power
islands), research communities ICCoS, ANMMM), AWI (Bil. Int.
Collaboration South Africa, Hungary and Poland), IWT (Soft4s
(softsensors), STWW-Genprom (gene promotor prediction),
GBOU McKnow (Knowledge management algorithms), Eureka-
Impact (MPC-control), Eureka-FLiTE (flutter modeling), several
PhD-grants); Belgian Federal Government: DWTC (IUAP IV-
02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical
Systems and Control: Computation, Identification & Modelling),
Program Sustainable Development PODO-II (CP-TR-18:
Sustainability effects of Traffic Management Systems); Direct
contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS
is a professor at K.U.Leuven Belgium and a postdoctoral
researcher with FWO Flanders. BDM and JVDW are full
professors at K.U.Leuven Belgium.

I. Overview
Goal of the Presentation
1. Overview & Intuition
2. Demonstration LS-SVMlab
3. Pinpoint research challenges
4. Preparation NIPS 2002
Research results and challenges
Towards applications
Overview LS-SVMlab

I.2 Overview research
Learning, generalization, extrapolation, identification, smoothing, modeling
Prediction (black box modeling)
Point of view: Statistical Learning, Machine Learning, Neural Networks, Optimization, SVM


I.2 Type, Target, Topic
I.3 Towards applications
System identification
Financial engineering
Biomedical signal processing
Data mining
Bio-informatics
Text mining
Adaptive signal processing

I.4 LS-SVMlab
I.4 LS-SVMlab (2)
Starting points:
Modularity
Object Oriented & Functional Interface
Basic building blocks for advanced research
Website and tutorial
Reproducibility (preprocessing)

II. Classification
Learn the decision function associated with a set of labeled data points, in order to predict the labels of unseen data.

Least Squares Support Vector
Machines
Bayesian Framework
Different norms
Coding schemes
II.1 Least Squares Support Vector Machines (LS-SVM$(\gamma,\sigma)$)
1. Least squares cost function + regularization & equality constraints
2. Non-linearity by Mercer kernels
3. Primal-dual interpretation (Lagrange multipliers)

Primal parametric model:
$$y_i = w^T x_i + b + e_i$$
Dual non-parametric model:
$$y_i = \sum_{j=1}^{n} \alpha_j K(x_i, x_j) + b + e_i$$
with Lagrange multipliers $\alpha_j$ and Mercer kernel $K(\cdot,\cdot)$.
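As a minimal sketch of how this dual model can be trained in practice (illustrative Python, not LS-SVMlab itself, which is a MATLAB toolbox; the RBF kernel choice, the helper names and all parameter values are assumptions):

import numpy as np

def rbf_kernel(X1, X2, sigma):
    # Gram matrix of the RBF Mercer kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma, sigma):
    # Solve the LS-SVM KKT system in the dual variables (alpha, b):
    #   [ 0      1^T             ] [ b     ]   [ 0 ]
    #   [ 1   Omega + I / gamma  ] [ alpha ] = [ y ]
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(Xnew, X, alpha, b, sigma):
    # Evaluate the dual model y(x) = sum_j alpha_j K(x, x_j) + b
    return rbf_kernel(Xnew, X, sigma) @ alpha + b

For classification one takes labels $y_i \in \{-1, +1\}$ and predicts with the sign of the model output.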
II.1 LS-SVM$(\gamma,\sigma)$
Learning representations from relations:
$$\Omega = \begin{pmatrix} \langle a_1, a_1 \rangle & \langle a_1, a_2 \rangle & \ldots & \langle a_1, a_N \rangle \\ \vdots & & & \vdots \\ \langle a_N, a_1 \rangle & \langle a_N, a_2 \rangle & \ldots & \langle a_N, a_N \rangle \end{pmatrix}$$
II.2 Bayesian Inference
Bayes rule (MAP):
$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}$$
Closed form formulas
Approximations: Hessian in the optimum, Gaussian distribution
Three levels of posteriors:
Level 1: $P(w, b \mid \gamma, \sigma, K, X)$
Level 2: $P(\gamma \mid \sigma, K, X)$
Level 3: $P(\sigma, K \mid X)$
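As an illustration of level 1 (a sketch under the usual assumptions of a Gaussian prior on $w$ and i.i.d. Gaussian noise on $e_i$; constants and the exact parametrization of $\gamma$ are suppressed), maximizing the level-1 posterior reproduces the LS-SVM objective:
$$-\log P(w, b \mid \gamma, \sigma, K, X) = \frac{1}{2}\, w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 + \mathrm{const},$$
so the MAP estimate at level 1 coincides with the LS-SVM solution.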
II.3 SVM formulations & norms
1-norm + inequality constraints: SVM, with extensions to any convex cost function
2-norm + equality constraints: LS-SVM, with weighted versions
II.4 Coding schemes
Multi-class classification task → (multiple) binary classifiers
Labels are encoded into binary codewords and classifier outputs are decoded back to labels, e.g. for the label set {1, 2, 4, 6} with three binary classifiers:

Label:          1    2    4    6
Classifier 1:   1   -1    1    1
Classifier 2:  -1   -1   -1    1
Classifier 3:   1   -1   -1   -1
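A minimal sketch of the decoding step (the codebook mirrors the table above; the label set and codewords are illustrative assumptions):

# Codeword of each class label; bit k is predicted by binary classifier k
codebook = {1: (1, -1, 1), 2: (-1, -1, -1), 4: (1, -1, -1), 6: (1, 1, -1)}

def decode(bits, codebook):
    # Decoding: return the label whose codeword is closest in Hamming distance
    return min(codebook,
               key=lambda lab: sum(b != c for b, c in zip(bits, codebook[lab])))

print(decode((1, -1, -1), codebook))   # -> 4
print(decode((-1, -1, -1), codebook))  # -> 2

With longer codewords (error correcting output codes), the same nearest-codeword rule also corrects single bit errors.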
III. Regression
Learn the underlying function from a set of data points and their corresponding noisy targets, in order to predict the values of unseen data.

LS-SVM$(\gamma,\sigma)$

Cross-validation (CV)
Bayesian Inference
Robustness

III.1 LS-SVM$(\gamma,\sigma)$
Least squares cost function + regularization & equality constraints
Mercer kernels
Lagrange multipliers:
primal parametric model ↔ dual non-parametric model

III.1 LS-SVM$(\gamma,\sigma)$ (2)
Regularization parameter $\gamma$:
Do not fit the noise (overfitting)!
Trade-off between noise and information
$$f(x) = \mathrm{sinc}(x) + \frac{\sin(10x)}{5} + e$$
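For instance, fitting this toy function with the dual solver sketched in II.1 (the noise level, hyper-parameter values and the sinc convention are illustrative assumptions; numpy's sinc is $\sin(\pi x)/(\pi x)$):

import numpy as np
# reuses rbf_kernel / lssvm_train / lssvm_predict from the II.1 sketch
rng = np.random.default_rng(0)
X = np.linspace(-5.0, 5.0, 200)[:, None]
y = np.sinc(X[:, 0]) + np.sin(10.0 * X[:, 0]) / 5.0 + 0.1 * rng.standard_normal(200)
b, alpha = lssvm_train(X, y, gamma=100.0, sigma=0.2)
yhat = lssvm_predict(X, X, alpha, b, sigma=0.2)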

III.2 Cross-validation (CV)
How to estimate the generalization power of a model?
Division into a training set and a test set
Repeated division: leave-one-out CV (fast implementation)
L-fold cross-validation
Complexity criteria: AIC, BIC, ...
Generalized cross-validation (GCV), based on the smoother matrix $S$:
$$\begin{pmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_N \end{pmatrix} = S(X, K \mid \gamma) \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$$
[Figure: partitioning of the indices 1, ..., n into training and validation parts for leave-one-out and L-fold cross-validation]
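A minimal L-fold cross-validation loop for tuning $(\gamma, \sigma)$, reusing the helpers from the II.1 sketch (the fold count and search grids are illustrative assumptions):

import numpy as np

def lfold_cv_mse(X, y, gamma, sigma, L=10, seed=0):
    # L-fold CV: train on L-1 folds, accumulate squared error on the held-out fold
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), L)
    sse = 0.0
    for val in folds:
        tr = np.setdiff1d(np.arange(len(y)), val)
        b, alpha = lssvm_train(X[tr], y[tr], gamma, sigma)
        yhat = lssvm_predict(X[val], X[tr], alpha, b, sigma)
        sse += ((y[val] - yhat) ** 2).sum()
    return sse / len(y)

# Pick the hyper-parameters with the lowest CV error on a small grid
best_gamma, best_sigma = min(
    ((g, s) for g in (1.0, 10.0, 100.0) for s in (0.1, 0.3, 1.0)),
    key=lambda p: lfold_cv_mse(X, y, *p))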
III.2 Cross-validation Procedure (CVP)
How to tune the model for optimal generalization performance?
Trade-off between fit and model complexity
Kernel parameters
Which optimization routine?
III.1 LS-SVM$(\gamma,\sigma)$ (3)
Kernel type and parameter
"Zoology as elephantism and non-elephantism"
Model comparison by cross-validation or Bayesian inference
III.5 Applications
OK, but does it work?
Soft4s (soft-sensor): together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
ELIA (prediction of short- and long-term electricity consumption): together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
III.3 Bayesian Inference
Bayes rule (MAP):
$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}$$
Closed form formulas
Three levels of posteriors:
Level 1 (model parameters): $P(w, b \mid \gamma, \sigma, K, X)$
Level 2 (regularization): $P(\gamma \mid \sigma, K, X)$
Level 3 (model comparison): $P(\sigma, K \mid X)$
III.4 Robustness
How to build good models in the case of non-Gaussian noise or outliers?
Influence function
Breakdown point
How: by down-weighting the influence of large residuals
Mean → trimmed mean → median
Robust CV, GCV, AIC, ...
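A sketch of one common way to implement this down-weighting, iteratively reweighted LS-SVM (reusing the II.1 helpers; the MAD scale estimate and the cut-off value 2.5 are assumptions taken from standard robust-statistics practice, not necessarily the exact LS-SVMlab rule):

import numpy as np

def weighted_lssvm(X, y, gamma, sigma, steps=3):
    # Start from the ordinary LS-SVM solution, then repeatedly
    # down-weight samples with large residuals
    b, alpha = lssvm_train(X, y, gamma, sigma)
    n = len(y)
    for _ in range(steps):
        e = y - lssvm_predict(X, X, alpha, b, sigma)      # residuals
        s = 1.483 * np.median(np.abs(e - np.median(e)))   # robust scale (MAD)
        r = np.abs(e) / s
        v = np.where(r <= 2.5, 1.0, 2.5 / r)              # weights in (0, 1]
        # re-solve the KKT system with per-sample regularization gamma * v_i
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, sigma) + np.diag(1.0 / (gamma * v))
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        b, alpha = sol[0], sol[1:]
    return b, alpha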
IV. Unsupervised Learning
Extract important features from unlabeled data

Kernel PCA and related methods
Nyström approximation
From dual to primal
Fixed size LS-SVM



IV.1 Kernel PCA
Principal Component Analysis → kernel-based PCA
[Figure: example data cloud in (x, y, z) space]
IV.1 Kernel PCA (2)
Primal-dual LS-SVM style formulations for kernel PCA, CCA and PLS
IV.2 Nyström approximation
Sampling of the integral equation
$$\int K(x, y)\, \varphi_i(x)\, p(x)\, dx = \lambda_i \varphi_i(y)$$
on all N data points,
$$\frac{1}{N} \sum_{j=1}^{N} K(x_j, y)\, \varphi_i(x_j) = \lambda_i \varphi_i(y),$$
or on a subsample of size n << N,
$$\frac{1}{n} \sum_{j=1}^{n} K(x_j, y)\, \varphi_i(x_j) = \lambda_i \varphi_i(y).$$
Approximating the feature map for a Mercer kernel:
$$K(x, y) \approx \hat{\varphi}(x)^T \hat{\varphi}(y)$$
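A minimal sketch of the resulting approximate feature map (reusing rbf_kernel from the II.1 sketch; the eigenvalue floor is an assumption for numerical stability):

import numpy as np

def nystroem_features(X, Xsub, sigma):
    # Eigendecompose the kernel matrix on the subsample (size n << N), then
    # extend the eigenfunctions to all points: phi_hat(x)^T phi_hat(y) ~ K(x, y)
    lam, U = np.linalg.eigh(rbf_kernel(Xsub, Xsub, sigma))
    lam = np.maximum(lam, 1e-12)                            # guard against round-off
    return rbf_kernel(X, Xsub, sigma) @ U / np.sqrt(lam)    # shape (N, n)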
IV.3 Fixed Size LS-SVM
Primal parametric model:
$$y_i = w^T \varphi(x_i) + b + e_i$$
Dual non-parametric model:
$$y_i = \sum_{j=1}^{n} \alpha_j K(x_i, x_j) + b + e_i$$
How to go back from the dual to the primal for large scale problems? Estimate $(w, b)$ directly in the primal on the Nyström approximation of $\varphi$ (see the sketch below).
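A sketch of this primal estimation step on the Nyström features from IV.2 (the random subsample, its size and the hyper-parameter values are assumptions; LS-SVMlab selects the subsample by an entropy criterion rather than at random):

import numpy as np
# reuses nystroem_features from the IV.2 sketch and data (X, y)
rng = np.random.default_rng(0)
gamma, sigma = 10.0, 0.3
sub = rng.choice(len(X), size=50, replace=False)
Phi = np.hstack([nystroem_features(X, X[sub], sigma), np.ones((len(X), 1))])
# Ridge (regularized least squares) estimate of (w, b) in the primal
wb = np.linalg.solve(Phi.T @ Phi + np.eye(Phi.shape[1]) / gamma, Phi.T @ y)
yhat = Phi @ wb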
V. Time-series
Learn to predict future values given a sequence of
past values

NARX
Recurrent vs. feedforward
V.1 NARX
Reducible to static regression (see the sketch below)
CV and complexity criteria
Predicting in recurrent mode
Fixed size LS-SVM (sparse representation)
$$\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-l}),$$
with the window sliding over the sequence $\ldots, y_{t-1}, y_t, y_{t+1}, y_{t+2}, \ldots$
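Building the static regression problem from a series is a simple windowing step (a sketch; the lag order l is an assumption):

import numpy as np

def narx_data(y, l):
    # Stack l past values per row; the target is the next sample, which
    # reduces one-step-ahead NARX prediction to static regression
    Xlag = np.array([y[t - l:t] for t in range(l, len(y))])
    return Xlag, y[l:]

Xlag, ytarget = narx_data(y, l=10)  # then train any static regressor, e.g. an LS-SVM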
V.1 NARX (2)
Santa Fe Time-series competition
V.2 Recurrent models?
How to learn recurrent dynamical models (a free-run sketch follows below)?
Training cost = prediction cost?
Non-parametric model class?
Convex or non-convex?
Hyper-parameters?
$$\hat{y}_t = f(\hat{y}_{t-1}, \hat{y}_{t-2}, \ldots, \hat{y}_{t-l})$$
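In recurrent (free-run) mode the model's own predictions are fed back as inputs (a sketch; `model` stands for any trained one-step-ahead predictor, e.g. an LS-SVM wrapped as a function):

import numpy as np

def recurrent_predict(model, y_past, l, horizon):
    # Iterate the one-step model over the horizon, feeding each
    # prediction back into the input window
    window = list(y_past[-l:])
    out = []
    for _ in range(horizon):
        yhat = float(model(np.array(window[-l:])))
        out.append(yhat)
        window.append(yhat)
    return out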
VI.0 References
J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor & J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.
V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
B. Schölkopf & A. Smola (2002), Learning with Kernels, MIT Press.
T. Poggio & F. Girosi (1990), "Networks for approximation and learning", Proceedings of the IEEE, 78, 1481-1497.
N. Cristianini & J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.

VI. Conclusions
Non-linear Non-parametric learning as a
generalized methodology

Non-parametric Learning
Intuition & Formulations
Hyper-parameters
LS-SVMlab
Questions?
