
Fuzzy modeling

By: Saeed Bahrami


Saeed.bahrami@gmail.com
1

Introduction:
Classical approach:
- Low accuracy in complicated systems
- Systems for which first-principle and theoretical methods are not fully developed

Solution:
1- Human parallel processing → neural networks
2- Human reasoning and inference → fuzzy models
2

- Although neural networks have many advantages, they have three main problems:
1- The knowledge is stored in parameters which are not interpretable
2- Training is a nonlinear optimization problem
3- Capturing expert knowledge is impossible

Fuzzy models
- A mathematical model which in some way uses fuzzy sets is called a fuzzy model [1]
- A method for modeling complex, ill-defined, and less tractable systems
- Rules have the form: if (antecedent) then (consequent)

Example (Mamdani):

If pressure is high, then volume is small

(The antecedent determines the validity, i.e. the firing strength, of the rule; the consequent is the output of the rule.)

Example (TSK):

If velocity is high, then force = $k \cdot (\text{velocity})^2$

(Antecedents are fuzzy sets; consequents are fuzzy sets (Mamdani) or functions (Takagi-Sugeno).)


4

- Two different ideas are behind these modeling approaches: while the Mamdani model tries to imitate the human reasoning mechanism, the Takagi-Sugeno model tries to represent the system by several simple local models when it cannot be described accurately by a single model. For this reason the Takagi-Sugeno model is sometimes called a local model.

- Input space partitioning

Partitioning of the input space:
- Grid partitioning: 1- ANFIS (Jang), 2- FUREGA
- Tree partitioning: LOLIMOT (Nelles)
- Scatter partitioning: clustering (Babuška)

ANFIS
(Adaptive-Network-Based Fuzzy Inference System)

Main problems of fuzzy modeling before ANFIS: 1) No standard methods exist for transforming human knowledge or experience into the rule base and database of a fuzzy inference system.
2) There is a need for effective methods for tuning the membership functions (MFs) so as to minimize the output error measure or maximize a performance index.

neural networks
Neuron structure:

Output of neuron:

$y_k = \varphi\Big(\sum_{j=1}^{m} w_{kj}\, x_j + b_k\Big)$
9

- Activation function $\varphi(\cdot)$:

The logistic function: $\varphi(v) = \dfrac{1}{1 + \exp(-v)}$

Hyperbolic tangent: $\varphi(v) = \tanh(v)$

Nonlinear behavior of neural networks!
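As a minimal illustration, the neuron output formula and both activation functions can be coded as follows (the weights, bias, and inputs are made-up values):

```python
import numpy as np

def logistic(v):
    # Logistic activation: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def neuron_output(x, w, b, activation=np.tanh):
    # y = phi( sum_j w_j * x_j + b )
    return activation(np.dot(w, x) + b)

# Hypothetical 3-input neuron
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_output(x, w, b=0.3, activation=logistic))
print(neuron_output(x, w, b=0.3))  # tanh variant
```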

10

Multilayer perceptron (MLP):

An arbitrary number of hidden layers can be used!


11

Training MLPs (back-propagation)


- Training data: pairs $(u^{(p)}, d^{(p)})$, $p = 1, \dots, P$, where $u^{(p)}$ is the input to the MLP, $d^{(p)}$ the desired output, and $y^{(p)}$ the MLP output for $u^{(p)}$

- Cost function:

$E = \dfrac{1}{2} \sum_{p=1}^{P} \big\lVert d^{(p)} - y^{(p)} \big\rVert^2$

- What should be optimized: the neuron weights

12

- Optimization algorithm, steepest descent: the search direction is the opposite of the gradient direction.

$\Delta w = -\eta \, \dfrac{\partial E}{\partial w}$, where $\dfrac{\partial E}{\partial w}$ is the gradient of the output error with respect to the weight $w$ and $\eta$ is the learning rate.

- The most important property of this algorithm is that the gradient for each weight can be calculated with the aid of the gradients of the neurons in the next layer.
13

- Training procedure:
It is a two-pass optimization method. In the forward pass the inputs go through the MLP, and the outputs $y^{(p)}$ and the cost $E$ are calculated. In the backward pass the error propagates from the output layer back to the input layer, and all of the MLP's weights are updated. This procedure is repeated over all data samples many times.
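A compact sketch of this two-pass procedure for a one-hidden-layer MLP; the toy data, network size, and learning rate below are assumptions for illustration, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn d = sin(u) on [-pi, pi]
U = np.linspace(-np.pi, np.pi, 100).reshape(-1, 1)   # inputs u
D = np.sin(U)                                        # desired outputs d

H = 10                                               # hidden neurons
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)    # input -> hidden
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)    # hidden -> output
eta = 0.05                                           # learning rate

for epoch in range(5000):
    # Forward pass: inputs go through the MLP, outputs and error are computed
    A = np.tanh(U @ W1 + b1)       # hidden-layer outputs
    Y = A @ W2 + b2                # (linear) output layer
    E = Y - D                      # output error

    # Backward pass: error propagates from output layer to input layer
    dW2 = A.T @ E / len(U);  db2 = E.mean(axis=0)
    dA  = (E @ W2.T) * (1 - A**2)  # through tanh: tanh'(v) = 1 - tanh(v)^2
    dW1 = U.T @ dA / len(U); db1 = dA.mean(axis=0)

    # Steepest descent: step opposite to the gradient
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

print("final cost E =", 0.5 * (E**2).sum())
```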

14

Fuzzy Inference System (FIS)

15

Fuzzy Inference System (FIS):

1- Compare the input variables with the membership functions in the premise part to obtain the membership values (or compatibility measures) of each linguistic label. (This step is often called fuzzification.)
2- Combine (through a specific T-norm operator, usually multiplication or min) the membership values in the premise part to get the firing strength (weight) of each rule.
3- Generate the qualified consequent (either fuzzy or crisp) of each rule depending on the firing strength.
4- Aggregate the qualified consequents to produce a crisp output. (This step is called defuzzification.)
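The four steps above, sketched for a first-order TSK system with Gaussian membership functions and the product T-norm; the rule parameters below are hypothetical:

```python
import numpy as np

def gauss(x, c, sigma):
    # Gaussian membership function with center c and width sigma
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def tsk_inference(x, y, rules):
    # Each rule: ((cx, sx), (cy, sy), (p, q, r))
    num = den = 0.0
    for (cx, sx), (cy, sy), (p, q, r) in rules:
        w = gauss(x, cx, sx) * gauss(y, cy, sy)  # steps 1-2: fuzzify, product T-norm
        f = p * x + q * y + r                    # step 3: qualified (crisp) consequent
        num += w * f                             # step 4: weighted aggregation ...
        den += w
    return num / den                             # ... yields a crisp output

rules = [
    ((2.0, 1.0), (2.0, 1.0), ( 1.0, 0.5, 0.0)),  # if x is A1 and y is B1 ...
    ((5.0, 1.0), (5.0, 1.0), (-0.3, 1.2, 2.0)),  # if x is A2 and y is B2 ...
]
print(tsk_inference(3.0, 4.0, rules))
```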
16

- Example: a two-input, two-rule system

Rule 1: if $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: if $x$ is $A_2$ and $y$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$

Overall output: $f = \dfrac{w_1 f_1 + w_2 f_2}{w_1 + w_2}$

Type 1: Mamdani (the consequents are fuzzy sets)
Type 2: TSK (the consequents are the functions $f_1$, $f_2$ above)

17

Each of these if-then rules can be represented as an adaptive network:

[Figure: the rule drawn as a layered network. Some nodes carry adaptive parameters, e.g. the centers and widths of the membership functions and the consequent coefficients; the remaining nodes perform fixed operations.]
18

Example of a FIS with two inputs and three membership functions for each input

19

Training procedure:
The two passes in the hybrid learning procedure for ANFIS:

                        Forward pass             Backward pass
Premise parameters      Fixed                    Gradient descent
Consequent parameters   Least-squares estimate   Fixed
Signals                 Node outputs             Error rates

20

Why we can use the least-squares algorithm for the consequent parameters (for example, for the TSK model on page 18):

$f = \dfrac{w_1 f_1 + w_2 f_2}{w_1 + w_2} = \bar{w}_1 f_1 + \bar{w}_2 f_2 = (\bar{w}_1 x)\, p_1 + (\bar{w}_1 y)\, q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x)\, p_2 + (\bar{w}_2 y)\, q_2 + \bar{w}_2 r_2$

This is a linear regression problem: $A\theta = y$
21

For each data sample, the regressor row is $[\,\bar{w}_1 x \;\; \bar{w}_1 y \;\; \bar{w}_1 \;\; \bar{w}_2 x \;\; \bar{w}_2 y \;\; \bar{w}_2\,]$ and the parameter vector is $\theta = [p_1 \; q_1 \; r_1 \; p_2 \; q_2 \; r_2]^T$.
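A sketch of this estimate; the helper name and the use of NumPy's least-squares solver are my choices, not from the slides:

```python
import numpy as np

def consequent_lse(XY, t, w):
    # XY: (N, 2) input samples (x, y); t: (N,) target outputs;
    # w: (N, M) normalized firing strengths of the M rules
    N, M = w.shape
    reg = np.hstack([XY, np.ones((N, 1))])            # [x, y, 1] per sample
    # Regressor row: [w1*x, w1*y, w1, w2*x, w2*y, w2, ...]
    A = np.hstack([w[:, [i]] * reg for i in range(M)])
    theta, *_ = np.linalg.lstsq(A, t, rcond=None)     # solve A theta = t
    return theta.reshape(M, 3)                        # rows: (p_i, q_i, r_i)
```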

- In the backward pass the gradient-descent algorithm is used to optimize the premise parameters while the error propagates backward through the network (like back-propagation in neural networks).

22

Remark 1: since the consequent parameters are optimized at each iteration with the least-squares algorithm, the nonlinear optimization problem in the backward pass can be solved more efficiently, and problems such as getting trapped in local minima or slow convergence are less severe.

23

- Remark 2: the TSK model is more popular in the ANFIS structure since it has more adjustable parameters in the rule consequents. This reduces training time and effort, because these parameters are linear with respect to the model output and can be estimated very efficiently by the least-squares algorithm.

24

- Remark 3: sometimes optimizing the premise parameters (input membership functions) deteriorates the interpretability of the rule base.

25

Example:
$f(x) = 0.6 \sin(\pi x) + 0.3 \sin(3\pi x) + 0.1 \sin(5\pi x)$, with $x \in [-1, 1]$

3 membership functions per input (9 rules)

26

4 membership functions per input (16 rules)

27

5 membership functions per input (25 rules)

Loss of interpretability

28

FUREGA
Fuzzy Rule Extraction using Genetic Algorithm

29

FUREGA:
1- Start from a grid-based network using prior knowledge
2- Rule selection by a genetic algorithm
3- Least squares for consequent (output) parameter optimization
4- Constrained nonlinear optimization of the membership functions

30

Properties:
- Good chance of reaching the most accurate solution
- Time-consuming training
- Curse of dimensionality
- Interpretability?

31

LOLIMOT
Local Linear Model Tree

32

What are local models?

33

Example:

34

LOLIMOT algorithm:
- The algorithm has an outer loop (upper level) that determines the input partitions (structure) where the local linear models are valid, and an inner loop (lower level) that estimates the parameters of those local linear models by an efficient weighted least-squares algorithm.

Consequent parameter estimation:

$\hat{y} = \sum_{i=1}^{M} \big( w_{i0} + w_{i1} u_1 + \dots + w_{in} u_n \big) \, \Phi_i(u, c_i, \sigma_i)$

$u$: input vector
$w_{ij}$: local linear model parameters
$\Phi_i$: normalized Gaussian weighting function for the $i$-th model, with center coordinates $c_i$ and standard deviations $\sigma_i$
35

$\Phi_i(u, c_i, \sigma_i) = \dfrac{\mu_i(u)}{\sum_{j=1}^{M} \mu_j(u)}$

where

$\mu_i(u) = \exp\Big( -\dfrac{1}{2} \Big( \dfrac{(u_1 - c_{i1})^2}{\sigma_{i1}^2} + \dots + \dfrac{(u_n - c_{in})^2}{\sigma_{in}^2} \Big) \Big)$

- Assume the weighting functions have already been determined. Then the parameters of each linear model are estimated separately by a weighted least-squares technique. With the data matrix $X$ (known model inputs), the diagonal weighting matrix $Q_i$ (each entry is the weighting function value at the corresponding input sample), and the desired outputs $y$, the optimal parameters of the $i$-th model are:

$\hat{w}_i = (X^T Q_i X)^{-1} X^T Q_i y$
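A sketch of the validity functions and of the weighted least-squares estimate above (the function names are mine):

```python
import numpy as np

def validity(U, centers, sigmas):
    # Normalized Gaussian validity functions Phi_i(u) for M local models.
    # U: (N, n) inputs; centers, sigmas: (M, n)
    d2 = ((U[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]) ** 2
    mu = np.exp(-0.5 * d2.sum(axis=2))            # (N, M) unnormalized weights
    return mu / mu.sum(axis=1, keepdims=True)     # normalize over the M models

def weighted_ls(U, y, q):
    # One local affine model; q: (N,) validity values = diagonal of Q
    X = np.hstack([np.ones((len(U), 1)), U])      # regressors [1, u1, ..., un]
    XtQ = X.T * q                                 # X^T Q without forming Q
    return np.linalg.solve(XtQ @ X, XtQ @ y)      # w = (X^T Q X)^(-1) X^T Q y
```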
36

- Input space partitioning
1- Set the first hyper-rectangle in such a way that it contains all data points. Estimate a global linear model.
2- For all input dimensions j = 1...n:
2a. Cut the hyper-rectangle into two halves along dimension j.
2b. Estimate local linear models for each half.
2c. Calculate the global approximation error (output error) for the model with this cut.
3- Determine which cut has led to the smallest approximation error.
37

4- Perform this cut. Place a weighting function at the center of each of the two hyper-rectangles. Set the standard deviations of both weighting functions proportional to the extension of the hyper-rectangle in each dimension. Apply the corresponding estimated local linear models (from 2b).
5- Calculate the local error measure J on the basis of a parallel-running model for each hyper-rectangle.

6-Choose the hyper-rectangle with the largest local error measure J.


7- If the global approximation error of the parallel model (output error) is too large, go to step 2.
8- Convergence. Stop.
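A compact one-dimensional version of this loop, reusing the validity and weighted_ls helpers from the previous sketch. In 1-D the only candidate cut per step 2 is the midpoint, and the sigmas are set proportional to the segment width as in step 4; the proportionality factor 1/3, the fixed model budget (instead of the global-error test of steps 7-8), and the toy data are assumptions:

```python
import numpy as np

def lolimot_1d(u, y, max_models=8):
    segs = [(u.min(), u.max())]                            # step 1: one global segment

    def fit(segs):
        c = np.array([[(a + b) / 2] for a, b in segs])         # centers
        s = np.array([[(b - a) / 3 + 1e-9] for a, b in segs])  # proportional sigmas
        Phi = validity(u[:, None], c, s)                       # (N, M)
        W = np.array([weighted_ls(u[:, None], y, Phi[:, i])    # local models (2b/4)
                      for i in range(len(segs))])
        yhat = (Phi * (W[:, 0] + np.outer(u, W[:, 1]))).sum(axis=1)
        return Phi, yhat

    while len(segs) < max_models:                          # stand-in for steps 7-8
        Phi, yhat = fit(segs)
        local_err = (Phi * (y - yhat)[:, None] ** 2).sum(axis=0)
        i = int(np.argmax(local_err))                      # steps 5-6: worst segment
        a, b = segs[i]
        segs[i:i + 1] = [(a, (a + b) / 2), ((a + b) / 2, b)]  # steps 2-4: halve it
    return segs, fit(segs)[1]

u = np.linspace(0, 1, 200)
y = np.sin(6 * u)
segs, yhat = lolimot_1d(u, y)
print("RMSE:", np.sqrt(np.mean((y - yhat) ** 2)))
```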
38

LOLIMOT

39

Example:

40

Properties:
- High interpretability of rules
- Automatic partitioning of the input space according to the system properties
- Different objective functions for modeling error and structure optimization
- Low sensitivity to user-selected parameters
- No curse of dimensionality for high-dimensional problems

41

Implementing Hierarchical Fuzzy Clustering in Fuzzy Identification Using Weighted Fuzzy C-Means

42

Clustering
- Definition: divide the data set in such a way that objects belonging to the same cluster are as similar as possible and objects belonging to different clusters are as dissimilar as possible
- Types: 1- crisp 2- fuzzy
- Properties: 1- unsupervised learning task 2- nonlinear optimization 3- computational economy 4- needs user-defined parameters

43

Fuzzy C-means (FCM)


Cost function:

$J_m = \sum_{i=1}^{C} \sum_{k=1}^{N} u_{ik}^m \, \lVert x_k - v_i \rVert^2$, subject to $\sum_{i=1}^{C} u_{ik} = 1$

$m \to 1$: clusters become crisp
$m \to \infty$: clusters become maximally fuzzy
Iterative training

44

Example of fuzzy C-means

45

Weighted fuzzy C-means (WFCM)

- Some points are more important than others: each point $x_k$ receives a weight $w_k$ that scales its contribution to the cost function.
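A sketch of WFCM; the update formulas are the standard FCM ones with a per-point weight scaling each point's vote, so with all weights equal to 1 it reduces to plain FCM:

```python
import numpy as np

def wfcm(X, C, w=None, m=2.0, iters=100, seed=0):
    # X: (N, n) data; C: number of clusters; w: (N,) point weights; m: fuzzifier
    N = len(X)
    w = np.ones(N) if w is None else np.asarray(w, float)
    rng = np.random.default_rng(seed)
    V = X[rng.choice(N, C, replace=False)]            # initial cluster centers
    for _ in range(iters):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        U = 1.0 / d2 ** (1.0 / (m - 1.0))             # membership update ...
        U /= U.sum(axis=1, keepdims=True)             # ... enforcing sum_i u_ik = 1
        wu = w[:, None] * U ** m                      # weights scale each point's vote
        V = (wu.T @ X) / wu.sum(axis=0)[:, None]      # weighted center update
    return V, U
```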

46

Self-organizing map (SOM):

- The most famous neural-network-based clustering
- K-means (crisp C-means) with sequential training:

$c_w \leftarrow c_w + \eta \,(u - c_w)$

47

SOM algorithm:
1- Choose initial values for the C neuron vectors $c_i$, $i = 1, \dots, C$. This can be done by randomly picking C different data samples.
2- Choose one sample u from the data set. This can be done either randomly or by systematically going through the whole data set.
3- Calculate the distance of the selected data sample to all neuron vectors. Typically, the Euclidean distance measure is used. The neuron with the vector closest to the data sample is called the winner neuron.
4- Update the vector of the winner neuron in a way that moves it toward the selected data sample u: $c_w \leftarrow c_w + \eta \,(u - c_w)$
5- If any neuron vector has been moved significantly in the previous step, go to step 2; otherwise stop.
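A minimal sketch of steps 1-4; step 5's convergence test is replaced by a fixed iteration budget for brevity (an assumption):

```python
import numpy as np

def som_cluster(X, C, eta=0.1, iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), C, replace=False)]        # step 1: init from C samples
    for _ in range(iters):
        u = X[rng.integers(len(X))]                    # step 2: pick a random sample
        k = np.argmin(((V - u) ** 2).sum(axis=1))      # step 3: winner neuron
        V[k] += eta * (u - V[k])                       # step 4: move it toward u
    return V
```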
48

Fuzzy clustering for fuzzy identification

- It is an unsupervised learning task, so it does not need any additional data
- The input-space term-sets are a direct result of the clustering process
- Computational economy

49

Application of clustering in fuzzy modeling


1- applying clustering algorithms to input data only
2- applying clustering algorithms to output data only
3- applying clustering algorithms to a vector composed of input and output data

50

FCM for input space partitioning

- FCM requires a priori knowledge of the number of clusters
  → determine the number of clusters in an iterative manner
  → use optimal fuzzy clustering methods
- Dependence of FCM on the initialization
  → hierarchical clustering
- Interpretability of the final fuzzy model
  → model simplification methods
51

Algorithm:

52

Algorithm:
1- Apply the SOM algorithm to classify the N data samples into n crisp clusters ($A_i$, $i = 1 \dots n$).
2- Select the n cluster centers ($c_i$, $i = 1 \dots n$) from the previous step and assign a weight to each of them according to its relative cardinality: $w_i = N_i / N$
3- Apply WFCM to classify the n weighted cluster centers $(c_i, w_i)$ into C new clusters.
53

4- The centers of the Gaussian membership functions in the premises of the fuzzy rules are obtained by simply projecting the final cluster centers onto each axis. To calculate the respective standard deviations, utilize the fuzzy covariance matrix. [5]
5- Use weighted least squares to optimize the consequent parameters and steepest descent for the premise parameters. (Formulas in [5])
6- Merge similar membership functions for interpretability, using the set-theoretic similarity measure $S(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$

7- Optimize the consequent parameters again.
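A sketch of steps 1-4 and of the similarity measure in step 6, reusing som_cluster and wfcm from the earlier sketches; the cluster counts below are hypothetical defaults:

```python
import numpy as np

def hierarchical_premise(X, n_som=51, C=4):
    centers = som_cluster(X, n_som)                       # step 1: n crisp clusters
    lab = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    w = np.bincount(lab, minlength=n_som) / len(X)        # step 2: w_i = N_i / N
    V, U = wfcm(centers, C, w=w)                          # step 3: WFCM on centers
    return V                                              # step 4: project V[:, j]
                                                          # onto each input axis j

def mf_similarity(mu_a, mu_b):
    # Step 6: set-theoretic similarity S = |A ∩ B| / |A ∪ B|,
    # evaluated on membership values sampled over the input range
    return np.minimum(mu_a, mu_b).sum() / np.maximum(mu_a, mu_b).sum()
```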


54

Example I:

55

Example (cont.):

[Figure: data samples in the $(x_1, x_2)$ plane, both axes ranging roughly from 1 to 5.5. SOM first produces crisp cluster centers whose weights reflect relative cardinality: green w=2/51, blue w=3/51, red w=4/51, dark blue w=8/51, black w=10/51. WFCM then clusters these weighted centers.]

56

Example (cont.):

[Figure: term-sets for x1 and x2 over the range 1 to 5: the initial term-sets (J=0.1801), the final term-sets after training (J=0.0018), and the simplified term-sets after merging (J=0.0154), with linguistic labels small, medium, and large.]
R1: if x1 is small and x2 is small then y = 17.3 - 2.6·x1 + 1.4·x2
R2: if x1 is medium and x2 is large then y = 7.5 - 2.9·x1 - 0.02·x2
R3: if x1 is large and x2 is small then y = 4.7 + 2.7·x1 - 7.8·x2
R4: if x1 is large and x2 is large then y = 2.8 - 0.2·x1 - 0.2·x2

57

Example II:

Inputs: x(t-18), x(t-12), x(t-6), x(t)
Output: x(t+6)

[Figure: the time series x(t) for t = 0 to 500, with values roughly between 0.4 and 1.3.]

Example II (cont.):

[Figure: initial, final, and simplified term-sets for x(t-18) and x(t-12) over the range 0.4 to 1.3.]

Example II (cont.):

[Figure: initial (J=0.0166), final (J=0.0072), and simplified (J=0.0128) term-sets for x(t-6) and x(t) over the range 0.4 to 1.3.]

Benefits over similar approaches:
- Does not need any additional data
- Low sensitivity to user-selected parameters and initial conditions
- Computational economy
- Curse of dimensionality
- Interpretability

Drawback:
- Sensitivity to data distribution

61

Universal approximator

62

Proof: [6]

63

References:
1- Babuška, R. and Verbruggen, H. (2003). Neuro-fuzzy methods for nonlinear system identification (review). Annual Reviews in Control, 27, 73-85.
2- Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall.
3- Jang, J.-S.R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man & Cybernetics, 23(3), 665-685.
4- Nelles, O. and Isermann, R. (1996). Basis function networks for interpolation of local linear models. In: IEEE Conference on Decision and Control (CDC), 470-475; Nelles, O. (2002). Nonlinear System Identification. Springer Verlag, Berlin.
5- Oliveira, J.V. and Pedrycz, W. (2007). Advances in Fuzzy Clustering and its Applications. John Wiley & Sons, chapter 12.
6- Espinosa, J., Vandewalle, J. and Wertz, V. (2004). Fuzzy Logic, Identification and Predictive Control. Springer Verlag, Berlin.
64

Questions and Discussion

Thanks for your attention

65
