
Fuzzy modeling

By: Saeed Bahrami


Saeed.bahrami@gmail.com
1

Introduction:
Classical approach:
- Low accuracy in complicated systems
- Systems for which first-principle and theoretical methods are not fully developed

Solution:
1- Human parallel processing → neural networks
2- Human reasoning and inference → fuzzy models
2

- Although neural networks have many advantages, they have three main problems:
1- The knowledge is stored in parameters which are not interpretable
2- Training is a nonlinear optimization problem
3- Capturing expert knowledge is impossible

Fuzzy models
- A mathematical model which in some way uses fuzzy sets is called a fuzzy model [1]
- A method for modeling complex, ill-defined, and less tractable systems
- Rules have the form: if (antecedent) then (consequent)

Example (Mamdani):

If pressure is high, then volume is small

(The antecedent determines the validity, i.e. the firing strength, of the rule; the consequent is the output of the rule.)

Example (TSK):

If velocity is high, then force = $k \cdot (\text{velocity})^2$

(Antecedents are fuzzy sets; consequents are fuzzy sets (Mamdani) or functions (Takagi-Sugeno).)


4

- Two different ideas are behind these modeling approaches: while the Mamdani model tries to imitate the human reasoning mechanism, the Takagi-Sugeno model tries to represent the system by several simple local models when it cannot be described accurately by a single model. For this reason the Takagi-Sugeno model is sometimes called a local model.

- Input space partitioning

Partitioning of the input space:
- Grid partitioning: 1- ANFIS (Jang), 2- FUREGA
- Tree partitioning: LOLIMOT (Nelles)
- Scatter partitioning: clustering (Babuška)

ANFIS
(Adaptive-Network-Based Fuzzy Inference System)

Main problems of fuzzy modeling before ANFIS: 1) No standard methods exist for transforming human knowledge or experience into the rule base and database of a fuzzy inference system.
2) There is a need for effective methods for tuning the membership functions (MFs) so as to minimize the output error measure or maximize a performance index.

neural networks
Neuron structure:

Output of neuron:

$y_k = \varphi\Big(\sum_{j=1}^{m} w_{kj}\, x_j + b_k\Big)$
9

- Activation function $\varphi(\cdot)$:

The logistic function: $\varphi(v) = \dfrac{1}{1 + \exp(-v)}$

Hyperbolic tangent: $\varphi(v) = \tanh(v)$

Nonlinear behavior of neural networks!
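As a minimal illustration, the neuron output formula and both activation functions can be coded as follows (the weights, bias, and inputs are made-up values):

```python
import numpy as np

def logistic(v):
    # Logistic activation: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def neuron_output(x, w, b, activation=np.tanh):
    # y = phi( sum_j w_j * x_j + b )
    return activation(np.dot(w, x) + b)

# Hypothetical 3-input neuron
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_output(x, w, b=0.3, activation=logistic))
print(neuron_output(x, w, b=0.3))  # tanh variant
```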

10

Multilayer perceptron (MLP):

An arbitrary number of hidden layers can be used!


11

Training MLPs (back-propagation)


- Training data: pairs $(u^{(p)}, d^{(p)})$, $p = 1, \dots, P$, where $u^{(p)}$ is the input to the MLP, $d^{(p)}$ the desired output, and $y^{(p)}$ the MLP output for $u^{(p)}$

- Cost function:

$E = \dfrac{1}{2} \sum_{p=1}^{P} \big\lVert d^{(p)} - y^{(p)} \big\rVert^2$

- What should be optimized: the neuron weights

12

- Optimization algorithm, steepest descent: the search direction is the opposite of the gradient direction.

$\Delta w = -\eta \, \dfrac{\partial E}{\partial w}$, where $\dfrac{\partial E}{\partial w}$ is the gradient of the output error with respect to the weight $w$ and $\eta$ is the learning rate.

- The most important property of this algorithm is that the gradient for each weight can be calculated with the aid of the gradients of the neurons in the next layer.
13

- Training procedure:
It is a two-pass optimization method. In the forward pass the inputs go through the MLP, and the outputs $y^{(p)}$ and the cost $E$ are calculated. In the backward pass the error propagates from the output layer back to the input layer, and all of the MLP's weights are updated. This procedure is repeated over all data samples many times.
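A compact sketch of this two-pass procedure for a one-hidden-layer MLP; the toy data, network size, and learning rate below are assumptions for illustration, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn d = sin(u) on [-pi, pi]
U = np.linspace(-np.pi, np.pi, 100).reshape(-1, 1)   # inputs u
D = np.sin(U)                                        # desired outputs d

H = 10                                               # hidden neurons
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)    # input -> hidden
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)    # hidden -> output
eta = 0.05                                           # learning rate

for epoch in range(5000):
    # Forward pass: inputs go through the MLP, outputs and error are computed
    A = np.tanh(U @ W1 + b1)       # hidden-layer outputs
    Y = A @ W2 + b2                # (linear) output layer
    E = Y - D                      # output error

    # Backward pass: error propagates from output layer to input layer
    dW2 = A.T @ E / len(U);  db2 = E.mean(axis=0)
    dA  = (E @ W2.T) * (1 - A**2)  # through tanh: tanh'(v) = 1 - tanh(v)^2
    dW1 = U.T @ dA / len(U); db1 = dA.mean(axis=0)

    # Steepest descent: step opposite to the gradient
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

print("final cost E =", 0.5 * (E**2).sum())
```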

14

Fuzzy Inference System (FIS)

15

Fuzzy Inference System (FIS):

1- Compare the input variables with the membership functions in the premise part to obtain the membership values (or compatibility measures) of each linguistic label. (This step is often called fuzzification.)
2- Combine (through a specific T-norm operator, usually multiplication or min) the membership values in the premise part to get the firing strength (weight) of each rule.
3- Generate the qualified consequent (either fuzzy or crisp) of each rule depending on the firing strength.
4- Aggregate the qualified consequents to produce a crisp output. (This step is called defuzzification.)
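The four steps above, sketched for a first-order TSK system with Gaussian membership functions and the product T-norm; the rule parameters below are hypothetical:

```python
import numpy as np

def gauss(x, c, sigma):
    # Gaussian membership function with center c and width sigma
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def tsk_inference(x, y, rules):
    # Each rule: ((cx, sx), (cy, sy), (p, q, r))
    num = den = 0.0
    for (cx, sx), (cy, sy), (p, q, r) in rules:
        w = gauss(x, cx, sx) * gauss(y, cy, sy)  # steps 1-2: fuzzify, product T-norm
        f = p * x + q * y + r                    # step 3: qualified (crisp) consequent
        num += w * f                             # step 4: weighted aggregation ...
        den += w
    return num / den                             # ... yields a crisp output

rules = [
    ((2.0, 1.0), (2.0, 1.0), ( 1.0, 0.5, 0.0)),  # if x is A1 and y is B1 ...
    ((5.0, 1.0), (5.0, 1.0), (-0.3, 1.2, 2.0)),  # if x is A2 and y is B2 ...
]
print(tsk_inference(3.0, 4.0, rules))
```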
16

- Example: a two-input, two-rule system

Rule 1: if $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: if $x$ is $A_2$ and $y$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$

Overall output: $f = \dfrac{w_1 f_1 + w_2 f_2}{w_1 + w_2}$

Type 1: Mamdani (the consequents are fuzzy sets)
Type 2: TSK (the consequents are the functions $f_1$, $f_2$ above)

17

Each of these if-then rules can be represented as an adaptive network:

[Figure: the rule drawn as a layered network. Some nodes carry adaptive parameters, e.g. the centers and widths of the membership functions and the consequent coefficients; the remaining nodes perform fixed operations.]
18

Example of a FIS with two inputs and three membership functions for each input

19

Training procedure:
The two passes in the hybrid learning procedure for ANFIS:

                        Forward pass             Backward pass
Premise parameters      Fixed                    Gradient descent
Consequent parameters   Least-squares estimate   Fixed
Signals                 Node outputs             Error rates

20

Why we can use the least-squares algorithm for the consequent parameters (for example, for the TSK model on page 18):

$f = \dfrac{w_1 f_1 + w_2 f_2}{w_1 + w_2} = \bar{w}_1 f_1 + \bar{w}_2 f_2 = (\bar{w}_1 x)\, p_1 + (\bar{w}_1 y)\, q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x)\, p_2 + (\bar{w}_2 y)\, q_2 + \bar{w}_2 r_2$

This is a linear regression problem: $A\theta = y$
21

For each data sample, the regressor row is $[\,\bar{w}_1 x \;\; \bar{w}_1 y \;\; \bar{w}_1 \;\; \bar{w}_2 x \;\; \bar{w}_2 y \;\; \bar{w}_2\,]$ and the parameter vector is $\theta = [p_1 \; q_1 \; r_1 \; p_2 \; q_2 \; r_2]^T$.
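A sketch of this estimate; the helper name and the use of NumPy's least-squares solver are my choices, not from the slides:

```python
import numpy as np

def consequent_lse(XY, t, w):
    # XY: (N, 2) input samples (x, y); t: (N,) target outputs;
    # w: (N, M) normalized firing strengths of the M rules
    N, M = w.shape
    reg = np.hstack([XY, np.ones((N, 1))])            # [x, y, 1] per sample
    # Regressor row: [w1*x, w1*y, w1, w2*x, w2*y, w2, ...]
    A = np.hstack([w[:, [i]] * reg for i in range(M)])
    theta, *_ = np.linalg.lstsq(A, t, rcond=None)     # solve A theta = t
    return theta.reshape(M, 3)                        # rows: (p_i, q_i, r_i)
```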

- In the backward pass the gradient-descent algorithm is used to optimize the premise parameters while the error propagates backward through the network (like back-propagation in neural networks).

22

Remark 1: since the consequent parameters are optimized at each iteration with the least-squares algorithm, the nonlinear optimization problem in the backward pass can be solved more efficiently, and problems such as getting trapped in local minima or slow convergence are less severe.

23

- Remark 2: the TSK model is more popular in the ANFIS structure since it has more adjustable parameters in the rule consequents. This reduces training time and effort, because these parameters are linear with respect to the model output and can be estimated very efficiently by the least-squares algorithm.

24

- Remark 3: sometimes optimizing the premise parameters (input membership functions) deteriorates the interpretability of the rule base.

25

Example:
$f(x) = 0.6 \sin(\pi x) + 0.3 \sin(3\pi x) + 0.1 \sin(5\pi x)$, with $x \in [-1, 1]$

3 membership functions per input (9 rules)

26

4 membership functions per input (16 rules)

27

5 membership functions per input (25 rules)

Loss of interpretability

28

FUREGA
Fuzzy Rule Extraction using Genetic Algorithm

29

FUREGA:
1- Start from a grid-based network using prior knowledge
2- Rule selection by a genetic algorithm
3- Least squares for consequent (output) parameter optimization
4- Constrained nonlinear optimization of the membership functions

30

Properties:
- Good chance of reaching the most accurate solution
- Time-consuming training
- Curse of dimensionality
- Interpretability?

31

LOLIMOT
Local Linear Model Tree

32

What are local models?

33

Example:

34

LOLIMOT algorithm:
- The algorithm has an outer loop (upper level) that determines the input partitions (structure) where the local linear models are valid, and an inner loop (lower level) that estimates the parameters of those local linear models by an efficient weighted least-squares algorithm.

Consequent parameter estimation:

$\hat{y} = \sum_{i=1}^{M} \big( w_{i0} + w_{i1} u_1 + \dots + w_{in} u_n \big) \, \Phi_i(u, c_i, \sigma_i)$

$u$: input vector
$w_{ij}$: local linear model parameters
$\Phi_i$: normalized Gaussian weighting function for the $i$-th model, with center coordinates $c_i$ and standard deviations $\sigma_i$
35

$\Phi_i(u, c_i, \sigma_i) = \dfrac{\mu_i(u)}{\sum_{j=1}^{M} \mu_j(u)}$

where

$\mu_i(u) = \exp\Big( -\dfrac{1}{2} \Big( \dfrac{(u_1 - c_{i1})^2}{\sigma_{i1}^2} + \dots + \dfrac{(u_n - c_{in})^2}{\sigma_{in}^2} \Big) \Big)$

- Assume the weighting functions have already been determined. Then the parameters of each linear model are estimated separately by a weighted least-squares technique. With the data matrix $X$ (known model inputs), the diagonal weighting matrix $Q_i$ (each entry is the weighting function value at the corresponding input sample), and the desired outputs $y$, the optimal parameters of the $i$-th model are:

$\hat{w}_i = (X^T Q_i X)^{-1} X^T Q_i y$
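A sketch of the validity functions and of the weighted least-squares estimate above (the function names are mine):

```python
import numpy as np

def validity(U, centers, sigmas):
    # Normalized Gaussian validity functions Phi_i(u) for M local models.
    # U: (N, n) inputs; centers, sigmas: (M, n)
    d2 = ((U[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]) ** 2
    mu = np.exp(-0.5 * d2.sum(axis=2))            # (N, M) unnormalized weights
    return mu / mu.sum(axis=1, keepdims=True)     # normalize over the M models

def weighted_ls(U, y, q):
    # One local affine model; q: (N,) validity values = diagonal of Q
    X = np.hstack([np.ones((len(U), 1)), U])      # regressors [1, u1, ..., un]
    XtQ = X.T * q                                 # X^T Q without forming Q
    return np.linalg.solve(XtQ @ X, XtQ @ y)      # w = (X^T Q X)^(-1) X^T Q y
```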
36

- Input space partitioning
1- Set the first hyper-rectangle in such a way that it contains all data points. Estimate a global linear model.
2- For all input dimensions j = 1...n:
2a. Cut the hyper-rectangle into two halves along dimension j.
2b. Estimate local linear models for each half.
2c. Calculate the global approximation error (output error) for the model with this cut.
3- Determine which cut has led to the smallest approximation error.
37

4- Perform this cut. Place a weighting function at the center of each of the two hyper-rectangles. Set the standard deviations of both weighting functions proportional to the extension of the hyper-rectangle in each dimension. Apply the corresponding estimated local linear models (from 2b).
5- Calculate the local error measure J on the basis of a parallel-running model for each hyper-rectangle.

6-Choose the hyper-rectangle with the largest local error measure J.


7- If the global approximation error of the parallel model (output error) is too large, go to step 2.
8- Convergence. Stop.
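A compact one-dimensional version of this loop, reusing the validity and weighted_ls helpers from the previous sketch. In 1-D the only candidate cut per step 2 is the midpoint, and the sigmas are set proportional to the segment width as in step 4; the proportionality factor 1/3, the fixed model budget (instead of the global-error test of steps 7-8), and the toy data are assumptions:

```python
import numpy as np

def lolimot_1d(u, y, max_models=8):
    segs = [(u.min(), u.max())]                            # step 1: one global segment

    def fit(segs):
        c = np.array([[(a + b) / 2] for a, b in segs])         # centers
        s = np.array([[(b - a) / 3 + 1e-9] for a, b in segs])  # proportional sigmas
        Phi = validity(u[:, None], c, s)                       # (N, M)
        W = np.array([weighted_ls(u[:, None], y, Phi[:, i])    # local models (2b/4)
                      for i in range(len(segs))])
        yhat = (Phi * (W[:, 0] + np.outer(u, W[:, 1]))).sum(axis=1)
        return Phi, yhat

    while len(segs) < max_models:                          # stand-in for steps 7-8
        Phi, yhat = fit(segs)
        local_err = (Phi * (y - yhat)[:, None] ** 2).sum(axis=0)
        i = int(np.argmax(local_err))                      # steps 5-6: worst segment
        a, b = segs[i]
        segs[i:i + 1] = [(a, (a + b) / 2), ((a + b) / 2, b)]  # steps 2-4: halve it
    return segs, fit(segs)[1]

u = np.linspace(0, 1, 200)
y = np.sin(6 * u)
segs, yhat = lolimot_1d(u, y)
print("RMSE:", np.sqrt(np.mean((y - yhat) ** 2)))
```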
38

LOLIMOT

39

Example:

40

Properties:
- High interpretability of rules
- Automatic partitioning of the input space according to the system properties
- Different objective functions for modeling error and structure optimization
- Low sensitivity to user-selected parameters
- No curse of dimensionality for high-dimensional problems

41

Implementing Hierarchical Fuzzy Clustering in Fuzzy Identification Using Weighted Fuzzy C-Means

42

Clustering
- Definition: divide the data set in such a way that objects belonging to the same cluster are as similar as possible and objects belonging to different clusters are as dissimilar as possible
- Types: 1- crisp 2- fuzzy
- Properties: 1- unsupervised learning task 2- nonlinear optimization 3- computational economy 4- needs user-defined parameters

43

Fuzzy C-means (FCM)


Cost function:

$J_m = \sum_{i=1}^{C} \sum_{k=1}^{N} u_{ik}^m \, \lVert x_k - v_i \rVert^2$, subject to $\sum_{i=1}^{C} u_{ik} = 1$

$m \to 1$: clusters become crisp
$m \to \infty$: clusters become maximally fuzzy
Iterative training

44

Example of fuzzy C-means

45

Weighted fuzzy C-means (WFCM)

- Some points are more important than others: each point $x_k$ receives a weight $w_k$ that scales its contribution to the cost function.
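A sketch of WFCM; the update formulas are the standard FCM ones with a per-point weight scaling each point's vote, so with all weights equal to 1 it reduces to plain FCM:

```python
import numpy as np

def wfcm(X, C, w=None, m=2.0, iters=100, seed=0):
    # X: (N, n) data; C: number of clusters; w: (N,) point weights; m: fuzzifier
    N = len(X)
    w = np.ones(N) if w is None else np.asarray(w, float)
    rng = np.random.default_rng(seed)
    V = X[rng.choice(N, C, replace=False)]            # initial cluster centers
    for _ in range(iters):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        U = 1.0 / d2 ** (1.0 / (m - 1.0))             # membership update ...
        U /= U.sum(axis=1, keepdims=True)             # ... enforcing sum_i u_ik = 1
        wu = w[:, None] * U ** m                      # weights scale each point's vote
        V = (wu.T @ X) / wu.sum(axis=0)[:, None]      # weighted center update
    return V, U
```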

46

Self-organizing map (SOM):

- The most famous neural-network-based clustering
- K-means (crisp C-means) with sequential training:

$c_w \leftarrow c_w + \eta \,(u - c_w)$

47

SOM algorithm:
1- Choose initial values for the C neuron vectors $c_i$, $i = 1, \dots, C$. This can be done by randomly picking C different data samples.
2- Choose one sample u from the data set. This can be done either randomly or by systematically going through the whole data set.
3- Calculate the distance of the selected data sample to all neuron vectors. Typically, the Euclidean distance measure is used. The neuron with the vector closest to the data sample is called the winner neuron.
4- Update the vector of the winner neuron in a way that moves it toward the selected data sample u: $c_w \leftarrow c_w + \eta \,(u - c_w)$
5- If any neuron vector has been moved significantly in the previous step, go to step 2; otherwise stop.
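A minimal sketch of steps 1-4; step 5's convergence test is replaced by a fixed iteration budget for brevity (an assumption):

```python
import numpy as np

def som_cluster(X, C, eta=0.1, iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), C, replace=False)]        # step 1: init from C samples
    for _ in range(iters):
        u = X[rng.integers(len(X))]                    # step 2: pick a random sample
        k = np.argmin(((V - u) ** 2).sum(axis=1))      # step 3: winner neuron
        V[k] += eta * (u - V[k])                       # step 4: move it toward u
    return V
```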
48

Fuzzy clustering for fuzzy identification

- It is an unsupervised learning task, so it does not need any additional data
- The input-space term-sets are a direct result of the clustering process
- Computational economy

49

Application of clustering in fuzzy modeling


1- applying clustering algorithms to input data only
2- applying clustering algorithms to output data only
3- applying clustering algorithms to a vector composed of input and output data

50

FCM for input space partitioning

- FCM requires a priori knowledge of the number of clusters
  → determine the number of clusters in an iterative manner
  → use optimal fuzzy clustering methods
- Dependence of FCM on the initialization
  → hierarchical clustering
- Interpretability of the final fuzzy model
  → model simplification methods
51

Algorithm:

52

Algorithm:
1- Apply the SOM algorithm to classify the N data samples into n crisp clusters ($A_i$, $i = 1 \dots n$).
2- Select the n cluster centers ($c_i$, $i = 1 \dots n$) from the previous step and assign a weight to each of them according to its relative cardinality: $w_i = N_i / N$
3- Apply WFCM to classify the n weighted cluster centers $(c_i, w_i)$ into C new clusters.
53

4- The centers of the Gaussian membership functions in the premises of the fuzzy rules are obtained by simply projecting the final cluster centers onto each axis. To calculate the respective standard deviations, utilize the fuzzy covariance matrix. [5]
5- Use weighted least squares to optimize the consequent parameters and steepest descent for the premise parameters. (Formulas in [5])
6- Merge similar membership functions for interpretability, using the set-theoretic similarity measure $S(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$

7- Optimize the consequent parameters again.
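A sketch of steps 1-4 and of the similarity measure in step 6, reusing som_cluster and wfcm from the earlier sketches; the cluster counts below are hypothetical defaults:

```python
import numpy as np

def hierarchical_premise(X, n_som=51, C=4):
    centers = som_cluster(X, n_som)                       # step 1: n crisp clusters
    lab = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    w = np.bincount(lab, minlength=n_som) / len(X)        # step 2: w_i = N_i / N
    V, U = wfcm(centers, C, w=w)                          # step 3: WFCM on centers
    return V                                              # step 4: project V[:, j]
                                                          # onto each input axis j

def mf_similarity(mu_a, mu_b):
    # Step 6: set-theoretic similarity S = |A ∩ B| / |A ∪ B|,
    # evaluated on membership values sampled over the input range
    return np.minimum(mu_a, mu_b).sum() / np.maximum(mu_a, mu_b).sum()
```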


54

Example I:

55

Example (cont.):

[Figure: data samples in the $(x_1, x_2)$ plane, both axes ranging roughly from 1 to 5.5. SOM first produces crisp cluster centers whose weights reflect relative cardinality: green w=2/51, blue w=3/51, red w=4/51, dark blue w=8/51, black w=10/51. WFCM then clusters these weighted centers.]

56

Example (cont.):

[Figure: term-sets for x1 and x2 over the range 1 to 5: the initial term-sets (J=0.1801), the final term-sets after training (J=0.0018), and the simplified term-sets after merging (J=0.0154), with linguistic labels small, medium, and large.]
R1: if x1 is small and x2 is small then y = 17.3 - 2.6·x1 + 1.4·x2
R2: if x1 is medium and x2 is large then y = 7.5 - 2.9·x1 - 0.02·x2
R3: if x1 is large and x2 is small then y = 4.7 + 2.7·x1 - 7.8·x2
R4: if x1 is large and x2 is large then y = 2.8 - 0.2·x1 - 0.2·x2

57

Example II:

Inputs: x(t-18), x(t-12), x(t-6), x(t)
Output: x(t+6)

[Figure: the time series x(t) for t = 0 to 500, with values roughly between 0.4 and 1.3.]

Example II (cont.):

[Figure: initial, final, and simplified term-sets for x(t-18) and x(t-12) over the range 0.4 to 1.3.]

Example II (cont.):

[Figure: initial (J=0.0166), final (J=0.0072), and simplified (J=0.0128) term-sets for x(t-6) and x(t) over the range 0.4 to 1.3.]

Benefits over similar approaches:
- Does not need any additional data
- Low sensitivity to user-selected parameters and initial conditions
- Computational economy
- Curse of dimensionality
- Interpretability

Drawback:
- Sensitivity to data distribution

61

Universal approximator

62

Proof: [6]

63

References:
1- Babuška, R. and Verbruggen, H. (2003). Neuro-fuzzy methods for nonlinear system identification (review). Annual Reviews in Control, 27, 73-85.
2- Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall.
3- Jang, J.-S.R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man & Cybernetics, 23(3), 665-685.
4- Nelles, O. and Isermann, R. (1996). Basis function networks for interpolation of local linear models. In: IEEE Conference on Decision and Control (CDC), 470-475; Nelles, O. (2002). Nonlinear System Identification. Springer Verlag, Berlin.
5- Oliveira, J.V. and Pedrycz, W. (2007). Advances in Fuzzy Clustering and its Applications. John Wiley & Sons, chapter 12.
6- Espinosa, J., Vandewalle, J. and Wertz, V. (2004). Fuzzy Logic, Identification and Predictive Control. Springer Verlag, Berlin.
64

Questions and Discussion

Thanks for your attention

65
