
System Identification
sc4110

X. Bombois
P.M.J. Van den Hof

Part I: Introduction to system identification

Material:
• Lecture notes sc4110 - January 2006
  available through: Blackboard or Nextprint

Lecture hours (see schedule for details):
• Monday (15:45-17:30) in Room D (3Me)
• Thursday (15:45-17:30) in Room D (3Me)
• Friday (13:45-15:30) in Room A (API)

System identification is about modeling

The notion of a model is common in many branches of science.

Within (systems and control) engineering: models of dynamical systems for the purpose of
• system (re)design
• control design
• prediction
• simulation
• diagnosis / fault detection


System identification is about data-based modeling

Data-based modeling of the DCSC DC motor:

(applied) voltage [V] → motor dynamics → (measured) rotational speed [rad/s]

Determine a model of the dynamical relation existing between the voltage u(t) driving the motor and the angular speed y(t) of the rotor.

How to proceed?

• Excite the system by applying a chosen sequence for the voltage u(t) during 20 seconds
  [Figure: applied voltage u(t), between −6 and 6 V, over 0-20 s]
• Measure the induced rotational speed y(t)
  [Figure: measured speed y(t), between −150 and 150 rad/s, over 0-20 s]
• Given a candidate model (i.e. a transfer function), we can use the available data to compute the signal ε(t) featuring the modeling error
  [Block diagram: u(t) drives both the motor dynamics, with output y(t), and the model, with output ŷ(t); ε(t) = y(t) − ŷ(t)]
• Determine the model minimizing the power of ε(t) (often of a filtered ε(t); see later)


Identification result: a discrete-time transfer function (4th order)

[Figure: frequency response (magnitude) of the identified model, from u1 to y1]

[Figure: measured y(t) (blue) vs. ε(t) (red) over 0-20 s]

ε(t) contains not only the model inaccuracy, but also the noise acting on the system.

Why is data-based modeling useful?

When thinking of modeling, we indeed generally think of first-principle modeling and not of data-based modeling.

First-principle modeling = modeling using the laws of physics (Newton, mass conservation, ...)

However, data-based modeling is often as important as first-principle modeling.

Example 1: control of the pick-up mechanism of a CD-player

Pick-up mechanism: position the reading tool (laser) on the right track of the CD using a mechanical arm.

The arm is driven by the current i(t) of a motor; an optical sensor measures the laser position θ(t).


Dynamical system: i(t) → positioning mechanism → θ(t)

Objective: design a fast and precise position controller (required bandwidth ≈ 1000 Hz)

θref(t) → controller → i(t) → positioning mechanism → θ(t)

⇒ a model is needed

First-principle modeling

The model is designed based on Newton's law. Since the current induces a force, the relation between the current and the position is modeled by a double integrator.

The controller designed with this physical model could not achieve the desired bandwidth without inducing vibrations.

Data-based modeling

An identification experiment was then performed and the following model identified:

[Figure: identified frequency response showing a double integrator at low frequencies and flexibility modes (actuator dynamics) at high frequencies]

For a bandwidth of ≈ 1000 Hz, the mechanical modes can no longer be neglected and should be tackled by the controller. These flexible modes are quasi impossible to model with physical laws.

Identified model ⇒ new controller design

Since all significant dynamics were now tackled, the controller based on the identified model showed satisfactory behaviour.


Example 2: fatigue load reduction for a new generation of wind turbines
(J.W. Wingerden et al., Wind Energy, Wiley, 2008)

The blades of a wind turbine are subject to high vibration loads due to wind gusts, periodic rotations, ...

To enhance the life duration of wind turbines, these vibrations must be regulated.

Two control loops to reduce the strain in the blade:
• pitch control: optimal orientation of the blades
• flap control: optimal orientation of flaps added to the blade

[Figure: blade structure with flap]

For control design, we need a model of the dynamics between the pitch and flap actuators and the strain in the blade:

[Diagram: wind, pitch actuator and flap actuator acting on the system; strain sensor as output]

First-principle modeling

Model based on aerodynamic and mechanical laws:
• linear model (order = 28)
• many physical parameters to determine ⇒ high uncertainty

[Figure: Bode plots (strain sensor amplitude [V] and phase [deg]) from the pitch actuator [deg] and from the smart actuators 1&2 [V], vs. frequency [Hz]]

With this model, it is impossible to deduce a controller stabilizing the real-life system.


Data-based modeling

We excite both inputs up to 100 Hz (the important band for control) and measure the corresponding strain.

Based on these data, the following model is identified:

[Figure: identified Bode plots from the pitch actuator [deg] and from the smart actuators 1&2 [V], showing the 1st flapping mode, 2nd flapping mode and 1st lead-lag mode]

Important differences between the two models:

• Behaviour in the low frequencies (the physical model did not take into account the strain sensor dynamics)
• Extra resonances between 10 Hz and 100 Hz due to other vibration modes (unmodeled in the first-principle approach)

The identified model is simpler (order = 10) and less uncertain (the parameters of the first-principle model have in fact been tuned with the identified model).

Control design based on the identified model leads to a satisfactory reduction of the strain in the blade.

Example 3: Signal equalization in mobile telephony

[Diagram: antenna emitting a signal u(t); reflections via clouds and ground; mobile phone receiving a signal y(t)]

The received signal y(t) is made up of several delayed versions of the emitted signal u(t) plus noise:

y(t) = g1 u(t − n1) + g2 u(t − n2) + ... + noise

⇒ distorted signal

A model of the so-called channel is required to reconstruct u(t) from the distorted y(t).

This model cannot be determined in advance, since the position of the mobile phone is mobile (by definition). The model is identified at each received call.

How to proceed?

When emitting u(t), the signal of interest u_interest(t) is preceded by a known sequence u_known(t).

[Figure: emitted signal u(t) = known sequence followed by the signal of interest]

Both the known sequence and the signal of interest are distorted by the channel.

[Figure: received signal y(t) = known sequence distorted by the channel, followed by the signal of interest distorted by the channel]

Denote by y_known(t) and y_interest(t) the received signals corresponding to u_known(t) and u_interest(t), respectively.

Since u_known is a known sequence, the GSM software uses the data u_known and y_known to identify a model of the channel.

This model can then be used to determine an appropriate filter to reconstruct u_interest(t) from y_interest(t).

Summary: First-principle vs. Data-based modeling

General disadvantages of first-principle modeling:
• the model contains many unknown (physical) parameters ⇒ high uncertainty (not quantifiable)
• the model is generally more complicated than with system identification
• missing actuator/sensor dynamics and phenomena can be forgotten
• sometimes impossible to determine (as in example 3, but also in the process industry)
• no disturbance model

The two methodologies are often combined to increase confidence in the model.


System Identification: the players

[Block diagram: u → G0 → +, with disturbance v added at the output; output y; to-be-identified system]

u(t) is the (discrete-time) input, which can be freely chosen.

y(t) is the (discrete-time) output, which can be measured and is made up of
• a contribution due to u(t), i.e. G0 u(t)
• a contribution independent of u(t), i.e. the disturbance v(t)

The signal v(t) is an unknown disturbance (noise, process disturbances, effects of non-measured inputs, ...).

It can best be modeled via a (zero-mean) stochastic process. Indeed, v(t) will never be the same if you repeat the experiment.

The challenging nature of system identification is due to the presence of v(t): if v(t) = 0, it is just an algebraic game to find the relation between u(t) and y(t).

As a result, an identification experiment (generally) delivers both a model of the transfer G0 and a model of the disturbance v(t).

System identification procedure

[Flowchart: prior knowledge / intended model application → experiment design → data; data, model set and identification criterion → construct model → validate model; if NOT OK, revisit the earlier choices; if OK, use the model for the intended application]

Identification Criterion

Measures the "distance" between a data set (u, y), t = 1, ..., N, and a particular model.

In this course, we will consider two criteria:
• Prediction Error Identification (PEI), delivering a discrete-time transfer function as model of G0
• Empirical Transfer Function Estimate (ETFE), delivering an estimate of the frequency response of G0


' $ ' $
Why those?
• PEI is the most used method in practice and the one
delivering the most tools to validate a model Model set
• ETFE is used to have a first idea of the system and facilitate
the use of PEI Complexity of models (order, number of parameters) to be
determined
Other criteria: subspace identification, IV methods, ML

& % & %
methods, ...

'
System Identification: Part I

Experiment Design
$ '
33 System Identification: Part I

Model validation
$
34

• Choice of the type of excitation • Comparing the actual output of the system with the output
predicted by the model
• sum of sinusoids (multisine)
• realization of (filtered) white noise or alike • Determining the uncertainty of the system e.g. in the
frequency domain
• Which frequency content?
2
10

amplitude
0
10

• Which duration? 10

10
-2

-4

& % & %
-2 -1 0
10 10 10

0
phase
-200

-400

-600
-2 -1 0

Experiment design is very important since it has a direct


10 10 10
frequency

influence on the quality of the model


• ....
System identification for (robust) control

Data → Model: identification of the process from input and output data, in the presence of a disturbance
Model → Controller: design of the feedback control system (reference, controller, process)

[Diagrams: open-loop process with input, output and disturbance; feedback control system with reference, controller and process]

History

• Basic principle (LS) from Gauss (1809)
• Development based on theories of
  - stochastic processes
  - statistics
• Strong growth in the sixties and seventies:
  Åström and Bohlin (1965), Åström and Eykhoff (1971)
• Brought to technological tools in the nineties
  (Matlab Toolboxes for either time-domain or frequency-domain identification), as well as to professional industrial control packages (Aspen, SMOC-PRO, IPCOS, Tai-Ji Control, AdaptX, ...)

Notions from estimation theory

Estimator θ̂N of θ0 based on N data points.

a. Unbiased (zuiver): E θ̂N = θ0

b. Consistent: θ̂N is consistent if
   Pr[ lim_{N→∞} θ̂N = θ0 ] = 1
   i.e. θ̂N → θ0 with probability 1 for N → ∞.

c. Variance: cov(θ̂N) = E(θ̂N − E θ̂N)(θ̂N − E θ̂N)^T

[Figure: three dartboards; the bull's eye represents θ0. Left: unbiased estimate with small variance; middle: biased estimate with small variance; right: unbiased estimate with large variance]


Part II: RECAP on discrete-time systems and signals

1. Introduction

Why are discrete-time systems and signals important in system identification?

In system identification, we deal with measured signals ⇒ discrete-time signals
⇒ the models/systems can be represented by discrete-time transfer functions

2. Discrete-time systems

Continuous-time vs. discrete-time systems:

u → ZOH (Ts) → u_cont → continuous system → y_cont → sampling (Ts) → y

The system is excited via the discrete sequence u(t), t = 0, 1, 2, ..., generated by a PC.

This discrete signal is made continuous by the Zero Order Hold (ZOH):

u_cont(tc) = u(t) for tTs ≤ tc < (t + 1)Ts

The continuous output y_cont(tc) (tc ∈ R) of the system is sampled with sampling time Ts.

This sampling delivers the discrete measurements y(t), t = 0, 1, 2, ..., i.e. y(t) = y_cont(tTs).


Illustration:

Continuous system: G0(s) = 10/(s + 10)
Sampling time: Ts = 0.04 s

The sequence u(t) is made up of 41 samples, i.e. t = 0...40:

u(t) = 0    for 0 ≤ t ≤ 2
u(t) = 0.8  for 3 ≤ t ≤ 17
u(t) = 0.5  for 18 ≤ t ≤ 40

[Figure: upper plot, the discrete sequence u(t) over samples 0-40; bottom plot, the continuous signal u_cont made by the ZOH (red) compared with the discrete sequence u(t) (blue) over 0-1.6 s]

The continuous signal u_cont is then filtered by G0(s), delivering the continuous signal y_cont (upper plot, red). This continuous signal is then sampled with a sample period Ts = 0.04 s (upper plot, blue circles). This delivers the discrete sequence y(t) of 41 samples (t = 0...40) (bottom plot).

[Figure: upper plot, y_cont and its samples over 0-1.6 s; bottom plot, the discrete sequence y(t) over samples 0-40]

Discrete-time transfer function

Does there exist a transfer function relation between y(t) and u(t)?

u → ZOH (Ts) → continuous system → sampling (Ts) → y   is equivalent to   u → G0(z) → y

G0(z) ≜ Y(z)/U(z) = ( Σ_{t=−∞}^{+∞} y(t) z^{−t} ) / ( Σ_{t=−∞}^{+∞} u(t) z^{−t} )


Example:

When G0(s) = a/(s + a) and Ts = 0.04 s, the discrete-time transfer function between y(t) and u(t) is

G0(z) = (1 − b)z^{−1} / (1 − bz^{−1})   with b = e^{−aTs}

Thus:

G0(s) = 10/(s + 10)  ←→  G0(z) = 0.33z^{−1} / (1 − 0.67z^{−1})

G0(z) can be computed from G0(s) with the function c2d.m of Matlab (ZOH methodology).
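The same computation can also be reproduced outside Matlab; the following is a minimal sketch (assuming Python with scipy, which is not part of the course material) that should recover the coefficients 0.33 and 0.67 quoted above.

```python
import numpy as np
from scipy.signal import cont2discrete

# Continuous-time system G0(s) = 10 / (s + 10), sampling time Ts = 0.04 s
num, den, Ts = [10.0], [1.0, 10.0], 0.04

# Zero-order-hold discretization (same methodology as Matlab's c2d)
numd, dend, dt = cont2discrete((num, den), Ts, method='zoh')

# Expected: numd ~ [0, 0.33], dend ~ [1, -0.67], since b = exp(-10*Ts) ~ 0.67
print(numd, dend)
print("b =", np.exp(-10 * Ts))
```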

Proof:

Suppose u(t) is a discrete step; then u_cont(tc) is a continuous step. The step response of G0(s) is, for tc > 0,

y_cont(tc) = 1 − e^{−a tc}

The sampled signal y(t) is given by y_cont(tc) at the samples tc = tTs, i.e., for t > 0,

y(t) = y_cont(tTs) = 1 − e^{−a t Ts} = 1 − b^t

G0(z) = Y(z)/U(z) = ( 1/(1 − z^{−1}) − 1/(1 − bz^{−1}) ) / ( 1/(1 − z^{−1}) ) = (1 − b)z^{−1} / (1 − bz^{−1})

Properties of the discrete-time transfer function

u → G0(z) → y

With some abuse, we will write y(t) = G0(z)u(t).


y(t) = G0(z)u(t) can be seen as a difference equation, since

z^{−1} u(t) = u(t − 1)

Example:

y(t) = ( bz^{−1} / (1 − az^{−1}) ) u(t)  ⇐⇒  y(t) − ay(t − 1) = bu(t − 1)

This allows to compute the sequence y(t) as a function of the sequence u(t).

Remark: pure delays can easily be represented within G0(z).

For a continuous transfer function, a pure delay of α = βTs seconds (β integer) is a non-rational part:

e^{−αs} · 10/(s + 10)

The corresponding rational discrete transfer function is:

z^{−β} · 0.33z^{−1} / (1 − 0.67z^{−1})
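The difference-equation view is exactly how such a model is simulated in practice: the recursion y(t) = ay(t − 1) + bu(t − 1) is run sample by sample. A minimal sketch (assuming Python with numpy/scipy; a = 0.67, b = 0.33 and the 41-sample input of the earlier illustration are taken from the slides):

```python
import numpy as np
from scipy.signal import lfilter

a, b = 0.67, 0.33   # y(t) - a*y(t-1) = b*u(t-1)
u = np.r_[np.zeros(3), 0.8 * np.ones(15), 0.5 * np.ones(23)]  # input of the illustration

# Run the recursion explicitly ...
y = np.zeros_like(u)
for t in range(1, len(u)):
    y[t] = a * y[t - 1] + b * u[t - 1]

# ... or, equivalently, filter u with G0(z) = 0.33 z^-1 / (1 - 0.67 z^-1)
y_filt = lfilter([0.0, b], [1.0, -a], u)
assert np.allclose(y, y_filt)
```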

Impulse response of G0(z)

Assume G0(z) is causal.

The impulse response g0(t), t = 0...+∞, is the response y(t) = G0(z)u(t) when u(t) is a discrete pulse δ(t), i.e. u(t) = 1 when t = 0 and u(t) = 0 elsewhere.

The impulse sequence g0(t) can be deduced
• by solving the difference equation for u(t) = δ(t)
• by dividing the numerator of G0(z) by its denominator

Indeed:

y(t) = G0(z)δ(t) = Σ_{k=0}^{∞} g0(k) δ(t − k) = g0(t)

This response allows to rewrite G0(z) as follows:

G0(z) = Σ_{k=0}^{∞} g0(k) z^{−k}
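Both routes can be checked numerically. A small sketch (assuming Python/scipy, with the running example G0(z) = 0.33z^{−1}/(1 − 0.67z^{−1})): filtering a discrete pulse returns the impulse sequence g0(t), which here equals 0, 0.33, 0.33·0.67, 0.33·0.67², ...

```python
import numpy as np
from scipy.signal import lfilter

# G0(z) = 0.33 z^-1 / (1 - 0.67 z^-1); discrete pulse delta(t)
delta = np.zeros(10); delta[0] = 1.0
g0 = lfilter([0.0, 0.33], [1.0, -0.67], delta)   # impulse response g0(t)

# Geometric check: g0(t) = 0.33 * 0.67**(t-1) for t >= 1
t = np.arange(1, 10)
assert np.allclose(g0[1:], 0.33 * 0.67 ** (t - 1))
```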


Stability of G0(z)

A transfer function is stable ⇐⇒ the poles of G0(z) are all located within the unit circle.

Example: bz^{−1}/(1 − az^{−1}) is stable ⇐⇒ |a| < 1. Indeed, z = a is the unique pole of bz^{−1}/(1 − az^{−1}).

Frequency response of G0(z)

The frequency response of G0(z) is given by the transfer function evaluated at z = e^{jω}, i.e. on the unit circle: G0(e^{jω}).

Only the frequency response for ω in [0 π] is relevant.

Discrete frequency ω ∈ [0 π] ⇒ actual frequency ω_actual = ω/Ts (ω_actual lies within the interval between 0 and the Nyquist pulsation).

General interpretation:

Y(ω) = G0(e^{jω}) U(ω)

with Y(ω), U(ω) the Fourier transforms of y(t), u(t) (t = −∞...+∞).

One particular consequence:

u(t) = sin(ω0 t)  ⇒  y(t) = G0(z)u(t) = |G0(e^{jω0})| sin( ω0 t + ∠G0(e^{jω0}) )

Frequency response representation: Bode plot

[Figure: Bode diagram, magnitude (dB) and phase (deg) vs. frequency (rad/sec)]


Remarks

1. Choice of Ts

The sampling period Ts is an important variable. It should be chosen so that [0 π/Ts] covers the band of significance of the continuous-time system.

See the end of the course for methodologies to choose Ts.

2. Non-linearities

We adopt a linear framework to define the relation between u and y. We thus analyze the behaviour around one particular set-point.

If the system is used at multiple set-points, a model must be identified for each of them (and coupled with a scheduling function).


3. Discrete-time signal analysis

Signals encountered in system identification:

Input u(t): multisine or (filtered) white noise
[Figure: two example input sequences over 100 samples]

Disturbance v(t): stochastic signal
[Figure: example disturbance sequence over 100 samples]

Output y(t): y(t) = G0(z)u(t) + v(t)
[Figure: example output sequence over 100 samples]

Observations

These are finite-power signals ⇒ analysis via their power spectrum Φ(ω) (i.e. the distribution of the power content over the frequency ω).

The signal y(t) can be made up of a combination of stochastic and deterministic signals (e.g. when u(t) is a multisine) ⇒ this makes it complicated to define Φ(ω).

A new theory is necessary to deal with such signals, called quasi-stationary signals (see later).

Recap: Stochastic vs. Deterministic signals

The values taken by a stationary stochastic signal at different t are different at each experiment/realization.

BUT each realization has the same power content over ω (i.e. the same Φ(ω)).

[Figure: ensemble of three realizations of a stochastic signal over time]


Stationarity also implies that the mean of the signal and the auto-correlation function are time-invariant.

The values taken by a deterministic signal at different t, and thus Φ(ω), are the same for all experiments/realizations. Its analysis is very close to the one of stationary stochastic signals (see WB2310 S&R3).

Analysis of quasi-stationary signals

A quasi-stationary signal is a finite-power signal which can be
• a stochastic signal (stationary)
• a deterministic signal
• the summation of a stochastic and a deterministic signal

In identification, the deterministic signals are the multisines.
Mean Ēu(t) of a quasi-stationary signal u(t)

Mean of a deterministic signal u(t): lim_{N→∞} (1/N) Σ_{t=1}^{N} u(t)

Mean of a stochastic signal u(t): Eu(t)

⇒ New operator Ē:

Ēu(t) ≜ lim_{N→∞} (1/N) Σ_{t=1}^{N} Eu(t)

For a purely stochastic or a purely deterministic signal, the new operator is equivalent to the classical mean operator.

Power spectrum Φu(ω) of a quasi-stationary signal

The power spectrum of u(t) is defined as the Fourier Transform of the auto-correlation function of u(t):

Φu(ω) = Σ_{τ=−∞}^{+∞} Ru(τ) e^{−jωτ}   with   Ru(τ) = Ē( u(t) u(t − τ) )


Total power Pu ≜ Ēu²(t) of u(t):

Pu = Ru(0) = (1/2π) ∫_{−π}^{π} Φu(ω) dω

Example 1: Φu(ω) and Pu when u(t) is a white noise of variance σu²?

Ru(τ) = lim_{N→∞} (1/N) Σ_{t=1}^{N} E( u(t)u(t − τ) )
      = E( u(t)u(t − τ) )   by stationarity
      = σu² when τ = 0, and 0 when τ ≠ 0

Φu(ω) = Σ_{τ=−∞}^{+∞} Ru(τ) e^{−jωτ} = σu² e^{−jω·0} = σu²   ∀ω

[Figure: flat spectrum Φu(ω) = σu² over [−π π]]

and Pu = Ru(0) = σu²

Example 2: Φu(ω) and Pu when u(t) = A sin(ω0 t + φ)

Ru(τ) = Ē( u(t)u(t − τ) )
      = Ē( A² sin(ω0 t + φ) sin(ω0 t − ω0 τ + φ) )
      = Ē( (A²/2) cos(ω0 τ) − (A²/2) cos(2ω0 t − ω0 τ + 2φ) )

⇒ Ru(τ) = lim_{N→∞} (1/N) Σ_{t=1}^{N} ( (A²/2) cos(ω0 τ) − (A²/2) cos(2ω0 t − ω0 τ + 2φ) )

since Es(t) = s(t) for a deterministic signal.

⇒ Ru(τ) = (A²/2) cos(ω0 τ)

and thus, in the fundamental frequency range [−π π],

Φu(ω) = (A²π/2) ( δ(ω − ω0) + δ(ω + ω0) )

and Pu = Ru(0) = A²/2.


Φu(ω) = (A²π/2) ( δ(ω − ω0) + δ(ω + ω0) )

The power spectrum of the sinusoid is independent of its phase shift φ and is equal to 0 except at ±ω0, where it is infinite.

[Figure: Φu(ω) with delta spikes at −ω0 and +ω0]

Properties of the power spectrum

y(t) = G(z)u(t)  ⇒  Φy(ω) = |G(e^{jω})|² Φu(ω)

y(t) = s1(t) + s2(t) with s1(t) independent of s2(t)  ⇒  Φy(ω) = Φs1(ω) + Φs2(ω)

Cross- and auto-correlation function

The cross-correlation Ryu(τ) between y and u is a function which allows to verify whether two quasi-stationary signals y(t) and u(t) are correlated with each other:

Ryu(τ) = Ē( y(t) u(t − τ) )

Properties:
• the value of y(t) at time t is not (cor)related in any way to the value of u(t − τ) ⇒ Ryu(τ) = 0
• the signals y(t) and u(t) are independent ⇒ Ryu(τ) = 0 ∀τ

NB. Ru(τ) = Ruu(τ)

Approximations of Ru(τ) and Φu(ω) using finite data

To exactly compute Ru(τ) and Φu(ω), we need both an infinite number of measurements of u(t) and an infinite number of realizations of u(t).

In practice, we generally have N < ∞ measurements of u(t): {u(t) | t = 0...N − 1}.

A. Approximation of Ru(τ) and properties of this approximation

R̂u^N(τ) = (1/N) Σ_{t=0}^{N−1} u(t) u(t − τ)   for |τ| < N − 1
R̂u^N(τ) = 0                                   for |τ| > N − 1


This approximation is a consistent estimate of Ru(τ), i.e.

lim_{N→∞} R̂u^N(τ) = Ru(τ)

For fixed N, though, the accuracy of R̂u^N(τ) decreases for increasing values of τ, since R̂u^N(τ) is computed with fewer and fewer products u(t)u(t − τ).

B. Approximation of Φu(ω) (Periodogram) and properties of this approximation

Φu(ω) can be approximated in two equivalent ways:

Φ̂u^N(ω) = Σ_{τ=−∞}^{+∞} R̂u^N(τ) e^{−jωτ} = |UN(ω)|²

with UN(ω) the (scaled) Fourier Transform of {u(t) | t = 0...N − 1}, i.e.

UN(ω) = (1/√N) Σ_{t=0}^{N−1} u(t) e^{−jωt}

Note: the approximation via UN(ω) is the most logical one for deterministic signals.
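The periodogram is cheap to compute with the FFT, which evaluates UN(ω) on the grid ω_k = 2πk/N. A minimal sketch (assuming Python/numpy), anticipating the white-noise case of Example 1 below:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
u = rng.normal(scale=10.0, size=N)   # white noise, variance sigma_u^2 = 100

# U_N(w_k) = (1/sqrt(N)) * sum_t u(t) e^{-j w_k t} at w_k = 2*pi*k/N
U = np.fft.rfft(u) / np.sqrt(N)
Phi_hat = np.abs(U) ** 2             # periodogram estimate of Phi_u(w)

# For white noise Phi_u(w) = sigma_u^2 = 100: Phi_hat is erratic but
# fluctuates around that level, whatever N
print(Phi_hat.mean())
```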

When u(t) is deterministic, Φ̂u^N(ω) is a consistent estimate of Φu(ω):

lim_{N→∞} Φ̂u^N(ω) = Φu(ω)

In all other cases, we only have that Φ̂u^N(ω) is an asymptotically unbiased estimate of Φu(ω) (the variance is nonzero):

lim_{N→∞} E Φ̂u^N(ω) = Φu(ω)

Example 1: we have collected N = 1000 time-samples of a white noise of variance σu² = 100

[Figure: the measured white noise sequence u(t), t = 0...1000]


Obtained periodogram Φ̂u^N(ω) (blue) w.r.t. Φu(ω) (red):

[Figure: periodogram on a logarithmic scale, ω ∈ [0 π]]

Φ̂u^N(ω) is an erratic function fluctuating around Φu(ω).

As expected, this does not change when N is increased to N = 10000:

[Figure: periodogram on a logarithmic scale for N = 10000, ω ∈ [0 π]]

Example 2: we have collected N = 100 time-samples of u(t) = sin(0.63t) + (1/2) sin(1.26t) + (3/4) sin(1.89t) (fundamental period = 10 time-samples):

[Figure: the multisine u(t), t = 0...100]

Obtained periodogram Φ̂u^N(ω) for ω ∈ [0 π]:

[Figure: periodogram with three peaks at ω1 = 0.63, ω2 = 1.26 and ω3 = 1.89]

It can be proven that the values at ω1 = 0.63, ω2 = 1.26 and ω3 = 1.89 are given by N Ai²/4, where Ai (i = 1, 2, 3) is the amplitude of the sinusoid of frequency ωi.

For N → ∞, Φ̂u^N(ω) thus tends to Φu(ω). Here is the periodogram for the same signal when N = 1000:

[Figure: periodogram for N = 1000, with much higher peaks at the three frequencies]


Part III: Prediction Error Identification

1. Introduction about Prediction Error Identification

1.1. Assumptions on the True System: S = { G0, H0 }

y(t) = G0(z)u(t) + H0(z)e(t),   with v(t) = H0(z)e(t)

[Block diagram: u(t) → G0(z) → +, with H0(z)e(t) added at the output; output y(t); true system S]

G0(z) and H0(z) are two unknown linear transfer functions in the Z-transform (e.g. G0(z) = 3z^{−1}/(1 + 0.5z^{−1}) and H0(z) = 1/(1 + 0.5z^{−1})).

The input signal u(t) is chosen by the operator and applied to S, and the output signal y(t) is measured.

y(t) is assumed to be made up of two distinct contributions:
• G0 u(t): dependent on the choice of u(t)
• the disturbance v(t) = H0(z)e(t): independent of the input signal u(t)

The disturbance v(t) represents the measurement noise, the effects of stochastic disturbances, the effects of non-measurable input signals, ...

The disturbance v(t) is modeled by H0(z)e(t):
• H0(z) is stable, inversely stable and monic (i.e. H0(z) = 1 + Σ_{k=1}^{∞} h0(k) z^{−k})
• e(t) is a white noise signal, i.e. a sequence of independent, identically distributed random variables (no assumption is made on the probability density function)


Properties of e(t) and v(t) as a consequence of the assumptions

Since {e(t)} is a white noise,

Ee(t) = 0
Re(τ) = Ee(t)e(t − τ) = σe² · δ(τ)

{v(t)} is therefore the realization of a stochastic process with properties:

Ev(t) = 0
Φv(ω) = |H0(e^{jω})|² · σe²

1.2. Objective of PE Identification

General Objective: Find the best parametric models G(z, θ) and H(z, θ) for the unknown transfer functions G0 and H0, using a set of measured data u(t) and y(t) generated by the true system S.

Example of parametric models:

G(z, θ) = θ1 z^{−1} / (1 + θ2 z^{−1})     H(z, θ) = 1 / (1 + θ2 z^{−1})

θ = ( θ1, θ2 )^T     M = { G(z, θ), H(z, θ) ∀θ ∈ R² }

Note: H(z, θ) is always chosen as a monic transfer function (like H0).

In the beginning, we will make the following assumption:

∃θ0 such that G(z, θ0) = G0(z) and H(z, θ0) = H0(z),   i.e. S ∈ M

The objective can therefore be restated as follows: Find (an estimate of) the unknown parameter vector θ0 using a set of N input and output data Z^N = { u(t), y(t) | t = 1...N } generated by the true system, i.e. y(t) = G0 u(t) + H0 e(t).

Summary: the full-order identification problem

Consider the following true system:

y(t) = G0(z)u(t) + H0(z)e(t) = G(z, θ0)u(t) + H(z, θ0)e(t)

from which N input and output data have been measured: Z^N = { u(t), y(t) | t = 1...N }.

Given the parametrization G(z, θ) and H(z, θ), find (an estimate of) the unknown parameter θ0.


Simple idea to reach this objective:

Let us simulate the parametric models with the input u(t) in Z^N:

y(t, θ) = G(z, θ)u(t) + H(z, θ)e(t)

and let us find the vector θ for which y(t) − y(t, θ) = 0 ∀t = 1...N. In other words, θ = θ0 minimizes the power of y(t) − y(t, θ).

Problem: y(t, θ) cannot be computed, since the white noise sequence e(t) is unknown.

Consequences:
• we need to find an accurate way to predict y(t, θ)
• the predictor ŷ(t, θ) should be chosen in such a way that θ0 can still be deduced, e.g. by minimizing the power of y(t) − ŷ(t, θ)
2. Predictor ŷ(t, θ) in identification and prediction error ǫ(t, θ)

Given Z^N and a model G(z, θ), H(z, θ) in M, we define the predictor ŷ(t, θ) of the output of this model as follows:

ŷ(t, θ) = H(z, θ)^{−1} G(z, θ) u(t) + (1 − H(z, θ)^{−1}) y(t)   ∀t = 1...N

and we define the prediction error ǫ(t, θ) as follows:

ǫ(t, θ) = y(t) − ŷ(t, θ)                        ∀t = 1...N
        = H(z, θ)^{−1} ( y(t) − G(z, θ)u(t) )   ∀t = 1...N

ǫ(t, θ) compares the output of the true system and the predicted output of a candidate model.

[Block diagram: e(t) → H0(z) and u(t) → G0(z) sum to y(t); u(t) → G(z, θ) is subtracted from y(t); the difference is filtered by 1/H(z, θ) to give ǫ(t, θ)]
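Computing ǫ(t, θ) = H(z, θ)^{−1}(y(t) − G(z, θ)u(t)) amounts to two discrete-time filtering operations. A minimal sketch (assuming Python/scipy, for the first-order example model G = θ1 z^{−1}/(1 + θ2 z^{−1}), H = 1/(1 + θ2 z^{−1}) used on the next slide):

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(y, u, theta1, theta2):
    """eps(t) = H^-1 (y - G u) for G = th1 z^-1/(1+th2 z^-1), H = 1/(1+th2 z^-1).

    Here H^-1 = 1 + th2 z^-1, so eps(t) = y(t) + th2*y(t-1) - th1*u(t-1),
    assuming u(t<0) = y(t<0) = 0.
    """
    Gu = lfilter([0.0, theta1], [1.0, theta2], u)   # G(z, theta) u(t)
    return lfilter([1.0, theta2], [1.0], y - Gu)    # H(z, theta)^{-1} (y - Gu)
```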
Properties of the prediction error ǫ(t, θ)

Property 1. Given θ and Z^N, ǫ(t, θ) is computable ∀t = 1...N.

Example: G(z, θ) = θ1 z^{−1}/(1 + θ2 z^{−1}), H(z, θ) = 1/(1 + θ2 z^{−1}), θ = (θ1, θ2)^T:

ǫ(t, θ) = (1 + θ2 z^{−1}) ( y(t) − ( θ1 z^{−1}/(1 + θ2 z^{−1}) ) u(t) )
        = y(t) + θ2 y(t − 1) − θ1 u(t − 1)

Notes:
• it is typically assumed that u(t < 0) = y(t < 0) = 0
• H^{−1}(z, θ) is always causal since H(z, θ) is monic!

Property 2. ǫ(t, θ0) = e(t) (something really unpredictable at time t):

ǫ(t, θ) = H(z, θ)^{−1} ( G0(z)u(t) + H0(z)e(t) − G(z, θ)u(t) )
        = ( (G0(z) − G(z, θ)) / H(z, θ) ) u(t) + ( H0 / H(z, θ) ) e(t)

⇒ ǫ(t, θ0) = e(t)

Property 3. ǫ(t, θ) ≠ white noise for all θ ≠ θ0 (provided an appropriate signal u(t)).

Property 4. θ0 minimizes the power Ēǫ²(t, θ) of ǫ(t, θ), i.e.

θ0 = arg min_θ Ēǫ²(t, θ)   with   Ēǫ²(t, θ) = lim_{N→∞} (1/N) Σ_{t=1}^{N} Eǫ²(t, θ)

Since ǫ(t, θ0) = e(t), we thus have:

Ēǫ²(t, θ0) = σe²
Ēǫ²(t, θ) > σe²   ∀θ ≠ θ0

Sketch of the proof of Property 4:

ǫ(t, θ) = e(t) + s1(t, θ) + s2(t, θ)

with s1(t, θ) = ( (G0(z) − G(z, θ)) / H(z, θ) ) u(t) and s2(t, θ) = ( (H0(z) − H(z, θ)) / H(z, θ) ) e(t),

where s2(t, θ) is a function of e(t − 1), e(t − 2), ... (not of e(t)).

u(t) and e(t) uncorrelated and e(t) white noise ⇒

Ēǫ²(t, θ) = σe² + Ēs1²(t, θ) + Ēs2²(t, θ)

θ = θ0 minimizes both Ēs1²(t, θ) and Ēs2²(t, θ) by making them equal to 0 (the former provided u(t) has been chosen appropriately).

⇒ θ = θ0 minimizes Ēǫ²(t, θ)


Example

We have collected N = 2000 data u(t) and y(t) on the following true system:

y(t) = ( z^{−3} (0.103 + 0.181z^{−1}) / (1 − 1.991z^{−1} + 2.203z^{−2} − 1.841z^{−3} + 0.894z^{−4}) ) u(t) + e(t)

and we have chosen the following model structure M:

M = { G(z, θ) = z^{−3} (b0 + b1 z^{−1}) / (1 + f1 z^{−1} + f2 z^{−2} + f3 z^{−3} + f4 z^{−4}) ;  H(z, θ) = 1 }

θ = ( b0, b1, f1, f2, f3, f4 )^T

⇒ θ0 = ( 0.103, 0.181, −1.991, 2.203, −1.841, 0.894 )^T

We have computed ǫ(t, θ) (t = 1...N) for θ = θ0 and for another θ, i.e. θ1 ≠ θ0:

θ1 = ( 0.12, 0.25, −2, 2.3, −1.9, 0.8 )^T

[Figure: ǫ(t, θ0) and ǫ(t, θ1) over t = 1...2000]

As can be seen from R̂ǫ^N(τ), ǫ(t, θ0) indeed has the properties of a white noise, as opposed to ǫ(t, θ1):

[Figure: auto-correlations of ǫ(t, θ0) (red) and ǫ(t, θ1) (blue) for τ = 0...40]

Estimated power of ǫ(t, θ0): 0.1015 (σe² = 0.1)
Estimated power of ǫ(t, θ1): 1.4678

Note: the estimated power is R̂ǫ^N(0).

Summary:
• ǫ(t, θ) is a computable quantity comparing the output y(t) of the true system and the predicted output of a model
• θ = θ0 minimizes the power of ǫ(t, θ)
3. Mathematical criterion for prediction error identification

3.1. An ideal criterion

Denote by θ* the solution of the minimization of the power of the prediction error:

θ* = arg min_θ V̄(θ)   with   V̄(θ) = Ēǫ²(t, θ) = lim_{N→∞} (1/N) Σ_{t=1}^{N} Eǫ²(t, θ)

Properties of V̄(θ) and θ* (when S ∈ M and u(t) appropriate):

V̄(θ) has a unique minimum θ*, and θ* = θ0.

Remark:

There is no difference between θ* and θ0 at this stage of the course, since we suppose S ∈ M and u(t) appropriate. We nevertheless introduce the new notation θ* since

• when S ∉ M, the notion of true parameter vector θ0 does not exist, while the minimum θ* of the cost function V̄(θ) exists
• if u(t) is not chosen appropriately, then V̄(θ) has several minima, and θ* represents the set of these minima, while θ0 is one single parameter vector


The true parameter vector θ0 is thus the solution of

arg min_θ V̄(θ)   with   V̄(θ) = Ēǫ²(t, θ) = lim_{N→∞} (1/N) Σ_{t=1}^{N} Eǫ²(t, θ)

Question: Is it possible to consider this criterion? NO!!!

Indeed, the power of the prediction error cannot be exactly computed with only one experiment and only N measured data.

3.2. Tractable identification criterion

The power of the prediction error is estimated using the N available data Z^N:

VN(θ, Z^N) = (1/N) Σ_{t=1}^{N} ǫ²(t, θ)
           = (1/N) Σ_{t=1}^{N} ( H(θ)^{−1} ( y(t) − G(θ)u(t) ) )²

Parameter estimation through minimizing VN:

θ̂N = arg min_θ VN(θ, Z^N)

Consequences and properties of the identified parameter vector θ̂N:

• different experiments and data ⇒ different θ̂N
• θ̂N is only an estimate of θ* (= θ0)
• θ̂N is a random variable which is asymptotically (N → ∞) Gaussian with mean θ*:
  θ̂N ∼ AsN(θ*, Pθ)
• θ̂N → θ* with probability 1 when N → ∞ (i.e. Pθ → 0 when N → ∞)

Example:

S:  y(t) = ( 0.7z^{−1} / (1 + 0.3z^{−1}) ) u(t) + ( 1 / (1 + 0.3z^{−1}) ) e(t)

M:  G(z, θ) = bz^{−1}/(1 + az^{−1}),  H(z, θ) = 1/(1 + az^{−1}),  θ = (a, b)^T

We have applied the same sequence u(t) of length N = 200 twenty times and we have measured the corresponding y(t). For these 20 experiments, we have computed the estimate θ̂N of θ0 = (0.3, 0.7)^T.

[Figure: the twenty estimates θ̂N (blue crosses) and θ0 (red circle) in the (a, b) plane]
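The scatter of the 20 estimates can be reproduced by simulation. A minimal sketch (assuming Python/numpy-scipy; note that this M is an ARX structure, so each θ̂N follows from linear least squares on y(t) = −ay(t − 1) + bu(t − 1) + e(t); the noise level used here is an assumption, since the slides do not state σe):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N = 200
u = rng.normal(size=N)                  # same input for every experiment

for _ in range(20):
    e = rng.normal(scale=0.3, size=N)   # assumed noise level
    # S: y = 0.7 z^-1/(1+0.3 z^-1) u + 1/(1+0.3 z^-1) e
    y = lfilter([0.0, 0.7], [1.0, 0.3], u) + lfilter([1.0], [1.0, 0.3], e)
    # ARX least squares: y(t) = -a*y(t-1) + b*u(t-1) + e(t)
    Phi = np.column_stack([-y[:-1], u[:-1]])
    a_hat, b_hat = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
    print(a_hat, b_hat)                 # scattered around (0.3, 0.7)
```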

How can we solve the optimization problem delivering θ̂N?

θ̂N = arg min_θ (1/N) Σ_{t=1}^{N} ǫ²(t, θ) = arg min_θ (1/N) Σ_{t=1}^{N} ( H(θ)^{−1} ( y(t) − G(θ)u(t) ) )²

In order to answer this question, the parametrization of G(z, θ) and H(z, θ) must be defined more precisely.

4. Black box model structures

Model structure: M = { (G(z, θ), H(z, θ)), θ ∈ R^{nθ} }

General parametrization used in the Matlab Toolbox:

G(z, θ) = z^{−nk} B(z, θ) / ( F(z, θ) A(z, θ) )     H(z, θ) = C(z, θ) / ( D(z, θ) A(z, θ) )

θ = ( a1 .. a_na  b0 .. f_nf )^T

B(z, θ) = b0 + b1 z^{−1} + · · · + b_{nb−1} z^{−nb+1}
A(z, θ) = 1 + a1 z^{−1} + · · · + a_na z^{−na}
C(z, θ) = 1 + c1 z^{−1} + · · · + c_nc z^{−nc}
D(z, θ) = 1 + d1 z^{−1} + · · · + d_nd z^{−nd}
F(z, θ) = 1 + f1 z^{−1} + · · · + f_nf z^{−nf}

Model structures used in practice:

Model structure      G(z, θ)                    H(z, θ)
ARX                  z^{−nk} B(z, θ)/A(z, θ)    1/A(z, θ)
ARMAX                z^{−nk} B(z, θ)/A(z, θ)    C(z, θ)/A(z, θ)
OE (Output Error)    z^{−nk} B(z, θ)/F(z, θ)    1
FIR                  z^{−nk} B(z, θ)            1
BJ (Box-Jenkins)     z^{−nk} B(z, θ)/F(z, θ)    C(z, θ)/D(z, θ)


Example: ARX model structure

G(z, θ) = z^{−nk} B(z, θ) / A(z, θ) ;   H(z, θ) = 1 / A(z, θ)

with

B(z, θ) = b0 + b1 z^{−1} + · · · + b_{nb−1} z^{−nb+1}
A(z, θ) = 1 + a1 z^{−1} + · · · + a_na z^{−na}
θ = ( a1 a2 · · · a_na  b0 b1 · · · b_{nb−1} )^T

na, nb are the numbers of parameters in the A and B polynomials; nk is the number of time delays.

Distinction between model structures

• ARX and FIR have a predictor linear in θ:

  ŷ(t, θ) = z^{−nk} B(θ)u(t) + (1 − A(θ))y(t) = φ^T(t)θ

  is a linear function of θ ⇒ important computational advantages.

• BJ, FIR and OE have an independent parametrization of G(z, θ) and H(z, θ): there are no common parameters in G and H ⇒ advantages for the independent identification of G and H.


5. Computation of the identified parameter vector θ̂N

θ̂N = arg min_θ (1/N) Σ_{t=1}^{N} ǫ²(t, θ) = arg min_θ (1/N) Σ_{t=1}^{N} ( y(t) − ŷ(t, θ) )²

5.1 Case of a predictor linear in θ (ARX and FIR)

G(θ) = z^{−nk} B(θ)/A(θ) ;   H(θ) = 1/A(θ)

Predictor ŷ(t, θ):

ŷ(t, θ) = H(θ)^{−1} G(θ)u(t) + [1 − H(θ)^{−1}]y(t)
        = z^{−nk} B(θ)u(t) + [1 − A(θ)]y(t)
        = φ(t)^T θ   LINEAR in θ!!!

with

φ(t) = ( −y(t − 1), ..., −y(t − na), u(t − nk), ..., u(t − nk − nb + 1) )^T
θ = ( a1 a2 · · · a_na  b0 · · · b_{nb−1} )^T

VN(θ, Z^N) = (1/N) Σ_{t=1}^{N} ( y(t) − φ^T(t)θ )² is quadratic in θ.

[Figure: the quadratic cost VN(θ) as a function of θ, with a unique minimum]

θ̂N = arg min_θ (1/N) Σ_{t=1}^{N} ( y(t) − φ(t)^T θ )² can be determined analytically using

∂VN(θ, Z^N)/∂θ |_{θ=θ̂N} = 0

Indeed:

∂VN(θ, Z^N)/∂θ = −(2/N) Σ_{t=1}^{N} [ φ(t)y(t) − φ(t)φ^T(t)θ ]

Setting the derivative to 0 at θ = θ̂N delivers:

[ (1/N) Σ_{t=1}^{N} φ(t)φ^T(t) ] θ̂N = (1/N) Σ_{t=1}^{N} φ(t)y(t)

As a consequence:

θ̂N = R(N)^{−1} f(N)   with   R(N) = (1/N) Σ_{t=1}^{N} φ(t)φ^T(t)   and   f(N) = (1/N) Σ_{t=1}^{N} φ(t)y(t)

• Analytical solution through simple matrix operations.
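A minimal sketch of this least-squares solution (assuming Python/numpy; a first-order ARX model with na = nb = nk = 1, so φ(t) = (−y(t − 1), u(t − 1))^T):

```python
import numpy as np

def arx_ls(y, u):
    """Least-squares ARX(1,1,1) estimate: y(t) = -a*y(t-1) + b*u(t-1) + e(t)."""
    Phi = np.column_stack([-y[:-1], u[:-1]])   # regressors phi(t)
    R = Phi.T @ Phi / len(Phi)                 # R(N) = (1/N) sum phi phi^T
    f = Phi.T @ y[1:] / len(Phi)               # f(N) = (1/N) sum phi y
    return np.linalg.solve(R, f)               # theta_hat = R(N)^{-1} f(N)
```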
5.2 Case of a predictor nonlinear in θ (OE, BJ, ARMAX)

Example of the OE model structure:

G(θ) = z^{−nk} B(θ)/F(θ) ;   H(θ) = 1

Predictor ŷ(t, θ):

ŷ(t, θ) = H(θ)^{−1} G(θ)u(t) + [1 − H(θ)^{−1}]y(t)
        = z^{−nk} ( B(θ)/F(θ) ) u(t)
        = φ(t, θ)^T θ   NONLINEAR in θ!!!

with

φ(t, θ) = ( u(t − nk), ..., u(t − nk − nb + 1), −ŷ(t − 1, θ), ..., −ŷ(t − nf, θ) )^T
θ = ( b0 · · · b_{nb−1}  f1 f2 · · · f_nf )^T

θ̂N = arg min_θ (1/N) Σ_{t=1}^{N} ( y(t) − φ(t, θ)^T θ )² cannot be determined analytically using

∂VN(θ, Z^N)/∂θ |_{θ=θ̂N} = 0

since this derivative is a very complicated expression which is nonlinear in θ, and since this derivative is (generally) equal to 0 for several θ (local minima).

The solution θ̂N will therefore be computed iteratively. Risk of finding a local minimum!

6. Conditions on the experimental data

The ideal identification criterion

arg min_θ Ēǫ²(t, θ)

has a unique solution θ* (i.e. θ* = θ0 when S ∈ M) if the input signal u(t) that is chosen to generate the experimental data is sufficiently rich.

Counterexample

S:  y(t) = ( b0 z^{−1} / (1 + f0 z^{−1}) ) u(t) + ( 1 / (1 + d0 z^{−1}) ) e(t)

Consider u(t) = 0 ∀t as input signal and a full-order model structure M for S:

M = { G(z, θ) = bz^{−1}/(1 + f z^{−1}) ;  H(z, θ) = 1/(1 + d z^{−1}) }   θ = ( b, d, f )^T

Consequently:

ǫ(t, θ) = ( (G0(z) − G(z, θ)) / H(z, θ) ) u(t) [= 0] + ( H0 / H(z, θ) ) e(t)

⇒ ǫ(t, θ) = ( (1 + dz^{−1}) / (1 + d0 z^{−1}) ) e(t)

We know that Ēǫ²(t, θ) is minimal for any θ making ǫ(t, θ) = e(t).

The power Ēǫ²(t, θ) is thus minimized for each θ making H(z, θ) = H0, i.e.

θ* = ( b, d0, f )^T   ∀b ∈ R and ∀f ∈ R

Note: θ0 lies in the set of θ*.

Notion of signal richness: persistently exciting input signals

A quasi-stationary signal u is persistently exciting of order n if the (Toeplitz) matrix R̄n is non-singular:

R̄n := [ Ru(0)      Ru(1)   · · ·   Ru(n − 1)
         Ru(1)      Ru(0)   · · ·   Ru(n − 2)
         ...        ...     ...     ...
         Ru(n − 1)  · · ·   Ru(1)   Ru(0)    ]

Examples:

• A white noise process (Ru(τ) = σu² δ(τ)) is persistently exciting of infinite order. Indeed, R̄n = σu² In.

• A block signal:

[Figure: periodic block signal alternating between +1 and −1]

Ru(0) = 1, Ru(1) = 1/3, Ru(2) = −1/3, Ru(3) = −1, Ru(4) = −1/3, Ru(5) = 1/3, Ru(6) = 1, etcetera

R̄4 = [  1     1/3   −1/3   −1
         1/3   1      1/3   −1/3
        −1/3   1/3    1      1/3
        −1    −1/3    1/3    1   ]

R̄3 is regular, R̄4 is singular. Consequently, u is p.e. of order 3.
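The order check is mechanical: build R̄n from the correlation sequence and test singularity. A minimal sketch (assuming Python/numpy-scipy, using the block-signal correlations above):

```python
import numpy as np
from scipy.linalg import toeplitz

# Auto-correlation sequence of the block signal: Ru(0), Ru(1), Ru(2), Ru(3)
Ru = [1.0, 1/3, -1/3, -1.0]

for n in (3, 4):
    Rbar = toeplitz(Ru[:n])           # Toeplitz matrix R̄n
    rank = np.linalg.matrix_rank(Rbar)
    print(f"n = {n}: rank {rank} ->",
          "non-singular" if rank == n else "singular")
# Expected: R̄3 non-singular (p.e. of order 3), R̄4 singular
```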
Another method to determine the order of u

If the spectrum Φu is unequal to 0 at n points in the interval (−π, π], then u is persistently exciting of order n.

Example: The signal u(t) = sin(ω0 t) is persistently exciting of order 2 (Φu has a contribution at ±ω0).

Important result. Let us denote the number of parameters in the function G(z, θ) by ng. The ideal identification criterion

θ* = arg min_θ V̄(θ)

has a unique solution (i.e. θ* = θ0) if the signal u(t) generating the data is sufficiently exciting of order ≥ ng.

Sketch of the proof (case of a FIR model structure):

ǫ(t, θ) = y(t) − Σ_{k=1}^{nb} bk u(t − k)   (nk = 1)

θ* is characterized by:

[ Ru(0)       · · ·   Ru(nb − 1) ] [ b1*   ]   [ Ryu(1)  ]
[ Ru(1)       · · ·   Ru(nb − 2) ] [ b2*   ] = [ Ryu(2)  ]
[ ...         ...     ...        ] [ ...   ]   [ ...     ]
[ Ru(nb − 1)  · · ·   Ru(0)      ] [ bnb*  ]   [ Ryu(nb) ]

Consequence: θ* can uniquely be identified if and only if u is persistently exciting of order ≥ nb.

What can we say about the identification of θ̂N?

θ̂N will be the (consistent) estimate of θ* = θ0 (the unique solution of the ideal criterion) if the input signal is sufficiently exciting of order ≥ ng.

Remark. In the sequel, we will always assume that the signal u(t) has been chosen such that it is persistently exciting of sufficient order.


Example

Let us consider the following true system S:

y(t) = ( z^{−3} (0.103 + 0.181z^{−1}) / (1 − 1.991z^{−1} + 2.203z^{−2} − 1.841z^{−3} + 0.894z^{−4}) ) u(t) + e(t)

We have chosen the full-order model structure M:

M = { G(z, θ) = z^{−3} (b0 + b1 z^{−1}) / (1 + f1 z^{−1} + f2 z^{−2} + f3 z^{−3} + f4 z^{−4}) ;  H(z, θ) = 1 }

θ = ( b0, b1, f1, f2, f3, f4 )^T  ⇒  nG = 6

We now perform two identification experiments on S.

First experiment on S

We have applied u(t) = sin(0.1t) (u p.e. of order 2) to S and collected N = 2000 IO data:

[Figure: input signal u(t) and output signal y(t), t = 1...2000]

Using the 2000 recorded data, we have identified θ̂N.

G(z, θ̂N) (blue) is compared with G(z, θ0) (red):

[Figure: Bode plots (amplitude and phase) of G(z, θ̂N) and G(z, θ0); the two responses differ considerably]

Due to the lack of excitation, there are multiple θ* which minimize Ēǫ²(t, θ), and the identified θ̂N is a consistent estimate of one of these θ* (≠ θ0).

Second experiment on S

We have applied a white noise u(t) (u p.e. of order ∞) to S and collected N = 2000 IO data:

[Figure: input signal u(t) and output signal y(t), t = 1...2000]

Using the 2000 recorded data, we have identified θ̂N.


G(z, θ̂N) (blue) is compared with G(z, θ0) (red):

[Figure: Bode plots (amplitude and phase) of G(z, θ̂N) and G(z, θ0); the two responses now coincide]

Since the signal u is p.e. of order ≥ 6, θ* is unique and the identified θ̂N is a consistent estimate of this θ* = θ0.
 
7. Statistical properties of θ̂N when S ∈ M

Due to the stochastic noise v(t) corrupting the data Z^N, the identified parameter vector θ̂N is a random variable, i.e. the value of θ̂N is different at each experiment.

When S ∈ M, the identified parameter vector θ̂N has the following properties:

• θ̂N ∼ N(θ0, Pθ)
• θ̂N → θ0 with probability 1 when N → ∞ (i.e. Pθ → 0 when N → ∞)

Note: the first property is in fact θ̂N ∼ AsN(θ0, Pθ).

7.1 Normal distribution of the identified parameter vector θ̂N

Consider an identification experiment on S achieved using an input signal u(t) and a number N of data. The parameter vector θ̂N identified in such an experiment is the realization of a normal distribution:

θ̂N ∼ N(θ0, Pθ)

Pθ = E (θ̂N − θ0)(θ̂N − θ0)^T = (σe²/N) ( Ēψ(t, θ0)ψ^T(t, θ0) )^{−1}

with ψ(t, θ0) = ∂ŷ(t, θ)/∂θ |_{θ=θ0} = −∂ε(t, θ)/∂θ |_{θ=θ0}

Interpretation of θ̂N ∼ N(θ0, Pθ)

Consider p different identification experiments on S which deliver p different estimates θ̂N^(i).

E θ̂N = θ0 means that

lim_{p→∞} (1/p) Σ_{i=1}^{p} θ̂N^(i) = θ0

θ̂N is an unbiased estimate of θ0.


Interpretation of θ̂N ∼ N(θ0, Pθ) (con't)

Pθ = E (θ̂N − θ0)(θ̂N − θ0)^T

The covariance matrix Pθ gives an idea of the standard deviation between θ̂N and θ0:

[Figure: estimates θ̂N distributed as θ̂N ∼ N(θ0, Pθ) with large Pθ, widely scattered around θ0]
[Figure: estimates θ̂N distributed as θ̂N ∼ N(θ0, Pθ) with small Pθ, tightly clustered around θ0]

Properties of the covariance matrix Pθ of θ̂N

Property 1. Pθ is a function of the chosen input signal u(t) and of the number N of data used for the identification.

Proof:

ε(t, θ) = ( (G0(z) − G(z, θ)) / H(z, θ) ) u(t) + ( H0 / H(z, θ) ) e(t)

⇒ ψ(t, θ0) = −∂ε(t, θ)/∂θ |_{θ=θ0} = ( ΛG(z, θ0) / H(z, θ0) ) u(t) + ( ΛH(z, θ0) / H(z, θ0) ) e(t)

with ΛG(z, θ) = ∂G(z, θ)/∂θ and ΛH(z, θ) = ∂H(z, θ)/∂θ.

Now defining ΓG = ΛG ΛG* / (HH*) and ΓH = ΛH ΛH* / (HH*), and applying Parseval's theorem to

Pθ = (σe²/N) ( Ēψ(t, θ0)ψ^T(t, θ0) )^{−1}

we obtain

Pθ = (σe²/N) [ (1/2π) ∫_{−π}^{π} ( ΓG(e^{jω}, θ0) Φu(ω) + ΓH(e^{jω}, θ0) σe² ) dω ]^{−1}

⇒ Pθ is a function of u(t) and N.

We can therefore influence the value of Pθ by appropriately choosing u(t) and N.


Property 2. The covariance matrix Pθ is a function of the unknown true system S via σe² and θ0.

Property 3. A reliable estimate P̂θ of Pθ can nevertheless be deduced using the data and θ̂N:

P̂θ = (σ̂e²/N) [ (1/N) Σ_{t=1}^{N} ψ(t, θ̂N)ψ^T(t, θ̂N) ]^{−1}

with σ̂e² = (1/N) Σ_{t=1}^{N} ε(t, θ̂N)²

7.2 Consistency property of the PEI estimate θ̂N

θ̂N → θ0 with probability 1 when N → ∞

If we could collect N = ∞ data from S, then the identified parameter vector θ̂_{N→∞} would have the following distribution:

θ̂_{N→∞} ∼ N(θ0, Pθ) with Pθ = 0

In other words, θ̂_{N→∞} is a random variable whose realization is always equal to θ0.

Indeed:

Pθ = (σe²/N) ( Ēψ(t, θ0)ψ^T(t, θ0) )^{−1}   and N → ∞  ⇒  Pθ → 0

7.3 Proof of the statistical properties of θ̂N when M is FIR

S: G0(z) = a0 + b0 z^{−1} and H0(z) = 1

N input-output data have been collected from S.

Full-order FIR model structure:

G(z, θ) = a + bz^{−1}     H(z, θ) = 1     θ = ( a, b )^T

Predictor:

ŷ(t, θ) = φ(t)^T θ   with   φ^T(t) = ( u(t)  u(t − 1) )


The estimate θ̂N is obtained as follows:

θ̂N = [ (1/N) Σ_{t=1}^{N} φ(t)φ^T(t) ]^{−1} (1/N) Σ_{t=1}^{N} φ(t)y(t) = R^{−1} (1/N) Σ_{t=1}^{N} φ(t)y(t)

with R = (1/N) Σ_{t=1}^{N} φ(t)φ^T(t).

Note that the data y(t) and u(t) collected from S obey the following relation:

y(t) = φ(t)^T θ0 + e(t)   t = 1...N,   with θ0 = ( a0, b0 )^T

What is the relation between θ̂N and θ0? Replace y(t) by its expression:

θ̂N = R^{−1} (1/N) Σ_{t=1}^{N} φ(t) ( φ(t)^T θ0 + e(t) )

⇒ θ̂N = θ0 + R^{−1} (1/N) Σ_{t=1}^{N} φ(t)e(t)

where the second term is the estimation error.

⇒ θ̂N is a random variable and is (asymptotically) normally distributed. Indeed:
• e(t) is a random process, and
• the central limit theorem applies

What are the moments of this normal distribution?

Mean:

E θ̂N = θ0 + E [ R^{−1} (1/N) Σ_{t=1}^{N} φ(t)e(t) ]

Since φ(t) and R are deterministic (not stochastic):

E θ̂N = θ0 + R^{−1} (1/N) Σ_{t=1}^{N} φ(t) Ee(t) = θ0   (since Ee(t) = 0)


Covariance matrix:

Pθ = E (θ̂N − θ0)(θ̂N − θ0)^T
   = E [ R^{−1} (1/N²) Σ_{t=1}^{N} Σ_{s=1}^{N} φ(t)e(t)e(s)φ^T(s) R^{−1} ]
   = R^{−1} (1/N²) Σ_{t=1}^{N} Σ_{s=1}^{N} φ(t) E(e(t)e(s)) φ^T(s) R^{−1}
   = (σe²/N) R^{−1} [ (1/N) Σ_{t=1}^{N} φ(t)φ^T(t) ] R^{−1}
   = (σe²/N) R^{−1} R R^{−1} = (σe²/N) R^{−1}

The FIR case is a very particular case: only the normal distribution is asymptotic in N, while E θ̂N = θ0 and the covariance matrix are valid ∀N.

Note that Pθ = (σe²/N) R^{−1} converges when N → ∞ to the asymptotic expression

Pθ = (σe²/N) ( Ēψ(t, θ0)ψ^T(t, θ0) )^{−1}

since

ŷ(t, θ) = φ^T(t)θ  ⇒  ψ(t, θ) = φ(t) ∀θ

What happens when N → ∞?

θ̂_{N→∞} = θ0 + lim_{N→∞} R^{−1} (1/N) Σ_{t=1}^{N} ( u(t)e(t), u(t − 1)e(t) )^T

where the second term is a random variable whose realisation is always 0.

Parametric uncertainty region

θ̂N is close to θ0 if Pθ is "small".

To determine how close, we can build an uncertainty region in the parameter space:

θ̂N ∼ N(θ0, Pθ)  ⇐⇒  (θ0 − θ̂N)^T Pθ^{−1} (θ0 − θ̂N) ∼ χ²(k)

with k the dimension of θ̂N.


   
(θ0 − θ̂N)^T Pθ^{−1} (θ0 − θ̂N) ∼ χ²(k)

The unknown true parameter vector θ0 therefore lies in the following ellipsoid U with probability, say, 95%:

U = { θ ∈ R^k | (θ − θ̂N)^T Pθ^{−1} (θ − θ̂N) ≤ α }

with α such that Pr(χ²(k) < α) = 0.95.

The uncertainty ellipsoid U is centered at the identified parameter vector θ̂N and shaped by its covariance matrix Pθ.

The larger Pθ, the larger the ellipsoid, and thus the larger the uncertainty.

Remark: G(z, θ0) lies with the same probability in D = { G(z, θ) | θ ∈ U }.

Example:

S:  y(t) = ( 0.7z^{−1} / (1 + 0.3z^{−1}) ) u(t) + ( 1 / (1 + 0.3z^{−1}) ) e(t)

M:  G(z, θ) = bz^{−1}/(1 + az^{−1}),  H(z, θ) = 1/(1 + az^{−1}),  θ = (a, b)^T

We have applied a sequence u(t) of length N = 1000 to S and we have measured the corresponding y(t).

Using these data, we have computed the estimate θ̂N of θ0 = (0.3, 0.7)^T along with its (estimated) covariance matrix Pθ:

θ̂N = ( 0.301, 0.733 )^T     Pθ = 10^{−3} [ 0.4922  0.0017
                                            0.0017  0.6264 ]

The 95% uncertainty region U can then be constructed:

U = { θ ∈ R^k | (θ − θ̂N)^T Pθ^{−1} (θ − θ̂N) ≤ 5.99 }
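A minimal numeric check of this construction (assuming Python/numpy-scipy): the threshold 5.99 is chi2.ppf(0.95, 2), and the membership of θ0 in U is a single quadratic-form evaluation.

```python
import numpy as np
from scipy.stats import chi2

theta_hat = np.array([0.301, 0.733])
P_theta = 1e-3 * np.array([[0.4922, 0.0017],
                           [0.0017, 0.6264]])
theta_0 = np.array([0.3, 0.7])

alpha = chi2.ppf(0.95, df=2)             # = 5.99 for k = 2
d = theta_0 - theta_hat
q = d @ np.linalg.solve(P_theta, d)      # (theta0 - theta_hat)^T P^-1 (...)
print(q, "<=", alpha, "->", q <= alpha)  # theta_0 lies in U, as expected
```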


[Figure: the estimate θ̂N (blue cross) along with its uncertainty ellipsoid U in the (a, b) parameter space; the in practice unknown θ0 is represented by the red circle and lies in U, as expected]

8. Statistical distribution of the identified model when S ∈ M

The identified parameter vector θ̂N is a random variable distributed as θ̂N ∼ AsN(θ0, Pθ) ⇒ the identified models G(z, θ̂N) (and H(z, θ̂N)) are also random variables:

• G(z, θ̂N) is an (asymptotically) unbiased estimate of G(z, θ0)
• the variance of G(z, θ̂N) is defined in the frequency domain as:

cov(G(e^{jω}, θ̂N)) = E |G(e^{jω}, θ̂N) − G(e^{jω}, θ0)|²

cov(G(e^{jω}, θ̂N)) can be expressed as a function of Pθ:

cov(G(e^{jω}, θ̂N)) = ΛG(e^{jω}, θ0) Pθ ΛG*(e^{jω}, θ0)

with ΛG^T(z, θ) = ∂G(z, θ)/∂θ

(obtained using a first-order approximation and the assumption that N is large enough)

Properties of cov(G(e^{jω}, θ̂N))

Property 1. cov(G(e^{jω}, θ̂N)) is a function of the chosen u(t) and of the number N of data used for the identification: a direct consequence of the fact that Pθ is a function of these quantities.


 
A more telling relation between the choice of u(t) and N and cov(G(e^{jω}, θ̂N)) is obtained by assuming that the McMillan degree n of the model G(z, θ) in M tends to ∞:

cov( G(e^{jω}, θ̂N) ) ≈ (n/N) ( Φv(ω) / Φu(ω) )

Property 2. cov(G(e^{jω}, θ̂N)) is a function of the unknown S.

Property 3. An estimate of cov(G(e^{jω}, θ̂N)) can nevertheless be computed using the data and θ̂N:

cov(G(e^{jω}, θ̂N)) ≈ ΛG*(e^{jω}, θ̂N) P̂θ ΛG(e^{jω}, θ̂N)

Comparison with non-parametric identification:

• cov(G(e^{jω}, θ̂N)) → 0 when N → ∞ (even for non-periodic signals)
• the modeling error at ω1 is correlated with the error at ω2, due to the parametrization

9. Validation of the identified model when S ∈ M

We have identified a model G(z, θ̂N) in M using Z^N, and we have verified that S ∈ M (see later).

Important question: Is G(z, θ̂N) close to G(z, θ0)?
Validation using cov(G(e^{jω}, θ̂N))

cov(G(e^{jω}, θ̂N)) = E |G(e^{jω}, θ̂N) − G(e^{jω}, θ0)|²

Consequently, at each frequency ω, the modeling error |G(e^{jω}, θ0) − G(e^{jω}, θ̂N)| is very likely to be small w.r.t. |G(e^{jω}, θ̂N)| if the standard deviation √cov(G(e^{jω}, θ̂N)) of G(e^{jω}, θ̂N) is small w.r.t. |G(e^{jω}, θ̂N)|.

More precisely, since G(z, θ̂N) is normally distributed, we have at each frequency ω that

|G(e^{jω}, θ0) − G(e^{jω}, θ̂N)| < 1.96 √cov(G(e^{jω}, θ̂N))   w.p. 95%

√cov(G(e^{jω}, θ̂N)) is thus a measure of the modeling error and allows to deduce uncertainty bands around the frequency response of the identified model G(z, θ̂N).

What is a small standard deviation √cov(G(e^{jω}, θ̂N)) (or a small modeling error) w.r.t. |G(e^{jω}, θ̂N)|?

Highly dependent on the expected use of the model!!

For example, if we want to use the model for control, the modeling error (measured by √cov(G(e^{jω}, θ̂N))) has to be much smaller around the cross-over frequency than at the other frequencies.

See the literature on "identification for robust control" to know how large √cov(G(e^{jω}, θ̂N)) may be.

What to do if the variance appears too large?

If the variance cov(G(e^{jω}, θ̂N)) appears too large, then we cannot guarantee that G(z, θ̂N) is a close estimate of G0(z).

A new identification experiment then has to be performed in order to obtain a better model. For this purpose, we have to take care that the variance in this new identification is smaller.


How can we reduce the variance of the identified model in a new identification?

cov( G(e^{jω}, θ̂N) ) ≈ (n/N) ( Φv(ω) / Φu(ω) )

Consequently, cov(G(e^{jω}, θ̂N)) can be reduced by
• increasing the number of data N;
• or increasing the power spectrum Φu(ω) of the input signal at the frequencies where cov(G(e^{jω}, θ̂N)) was too large.

Example

Let us consider the same flexible transmission system S (in the ARX form) and a full order model structure M for S.

We want to use G(z, θ̂N) for control. In this example, we need

√cov(G(e^{jω}, θ̂N)) / |G(e^{jω}, θ̂N)| < 0.1   ∀ω ∈ [0 1]

First identification experiment

We apply a white noise input signal u(t) of variance σu² = 0.005 to S, collect N = 2000 IO data and identify a model G(z, θ̂N) in M.

Validation of the identified model G(z, θ̂N ):



we compare cov(G(ejω , θ̂N ) (blue) and |G(z, θ̂N )| (red):
Second identification experiment
identified G (red); standard deviation (blue)
2
10

1
10
We want to reduce the variance of the identified model

0
10

Let us for this purpose increase the power of u(t):


−1
10

−2
10
2
We apply a white noise input signal u(t) of variance σu = 1 to
S, collect N = 2000 IO data and identify a model G(z, θ̂N )

   
−3

in M
10
−2 −1 0
10 10 10
omega


cov(G(ejω , θ̂N ) is too large !!!

System Identification 93 System Identification 94


 Validation of the identified model G(z, θ̂N ):

 Third identification experiment
Validation of the identified model G(z, θ̂N): we compare √cov(G(e^jω, θ̂N)) (blue) and |G(e^jω, θ̂N)| (red):

[Figure: identified G (red); standard deviation (blue); magnitude vs. omega]

√cov(G(e^jω, θ̂N)) is better, but still too large at the 1st peak for our control purpose!!!

Third identification experiment

We want to reduce the variance of the identified model further around the 1st peak. Let us for this purpose increase the power of u(t) around this first peak:

u(t) = white noise of the 2nd experiment + sin(0.3t) + sin(0.4t)

We apply this input signal u(t) to S, collect N = 2000 IO data and identify a model G(z, θ̂N) in M.
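For concreteness, the three excitation signals above could be generated as in the following numpy sketch (the variances and sine frequencies are those of the experiments; the random seed is an arbitrary choice):

import numpy as np

N = 2000                                         # samples per experiment
t = np.arange(N)                                 # discrete time axis
rng = np.random.default_rng(0)

u1 = np.sqrt(0.005) * rng.standard_normal(N)     # experiment 1: variance 0.005
u2 = rng.standard_normal(N)                      # experiment 2: variance 1
u3 = u2 + np.sin(0.3 * t) + np.sin(0.4 * t)      # experiment 3: extra power
                                                 # concentrated near the 1st peak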
Validation of the identified model G(z, θ̂N): we compare √cov(G(e^jω, θ̂N)) (blue) and |G(e^jω, θ̂N)| (red):

[Figure: identified G (red); standard deviation (blue); magnitude vs. omega]

√cov(G(e^jω, θ̂N)) is now OK for our control purpose!!!

Final note:

A similar analysis can be made for H(e^jω, θ̂N) using cov(H(e^jω, θ̂N)), which can be deduced using a similar reasoning as for cov(G(e^jω, θ̂N)).
10 A special case of undermodelling

10.1 Identification in a model structure M which does not contain S: S ∉ M

S ∉ M ⟺ there does not exist a θ0 such that G(z, θ0) = G0(z) and H(z, θ0) = H0(z)

Consider a model structure M = {G(z, θ); H(z, θ)} such that S ∉ M and an input signal u(t) sufficiently exciting of order ≥ ng.

Define, as before, the ideal identification criterion:

θ* = arg min_θ Ē ǫ²(t, θ)

and the estimate θ̂N of θ*:

θ̂N = arg min_θ (1/N) Σ_{t=1}^{N} ǫ(t, θ)²

Statistical properties of θ̂N w.r.t. θ*:
• θ̂N → θ* w.p. 1 when N → ∞
• θ̂N ∼ AsN(θ*, Pθ) (Pθ having a more complicated expression than when S ∈ M)

Since S ∉ M, we have in general:

G(z, θ*) ≠ G0(z) and H(z, θ*) ≠ H0(z)

One exception though: S ∉ M with G0 ∈ G and M OE, BJ or FIR.

10.2 Special case of undermodelling: S ∉ M with G0 ∈ G

The model structure M used for the identification is such that ∃θ0 with G(z, θ0) = G0(z) but H(z, θ0) ≠ H0(z).
What can be said about θ* in this special case?

To answer this question, we distinguish two classes of model structures M:

• M with no common parameters in G(θ) and H(θ) (i.e. OE, BJ, FIR):

θ = (η; ζ)   G(θ) = G(η)   H(θ) = H(ζ)

• M with common parameters in G(θ) and H(θ) (i.e. ARX, ARMAX)

Result:

True system S: y = G0 u(t) + H0 e(t). Chosen model structure M = {G(z, θ), H(z, θ)} such that ∃θ0 with G(z, θ0) = G0(z) but H(z, θ0) ≠ H0(z).

• if M is OE, BJ or FIR, then θ* = (η*; ζ*) with G(z, η*) = G0 and H(z, ζ*) ≠ H0

• if M is ARX or ARMAX, then G(z, θ*) ≠ G(z, θ0) = G0 and H(z, θ*) ≠ H0
Example

y(t) = [z⁻³ (0.103 + 0.181 z⁻¹) / (1 − 1.991 z⁻¹ + 2.203 z⁻² − 1.841 z⁻³ + 0.894 z⁻⁴)] u(t) + v(t)

with v(t) = H0(z)e(t); H0(z) very complicated, i.e. S is not ARX, not OE!!!

We have applied a powerful white noise input signal (σu² = 5) to S and collected a large number of IO data (N = 5000) ⟹ small variance ⟹ θ̂N ≈ θ*.

Using these IO data, we have identified a model in two model structures such that S ∉ M with G0 ∈ G:

Marx = ARX(na = 4, nb = 2, nk = 3)
Moe = OE(nb = 2, nf = 4, nk = 3)

Let us denote G(z, θ̂N^arx) and G(z, θ̂N^oe) the models identified in Marx and Moe, respectively.

[Figure: Bode plots of G(z, θ̂N^arx) (blue), G(z, θ̂N^oe) (black) and G0(z) (red); amplitude and phase vs. frequency (rad/s)]

As expected, we obtain G(z, θ̂N^oe) ≈ G0(z) and G(z, θ̂N^arx) very different from G0.
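To make the mechanics of such an experiment concrete, here is a rough Python sketch that simulates a system with the G0 above and fits the ARX(4, 2, 3) model by linear least squares. For brevity, the disturbance is taken as plain white noise (the H0 of the example is more complex), and the OE fit, which requires nonlinear optimization, is omitted.

import numpy as np
from scipy.signal import lfilter

b0 = [0, 0, 0, 0.103, 0.181]                     # z^-3 (0.103 + 0.181 z^-1)
a0 = [1, -1.991, 2.203, -1.841, 0.894]

rng = np.random.default_rng(0)
N = 5000
u = np.sqrt(5) * rng.standard_normal(N)          # white input, variance 5
y = lfilter(b0, a0, u) + rng.standard_normal(N)  # simplified white disturbance

# ARX(na=4, nb=2, nk=3) by least squares:
# y(t) = -a1 y(t-1) - ... - a4 y(t-4) + b1 u(t-3) + b2 u(t-4) + e(t)
na, nb, nk = 4, 2, 3
rows = np.arange(na + nb + nk, N)
Phi = np.column_stack([-y[rows - i] for i in range(1, na + 1)]
                      + [u[rows - nk - i] for i in range(nb)])
theta_arx = np.linalg.lstsq(Phi, y[rows], rcond=None)[0]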
11 Choice and validation of model order and structure

Until now, we have posed assumptions on the property of the model structure M w.r.t. S:
• S ∈ M
• S ∉ M with G0 ∈ G
• S ∉ M with G0 ∉ G

How can we verify these assumptions? A solution: model structure validation.

11.1 Model structure validation: an a-posteriori verification

Assume that we have identified a parameter vector θ̂N in a model structure M = {G(θ), H(θ)} with N data Z^N collected on the true system S: y(t) = G0 u(t) + H0 e(t).

Model structure validation: based on θ̂N and Z^N, determine if the chosen model structure M is such that:
• S ∈ M, or
• S ∉ M with G0 ∈ G, or
• S ∉ M with G0 ∉ G.
11.2 Model structure validation in the asymptotic case (N → ∞)

The identified parameter vector is then θ*.

Model structure validation is performed by considering Rǫ(τ) and Rǫu(τ) of ǫ(t, θ*):

ǫ(t, θ*) = H(θ*)⁻¹ (y(t) − G(θ*)u(t))

Due to the fact that

ǫ(t, θ*) = [(G0 − G(θ*)) / H(θ*)] u(t) + [H0 / H(θ*)] e(t),

three situations can occur for the quantities Rǫ(τ) and Rǫu(τ).

Situation A

We observe:

Rǫ(τ) = σe² δ(τ) = { σe² for τ = 0; 0 elsewhere }
Rǫu(τ) = 0 ∀τ

This situation occurs when

ǫ(t, θ*) = [(G0 − G(θ*)) / H(θ*)] u(t) + [H0 / H(θ*)] e(t) = 0 × u(t) + e(t)

⟺ G(θ*) = G0 and H(θ*) = H0
⟺ S ∈ M
Situation B

We observe:

Rǫ(τ) ≠ σe² δ(τ)
Rǫu(τ) = 0 ∀τ

This situation occurs when

ǫ(t, θ*) = [(G0 − G(θ*)) / H(θ*)] u(t) + [H0 / H(θ*)] e(t) = 0 × u(t) + [H0 / H(θ*)] e(t), with H0 / H(θ*) ≠ 1

⟺ G(θ*) = G0 and H(θ*) ≠ H0
⟺ S ∉ M with G0 ∈ G, for M OE, BJ or FIR

Situation C

We observe:

Rǫ(τ) ≠ σe² δ(τ)
∃τ s.t. Rǫu(τ) ≠ 0

This situation occurs when

ǫ(t, θ*) = [(G0 − G(θ*)) / H(θ*)] u(t) + [H0 / H(θ*)] e(t), with (G0 − G(θ*)) / H(θ*) ≠ 0

⟺ G(θ*) ≠ G0
⟺ either S ∉ M with G0 ∈ G for M ARX or ARMAX, or S ∉ M with G0 ∉ G

Conclusions for the asymptotic case:

1) M is chosen as OE, FIR or BJ: situations A, B and C can occur for Rǫ(τ) and Rǫu(τ). By determining in which situation we are, we verify whether the identification of θ* has been performed in an M such that:
• S ∈ M (situation A)
• S ∉ M with G0 ∈ G (situation B)
• or S ∉ M with G0 ∉ G (situation C)

2) M is chosen as ARX or ARMAX: situations A and C can occur for Rǫ(τ) and Rǫu(τ). By determining in which situation we are, we verify whether the identification of θ* has been performed in an M such that:
• S ∈ M (situation A)
• S ∉ M (situation C)

No distinction can be made between G0 ∈ G and G0 ∉ G.
11.3 Model structure validation in the practical case N < ∞

The identified parameter vector is θ̂N, which is an unbiased estimate of θ*.

Model structure validation is performed by considering R̂ǫ^N(τ) and R̂ǫu^N(τ) of ǫ(t, θ̂N):

R̂ǫ^N(τ) = (1/N) Σ_{t=1}^{N−τ} ǫ(t + τ, θ̂N) ǫ(t, θ̂N)

R̂ǫu^N(τ) = (1/N) Σ_{t=1}^{N−τ} ǫ(t + τ, θ̂N) u(t)

and by considering 99%-confidence regions for these estimates.

What do these 99%-confidence regions represent?

R̂ǫ^N(τ) lies in its confidence region ∀τ ⟹ (w.p.) Rǫ(τ) = σe² δ(τ)
R̂ǫu^N(τ) lies in its confidence region ∀τ ⟹ (w.p.) Rǫu(τ) = 0 ∀τ

To construct these confidence regions, we use the following result:

if Rǫ(τ) = σe² δ(τ), then √N · R̂ǫ^N(τ)/R̂ǫ^N(0) ∼ AsN(0, 1)

if Rǫu(τ) = 0 ∀τ, then √N · R̂ǫu^N(τ) ∼ AsN(0, P) with an estimable P

Based on the results of the asymptotic case, we can therefore deduce:

1) when M is OE, FIR, or BJ:

both R̂ǫ^N(τ) and R̂ǫu^N(τ) are in their confidence regions ∀τ ⟹ (w.p.) S ∈ M

R̂ǫu^N(τ) is in its confidence region ∀τ while R̂ǫ^N(τ) is not completely in its confidence region ⟹ (w.p.) S ∉ M with G0 ∈ G

both R̂ǫ^N(τ) and R̂ǫu^N(τ) are not completely in their confidence regions ⟹ (w.p.) S ∉ M with G0 ∉ G
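A rough numpy sketch of these two tests (similar in spirit to the Matlab resid checks used later) could look as follows; for the cross-correlation bound, the asymptotic variance P is approximated by Rǫ(0)·Ru(0), which is exact for a white input but in general has to be estimated from the data.

import numpy as np

def residual_tests(eps, u, max_lag=25, z99=2.58):
    # eps: prediction errors eps(t, theta_hat_N); u: input; 99% level
    N = len(eps)
    Re  = np.array([eps[tau:] @ eps[:N - tau] / N for tau in range(max_lag + 1)])
    Reu = np.array([eps[tau:] @ u[:N - tau] / N for tau in range(max_lag + 1)])
    Ru0 = u @ u / N
    auto_ok  = np.all(np.abs(Re[1:]) < z99 * Re[0] / np.sqrt(N))
    cross_ok = np.all(np.abs(Reu) < z99 * np.sqrt(Re[0] * Ru0 / N))
    return auto_ok, cross_ok     # (True, True) suggests S in M (w.p.)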
2) when M is ARX or ARMAX:

both R̂ǫ^N(τ) and R̂ǫu^N(τ) are in their confidence regions ∀τ ⟹ (w.p.) S ∈ M

other cases ⟹ (w.p.) S ∉ M

No distinction can be made between G0 ∈ G and G0 ∉ G.

11.4 Example of how we can find an M s.t. S ∈ M

Let us consider an unknown true system S. We would like to determine a model set M which contains S.
First analysis of the system

Let us apply a step input signal u(t) to S and observe y(t):

[Figure: step response; u(t) (red) and y(t) (blue)]

From this behaviour, we can conclude that G0 has a limited order and, from a detailed observation, we see that the delay is nk = 3.

Collection of the data for the identification and determination of M

We have applied a white noise input signal to S and collected N = 5000 input-output data ⟹ Z^N.

Based on the first analysis of S, first choice for M:

M = BJ(nb = 2, nc = 2, nd = 2, nf = 2, nk = 3)

We can identify a parameter vector θ̂N in this M using Z^N.
Does this M contain the true system S? Let us perform the model structure validation (Matlab function: resid):

[Figure: correlation function of the residuals (output y1) and cross-correlation function between input u1 and the residuals, with 99%-confidence regions]

⟹ (w.p.) S ∉ M with G0 ∉ G

Let us increase the orders of G(z, θ) and H(z, θ):

M = BJ(nb = 3, nc = 3, nd = 3, nf = 3, nk = 3)

and identify θ̂N in this new model structure using the same data Z^N.

Let us perform the model structure validation of this new M:

[Figure: correlation function of the residuals and cross-correlation function between input and residuals, with 99%-confidence regions]

⟹ (w.p.) S ∉ M with G0 ∈ G

A third order H(z, θ) is thus not sufficient to describe H0(z). Let us try:

M = BJ(nb = 3, nc = 4, nd = 4, nf = 3, nk = 3)

and identify θ̂N in this new model structure using the data Z^N.
Let us perform the model structure validation of this new M:

[Figure: correlation function of the residuals and cross-correlation function between input and residuals, with 99%-confidence regions]

⟹ (w.p.) S ∈ M

By a simple iteration, we can thus find a model set M that has the property S ∈ M.

Note: the S used here was indeed BJ(3, 4, 4, 3, 3)!!
11.5 Final remarks

Model structure validation validates the hypothesis S ∈ M based on the available data. Other data can be used for the validation than for the identification.

Model structure validation is often called model validation.

However, a successful model structure validation does not necessarily imply that G(z, θ̂N) and H(z, θ̂N) are close estimates of G0(z) = G(z, θ0) and H0(z) = H(z, θ0) (the variance can still be large!!!).

12 A typical procedure to identify a reliable full-order model

For some types of systems, a reasonable objective can be to identify reliable full-order models G(z, θ̂N) and H(z, θ̂N) of G0 and H0.

To reach this objective: model structure validation allows one to determine a model set M such that S ∈ M, and √cov(G(e^jω, θ̂N)) allows one to verify whether G(z, θ̂N) is close to G0 (and possibly √cov(H(e^jω, θ̂N)) for H(z, θ̂N)).
⟹ Typical iterative procedure

1. choose the input signal and collect Z^N
2. choose a model structure M
3. identify the models G(z, θ̂N) and H(z, θ̂N)
4. verify whether S ∈ M; if it is the case, go to item 5; if not, go back to item 2 and choose another model structure M
5. verify whether √cov(G(e^jω, θ̂N)) (and possibly √cov(H(e^jω, θ̂N))) is small; if not, go back to item 1; if yes, stop

Possible additional tests for item 5:
• simulation of the identified model
• observation of the poles and zeros of the identified models
• comparison of the frequency responses of the identified models with the ETFE (see later) and/or with the physical equations.
13 Identification in a low order model structure

Some real-life systems have a very large order (e.g. chemical and industrial plants). For such plants, identifying a reliable full-order model is:

• not a good idea, since cov(G(e^jω, θ̂N)) will typically be very large:

cov(G(e^jω, θ̂N)) ≈ (n/N) · Φv(ω)/Φu(ω)

with n large and N, Φu(ω) limited

• not necessary: for control, a low order model accurate in the frequencies around the cross-over frequency is sufficient

⟹ For that type of S,

• choose a reduced order M which is nevertheless sufficiently rich to be able to represent the behaviour of the system in the important frequency range
• perform the identification experiment in such a way that the identified model is a close estimate of S in the important frequency range

Considered problem: what is the influence of the experimental conditions (choice of u(t), choice of N) on the approximation of G0(z) by G(z, θ̂N) when:

S: y(t) = G0(z)u(t) + v(t), with v(t) = H0(z)e(t)

and M = {G(z, θ); H(z, θ) = 1} is an OE model structure such that ∄θ0 with G(z, θ0) = G0(z)?

We thus restrict attention:
• to the approximation of G0 by G(z, θ̂N)
• to an Output Error (OE) model structure M (reason: easier analysis)
Reminder from before...

θ̂N can be computed as in the case S ∈ M.

θ̂N is a random variable due to the stochastic disturbance v(t) corrupting the data.

θ̂N is distributed as N(θ*, Pθ) where θ* is the solution of the ideal identification criterion.

Pθ cannot be determined analytically, but Pθ → 0 when N → ∞ ⟹ θ̂N → θ* w.p. 1 when N → ∞.

∄θ0 with G(z, θ0) = G0(z) ⟹ G(z, θ*) ≠ G0(z)

13.1 Modeling error when S ∉ M with G0 ∉ G

The modeling error G0(z) − G(z, θ̂N) is decomposed into two contributions:

G0(z) − G(z, θ̂N) = (G0(z) − G(z, θ*)) + (G(z, θ*) − G(z, θ̂N))

Note: when S ∈ M, G0(z) − G(z, θ*) = 0.

The two contributions and their sources:

• G0 − G(θ*) is called the bias error and is due to the fact that S ∉ M with G0 ∉ G;
• G(θ*) − G(θ̂N) is called the variance error and is due to the fact that N < ∞.

Considered problem (rephrased): what is the influence of the experimental conditions
• on the bias error
• on the variance error?
13.3 Shaping the bias error G0 − G(θ*)

13.3.1 A frequency domain expression of the bias error G0(e^jω) − G(e^jω, θ*)

Recall that we consider an OE model structure M:

θ* = arg min_θ V̄(θ)

and

V̄(θ) = Ē ǫ(t, θ)² = (1/2π) ∫_{−π}^{π} Φǫ(ω, θ) dω

(Parseval; both expressions are equal to Rǫ(0))

M = OE ⟹ ǫ(t, θ) = (G0(z) − G(z, θ))u(t) + v(t)

⟹

θ* = arg min_θ (1/2π) ∫_{−π}^{π} Φǫ(ω, θ) dω
   = arg min_θ (1/2π) ∫_{−π}^{π} |G0(e^jω) − G(e^jω, θ)|² Φu(ω) + Φv(ω) dω

⟹ G(e^jω, θ*) is the model minimizing the integrated quadratic error |G0(e^jω) − G(e^jω, θ)|² with weighting function Φu(ω)

⟹ the bias will be the smallest at those ω's where Φu(ω) is relatively the largest.

Notes:
• the bias error is a function of the power spectrum Φu(ω) of the input signal used for the identification
• the bias obtained with a signal u(t) of spectrum Φu(ω) is the same as the bias obtained with spectrum αΦu(ω) (α a scalar constant)
• the absolute level of power thus has no influence on the bias error, but it influences the variance error.
13.3.2 Another way to shape the bias error: off-line prefiltering

Given a filter L(z) and the data u(t) and y(t) collected from S, filter u(t) and y(t) with L:

uF(t) = L(z)u(t) and yF(t) = L(z)y(t)

Result: if we use the data uF(t) and yF(t) for the identification, the weighting function shaping the bias error is:

W(ω) = Φu(ω) |L(e^jω)|²

Proof: if we use the data uF(t) and yF(t) for the identification, the corresponding prediction error ǫF(t, θ) is

ǫF(t, θ) = L(z) ǫ(t, θ)

where ǫ(t, θ) is the prediction error we would have obtained with u(t) and y(t). Consequently,

ΦǫF(ω, θ) = |L(e^jω)|² · Φǫ(ω, θ)

and therefore W(ω) = Φu(ω) |L(e^jω)|².
13.4 Shaping the variance error G(θ*) − G(θ̂N)

Analysis is more difficult than in the case S ∈ M. However, we can nevertheless cautiously state that:
• large Φu(ω) around ω ⟹ small variance error around ω
• large N ⟹ small variance error

13.5 Example

S: y(t) = G0(z)u(t) + e(t), with G0(z) of 4th order and with three delays.

We have to use a given set of data Z^N (N = 5000) for the identification, where u is the sum of a white noise of variance 5 and three high-frequency sinusoids of amplitude 10.

Objective: using the given data, identify a good model G(z, θ̂N) for G0(z) in the frequency range [0 0.7] in the reduced order model structure:

M = OE(nb = 2, nf = 2, nk = 3)
Since Z^N is given, the only degree of freedom we have is to use a pre-filter L(z) to shape the bias error.

We want a small bias error in the frequency range [0 0.7] ⟹ choose L(z) such that |L(e^jω)|² Φu(ω) is relatively (much) larger in the frequency range [0 0.7] than in [0.7 π]

⟹ L(z): Butterworth low-pass filter of order 7 and cut-off frequency 0.7 rad/s.

We filter the u and y collected from S by this L, and we obtain filtered data with which we perform the identification in M.

[Figure: G0(z) (red) and G(z, θ̂N) (blue) identified with the filtered data; amplitude and phase vs. frequency (rad/s)]

⟹ G(z, θ̂N) is OK.
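The prefiltering step itself is straightforward; a minimal scipy sketch could look as follows (u and y stand in for the given data, and the cut-off 0.7 rad/sample is normalized by the Nyquist frequency π, as scipy's butter expects):

import numpy as np
from scipy.signal import butter, lfilter

u = np.random.randn(5000)         # placeholder for the given input data
y = np.random.randn(5000)         # placeholder for the given output data

b, a = butter(7, 0.7 / np.pi)     # 7th-order Butterworth, cut-off 0.7 rad/sample

uF = lfilter(b, a, u)             # uF(t) = L(z) u(t)
yF = lfilter(b, a, y)             # yF(t) = L(z) y(t)
# uF and yF then replace u and y in the OE identification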
What if we do not use a pre-filter L?

[Figure: G0(z) (red) and G(z, θ̂N) (blue) identified with the data in Z^N; amplitude and phase vs. frequency (rad/s)]

⟹ G(z, θ̂N) is KO.

13.6 What about a Box-Jenkins model structure?

The weighting function W(ω) for the bias error G0(e^jω) − G(e^jω, θ*) is then

W(ω) = Φu(ω) |L(e^jω)|² / |H(e^jω, θ*)|²

The noise model H(e^jω, θ*) influences the bias error of the G-model!!
Part IV: Nonparametric Identification (ETFE)

General objective of the ETFE

S: y(t) = G0(z)u(t) + v(t)

We apply an input signal u(t) to S and we collect the corresponding output for N time samples:

Z^N = { y(t), u(t) | t = 0...(N − 1) }

Based on these N time-domain data, we want to estimate the frequency response G0(e^jω) (amplitude and phase) of the true plant transfer function.

Nonparametric identification is generally performed in order
• to have a first idea of G0(e^jω)
• to determine the frequency band of interest

Empirical Transfer Function Estimate (ETFE)

Time-domain data ⟶ frequency-domain data via the (scaled) Fourier transform:

{ u(t) | t = 0...(N − 1) } ⟷ UN(ω) = (1/√N) Σ_{t=0}^{N−1} u(t) e^{−jωt}

{ y(t) | t = 0...(N − 1) } ⟷ YN(ω) = (1/√N) Σ_{t=0}^{N−1} y(t) e^{−jωt}
Estimate Ĝ(e^jω) of G0(e^jω):

Ĝ(e^jω) = |Ĝ(e^jω)| e^{j∠Ĝ(e^jω)} = YN(ω)/UN(ω)

Ĝ(e^jω) can in theory be computed at each frequency ω ∈ [0 π] for which UN(ω) ≠ 0.

Practical aspects

All information contained in { u(t) | t = 0...(N − 1) } is contained in the elements of UN(ω) at the N/2 frequencies ωk = (2π/N)k, k = 0, 1, ..., located in [0 π]. Ĝ(e^jω) is therefore only computed at those frequencies ωk.

Special attention should be given when u(t) is a periodic signal of fundamental frequency ω0: the Fourier transform UN(ω) of such a signal is indeed only significant at the (active) harmonics of ω0. Ĝ(e^jω) will therefore only be computed at those harmonics.
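In code, the ETFE is one line of FFT arithmetic; a minimal numpy sketch (the common 1/√N scaling cancels in the ratio):

import numpy as np

def etfe(u, y):
    # Ratio of the scaled Fourier transforms Y_N / U_N at the
    # frequencies w_k = 2*pi*k/N located in [0, pi]
    N = len(u)
    U = np.fft.rfft(u)
    Y = np.fft.rfft(y)
    w = 2 * np.pi * np.arange(len(U)) / N
    return w, Y / U        # only meaningful where U(w_k) != 0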
Illustration

y(t) = G0(z) u(t) + H0 e(t)

with G0(z) = z⁻³ (0.103 + 0.181 z⁻¹) / (1 − 1.991 z⁻¹ + 2.203 z⁻² − 1.841 z⁻³ + 0.894 z⁻⁴), H0 = 1/den(G0) and e(t) a white noise disturbance of variance σe² = 0.1.

We collect N = 10000 data on this true system, subsequently with two different input signals having the same power Pu = 0.5 = 5σe².

Input signal 1: a multisine of fundamental frequency ω0 = 2π/100 ≈ 0.06 (power = 0.5):

u(t) = (1/√30) Σ_{k=1}^{30} sin(kω0 t)

The ETFE is computed at the 30 harmonics of ω0 present in u(t).

[Figure: ETFE modulus; above plot: the ETFE at the 30 harmonics of ω0; bottom plot: the same with the frequency response of G0(e^jω) (blue)]

We see that the ETFE is a good estimate of G0(e^jω) at the harmonics of ω0.

Input signal 2: a white noise of variance 0.5.

The ETFE is computed at all the N/2 = 5000 frequencies ωk.
[Figure: ETFE modulus; above plot: the ETFE at ωk; bottom plot: the same with the frequency response of G0(e^jω) (blue)]

We see that the ETFE is an erratic and poor estimate of G0(e^jω).

How can we explain this? For this purpose, we need to understand the statistical properties of the ETFE.
Statistical properties of the ETFE

Due to the stochastic noise v(t) corrupting the data Z^N, the ETFE Ĝ(e^jω) is a random variable, i.e. the ETFE is different at each experiment.

Moreover, at one frequency ωk, the estimate Ĝ(e^jωk) is a random variable (asymptotically) distributed around G0(e^jωk), and there is no (cor)relation between the estimate at the frequency ωk and those at the other frequencies, i.e. ωk−1, ωk+1, ...

⟹ the ETFE will be reliable if the variances of the estimates Ĝ(e^jωk) are small for all ωk.
Variance of the ETFE

The variance cov(Ĝ(e^jω)) = E|Ĝ(e^jω) − E Ĝ(e^jω)|² is given by:

cov(Ĝ(e^jω)) = E( |VN(e^jω)|² / |UN(e^jω)|² )

with VN(ω) defined as YN(ω) and UN(ω). For increasing values of N, cov(Ĝ(e^jω)) tends to Φv(ω)/Φu(ω).

Explanation of the results in the illustration

Multisine: u(t) = (1/√30) Σ_{k=1}^{30} sin(kω0 t)

The ETFE is only computed at the harmonics ωk = kω0 (k = 1...30) of ω0.

Property of |UN|² at the harmonics ωk:

|UN(e^jωk)|² = N Ak²/4 = 10000/120

since the amplitude Ak of each sine is 1/√30 and N = 10000.
What is the variance of the ETFE at the available frequencies ωk?

E|UN|² = |UN|² since u(t) is deterministic ⟹

cov(Ĝ(e^jωk)) = E|VN(e^jωk)|² / |UN(e^jωk)|² ≈ Φv(ωk)/|UN(e^jωk)|² = 120 Φv(ωk)/10000

Since |UN|² is proportional to N and Ak², the variance is proportional to 1/N and 1/Ak².

u(t) white noise of variance σu² = 0.5

The ETFE is computed at the N/2 = 5000 frequencies ωk. Since N is large, the variance at the frequencies ωk can be approximated by:

cov(Ĝ(e^jωk)) ≈ Φv(ωk)/Φu(ωk) = Φv(ωk)/σu² = Φv(ωk)/0.5 = 2 Φv(ωk)

Unlike for a multisine u(t), the variance is not proportional to 1/N; the variance is only proportional to 1/σu².

Multisine vs. stochastic signal:
• the ETFE is available at more frequencies for a stochastic u(t)
• for equal power, the variance is much smaller for a multisine u(t)

Suppose u(t) is not free to be chosen and is stochastic, and that the power of u(t) cannot be increased. How can we then get a relatively good estimate? How can we reduce the variance?
Smoothing of the ETFE through the use of windows

(only really relevant when u(t) is stochastic)

Principle: reduction of the variance by averaging over neighbouring frequency points.

Smoothing is motivated by:
• the ETFE estimates are independent for different ωk's
• averaging over a frequency area where G0 is constant reduces the variance

The averaging can be performed as follows:

Ĝsm(e^jω) = [ ∫_{−π}^{π} Wγ(ξ − ω) Ĝ(e^jξ) dξ ] / [ ∫_{−π}^{π} Wγ(ξ − ω) dξ ]

with Ĝ(e^jω) the unsmoothed ETFE and Wγ(ω) a positive real-valued frequency function (window). A Hamming window is generally chosen for Wγ(ω).

[Figure: Hamming frequency window for different resolutions; Wγ(ω) for γ = 10 (solid), γ = 20 (dash-dotted) and γ = 40 (dashed)]

γ is a measure of the width of the window. The window is non-zero in an interval [−∆ω, +∆ω] around 0; the larger γ, the smaller ∆ω.

Ĝsm(e^jωk) at a particular frequency ωk is thus obtained by averaging Ĝ(e^jω) in the interval [ωk − ∆ω, ωk + ∆ω].
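A discrete stand-in for this averaging is a short weighted moving average over the ETFE frequency grid; a minimal numpy sketch, assuming a Hamming-shaped window whose half-width is given in frequency bins rather than in rad:

import numpy as np

def smooth_etfe(G_hat, halfwidth):
    # Weighted average of the ETFE over 2*halfwidth+1 neighbouring
    # frequency points, a discrete version of the integral above
    W = np.hamming(2 * halfwidth + 1)
    num = np.convolve(G_hat, W, mode="same")
    den = np.convolve(np.ones(len(G_hat)), W, mode="same")
    return num / den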
• the window introduces bias in an attempt to reduce the variance (bias/variance trade-off)
• the choice of the window depends on the expected smoothness of G0(e^jω)
• window too narrow: variance too large; window too wide: possible smoothing-out of dynamics

Illustration (cont'd): consequence of the use of a too wide window (γ = 10) on the ETFE of slide 11

[Figure: above plot: the smoothed ETFE at ωk; bottom plot: the same with the frequency response of G0(e^jω) (blue)]
Besides trial-and-error coupled with physical insights on G0(z), is there another way to select γ? Yes...

To find this way, note that

Ĝ(e^jω) = YN(ω)/UN(ω) = YN(ω)UN*(ω) / (UN(ω)UN*(ω)) = [ Σ_{τ=−∞}^{+∞} R̂yu^N(τ) e^{−jωτ} ] / [ Σ_{τ=−∞}^{+∞} R̂u^N(τ) e^{−jωτ} ]

where the last step follows from expressions (3.13) and (3.19) in the lecture notes.
Ĝ(e^jω) = [ Σ_{τ=−∞}^{+∞} R̂yu^N(τ) e^{−jωτ} ] / [ Σ_{τ=−∞}^{+∞} R̂u^N(τ) e^{−jωτ} ]

with

R̂u^N(τ) = (1/N) Σ_{t=0}^{N−1} u(t)u(t − τ) for |τ| < N − 1, and 0 for |τ| > N − 1

R̂yu^N(τ) = (1/N) Σ_{t=0}^{N−1} y(t)u(t − τ) for 0 < τ < N − 1, and 0 elsewhere

Interpretation (spectral analysis, SPA)

Ĝ(e^jω) can thus be seen as the ratio Φ̂yu(ω)/Φ̂u(ω) of the approximation Φ̂yu(ω) of Φyu(ω) ≜ F(Ryu(τ)) and the approximation Φ̂u(ω) of Φu(ω) = F(Ru(τ)).

This seems logical since

Φyu(ω)/Φu(ω) = G0(e^jω)Φu(ω)/Φu(ω) = G0(e^jω)

The approximations of the spectra are obtained by taking the Fourier transforms of the estimates R̂yu^N(τ) and R̂u^N(τ) of the exact correlation functions.
Moreover, it can be shown that

Ĝsm(e^jω) = [ Σ_{τ=−∞}^{+∞} wγ(τ) R̂yu^N(τ) e^{−jωτ} ] / [ Σ_{τ=−∞}^{+∞} wγ(τ) R̂u^N(τ) e^{−jωτ} ]

with wγ(τ) obtained as the inverse Fourier transform of the frequency window Wγ(ω).

[Figure: Hamming lag-window wγ(τ); a typical R̂yu^N(τ) (solid) together with the Hamming lag-windows w10(τ) (dotted), w30(τ) (dashed) and w70(τ) (dash-dotted)]
wγ(τ) is a window of width γ: wγ(τ) = 0 for |τ| > γ.

Smoothing thus corresponds to removing from the estimate Φ̂yu(ω) of Φyu(ω) the elements R̂yu^N(τ) for τ > γ.

This is relevant since Ryu(τ) → 0 for τ → ∞ (G0(z) stable) and since the accuracy of R̂yu^N(τ) becomes smaller and smaller for increasing values of τ (R̂yu^N(τ) is computed with fewer data points).

Method for the selection of γ: choose γ such that, for τ > γ, the R̂yu^N(τ) are small w.r.t. |R̂yu^N(0)| and “less reliable”.

Illustration (cont'd): we compute R̂yu^N(τ) with the data generated by the white noise of variance 0.5.

[Figure: estimated Ryu(τ) for τ = 0...9000]

We see the inaccuracy of the estimate: R̂yu^N(τ) does not tend to 0 for τ → ∞.
Let us focus on the first 500 τ's:

[Figure: estimated Ryu(τ) for τ = 0...500]

We see that, after τ = 100, R̂yu^N(τ) increases again, which is very unlikely for the true Ryu(τ) ⟹ we select γ = 100.

Obtained smoothed ETFE with γ = 100:

[Figure: above plot: the smoothed ETFE at ωk; bottom plot: the same with the frequency response of G0(e^jω) (blue)]
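The lag-window form above translates almost directly into code; here is a rough numpy sketch assuming the one-sided correlation estimates defined earlier, a Hamming lag window and a plain frequency grid (a simplified illustration, not a drop-in toolbox routine):

import numpy as np

def smoothed_etfe_lag(u, y, gamma, n_freq=500):
    # Correlation estimates truncated at lag gamma by a Hamming
    # lag window w_gamma(tau), then the ratio of Fourier transforms
    N = len(u)
    tau = np.arange(gamma + 1)
    Ryu = np.array([y[t:] @ u[:N - t] / N for t in tau])   # R̂yu^N(τ), τ ≥ 0
    Ru  = np.array([u[t:] @ u[:N - t] / N for t in tau])   # R̂u^N(τ), τ ≥ 0
    wlag = np.hamming(2 * gamma + 1)[gamma:]               # w_gamma(τ), τ ≥ 0
    w = np.linspace(0, np.pi, n_freq)
    E = np.exp(-1j * np.outer(w, tau))                     # e^{-jωτ}
    num = E @ (wlag * Ryu)                                 # one-sided R̂yu
    den = 2 * (E @ (wlag * Ru)).real - Ru[0]               # R̂u is even in τ
    return w, num / den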
Final remarks: drawbacks of the ETFE

The ETFE gives a discrete estimate of the frequency response G0(e^jω) and not the rational transfer function G0(z). For simulation and for modern control design, such a transfer function is necessary.

The ETFE gives no information about the noise spectrum Φv(ω), while this information is important for e.g. disturbance rejection.

⟹ parametric identification (prediction error identification, PEI):
• delivers a model of the plant G0 and information on Φv(ω)
• higher accuracy (cov(G(e^jω, θ̂N)) ≈ (n/N) Φv(ω)/Φu(ω) with PEI)
Illustration (cont'd): parametric identification of G0(z) with the first 1000 samples of the white noise of variance 0.5.

[Figure: above plot: frequency response of the identified model; bottom plot: the same with the frequency response of G0(e^jω) (blue)]

The previous figure has to be compared with the non-smoothed ETFE and the smoothed ETFE. This comparison shows that PEI delivers much better results, even with ten times fewer data points.
Part V: practical issues when designing the identification experiment

1 Preparatory experiments

• noise measurement on the output
• step response analysis: area of linearity, time constants, static gain, delay of the system

Possibilities depend on circumstances.
2 Choice of the sampling frequency ωs = 2π/Ts

[Diagram: u → ZOH (Ts) → u_cont → continuous system → y_cont → sampling (Ts) → y]

Data for the ETFE: with the high(est) value of ωs. Indeed, the higher ωs, the larger the frequency range that is captured (Shannon theorem). The ETFE obtained with these data can be represented up to ωs/2. By inspecting this ETFE, it is then possible to determine the bandwidth ωb of the system (ωb << ωs/2).

Data for parametric (PEI) identification: with a smaller ωs. A high ωs induces numerical problems with parametric identification: indeed, all poles cluster around z = 1, since the discrete-time state-space matrix Ad = e^{Acont·Ts} → I when Ts → 0.
Typical choice for parametric identification:

10 ωb < ωs < 30 ωb

with ωb as observed in the ETFE. Data with a smaller ωs can be obtained:
• either by re-collecting data with a smaller ωs
• or by decimating the data obtained with high ωs (+ anti-aliasing filter)

Remark (actual vs. normalized frequencies):

The model of G0 identified with data collected with a sampling frequency ωs contains information up to the Nyquist frequency ωs/2 (actual frequency).

Considering now the normalized frequency ω = ωactual·Ts, we note that the interval [0 ωs/2] (actual frequencies) corresponds to the main interval [0 π] when considering normalized frequencies. Indeed,

ωs/2 = π/Ts (actual frequency) ⟹ normalized ω = π
3 Input signals used for system identification

Finite-power quasi-stationary signals for continuous excitation:
• periodic signals (in particular multisines)
• realizations of a stochastic process ((filtered) white noise or alike)

Trade-off when designing the excitation signal:
• the power Pu / Φu(ω) should be as high as possible, to increase the accuracy of the identified model
• the amplitude of the time-domain signal should be bounded/limited, in order not to damage the actuators and not to excite the nonlinearities

Multisines

u(t) = Σ_{k=1}^{n} Ak sin(kω0 t + φk)

Φu(ω) is made up of Dirac pulses at the frequencies of the sines in the multisine.

The phase shifts φk can be optimized in order to reduce the maximal amplitude of u(t) without any effect on the power spectrum Φu(ω).
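One classical choice for such a phase optimization is the Schroeder phase φk = −πk(k−1)/n, which typically gives a much lower peak amplitude than zero phases for the same flat spectrum; a minimal numpy sketch:

import numpy as np

def multisine(n_sines, w0, N, schroeder=True):
    # Equal-amplitude multisine; with schroeder=True the Schroeder
    # phases are used to reduce the peak amplitude of u(t)
    t = np.arange(N)
    k = np.arange(1, n_sines + 1)
    phi = -np.pi * k * (k - 1) / n_sines if schroeder else np.zeros(n_sines)
    A = 1.0 / np.sqrt(n_sines)            # total power Pu = 0.5
    return sum(A * np.sin(kk * w0 * t + p) for kk, p in zip(k, phi))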
Realization of a stochastic process

u(t) = F(z)w(t)

with F(z) a user-selected filter and w(t) a white noise of variance σw². The power spectrum is given by:

Φu(ω) = |F(e^jω)|² σw²

Shaping Φu(ω) is very easy, but there is no a-priori bound on the amplitude of u(t)!!

Alternative: Random Binary Sequence (RBS)

u(t) = c · sign( w( int(t/ν) ) )

with c the amplitude, w(t) a white noise of variance σw², and ν the so-called clock period, an integer such that ν ≥ 1.

The amplitude of the RBS is either +c or −c. The RBS has the maximal power Pu = Ēu²(t) = c² that can be attained by a signal with |u(t)| ≤ c ∀t.

[Figure: (a) typical RBS with clock period equal to the sampling interval (ν = 1); (b) RBS with increased clock period ν = 2]

Influence of ν on Φu(ω):

[Figure: spectrum (1/2π)Φu(ω) of a (P)RBS with basic clock period ν = 1 (black), ν = 3 (green), ν = 5 (red), and ν = 10 (blue)]
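Generating such a sequence is a one-liner; a minimal numpy sketch of u(t) = c·sign(w(int(t/ν))):

import numpy as np

def rbs(N, c=1.0, nu=1, seed=0):
    # Random binary sequence: a +/- c signal that can only switch
    # polarity every nu samples (nu = clock period)
    w = np.random.default_rng(seed).standard_normal(int(np.ceil(N / nu)))
    return c * np.sign(np.repeat(w, nu))[:N]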
The power spectrum Φu(ω) of the RBS is thus shaped via ν:
• ν = 1 ⟹ Φu(ω) = c² ∀ω, i.e. the RBS has the flat power spectrum of a white noise
• for increasing values of ν, the power spectrum Φu(ω) is more and more concentrated in the low frequencies

Another alternative: P(seudo)RBS
• binary signal constructed from a deterministic shift register
• otherwise very similar to the RBS

Less flexibility, but bounded amplitude!!
4 Data (pre)processing

• anti-aliasing filter
• outliers/spikes
• non-zero mean and drift in disturbances; detrending

5 Remarks on unstable systems

Unstable systems cannot be identified in open loop. Experiments have to be done with a stabilizing controller C in closed loop:

y(t) = [G0 C / (1 + G0 C)] r(t) + [H0 / (1 + G0 C)] e(t)

Since r(t) is independent of e(t), we can excite the closed-loop system via r(t) and identify a model T̂(z) of G0 C / (1 + G0 C).

A model for the unstable G0(z) is then

Ĝ(z) = T̂(z) / ( C(z)(1 − T̂(z)) )
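The back-computation of Ĝ from T̂ and C reduces to polynomial algebra: with T̂ = nT/dT and C = nC/dC, one gets Ĝ = (nT·dC) / (nC·(dT − nT)), as in this small numpy sketch:

import numpy as np

def plant_from_closed_loop(nT, dT, nC, dC):
    # G_hat = T_hat / (C * (1 - T_hat)) with T_hat = nT/dT, C = nC/dC:
    #       = (nT * dC) / (nC * (dT - nT))
    num = np.polymul(nT, dC)
    den = np.polymul(nC, np.polysub(dT, nT))
    return num, den        # numerator/denominator polynomials of G_hat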