When digital signal processing operations are implemented on a computer or with special-purpose
hardware, the numbers and coefficients are stored in finite-length registers. The errors and
constraints due to finite word length are unavoidable. The coefficients and numbers are quantized
by truncation or rounding off when they are stored.
The main errors that arise due to quantization of numbers are:
a) Input quantization error
b) Product quantization error
c) Coefficient quantization error
Input quantization error: It is due to the conversion of a continuous-time signal into a digital value. This error arises because the input signal is represented by a fixed number of digits in the A/D conversion process.
Product quantization error: Registers are the basic storage devices in a digital system. The maximum size of the binary information that can be stored in a register is called the register word length. If a register stores 8-bit data, its word length is 8 bits. While performing calculations, the size of the result may exceed the size of the register used for storing the result. Example: multiplication of b-bit data by a b-bit coefficient results in a product having 2b bits. Since only a b-bit register is available, the multiplier output must be truncated or rounded to fit into the register, which produces product quantization error. This error appears at the output of the multiplier.
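A minimal sketch of product quantization, assuming the data and the coefficient are b-bit fractions stored as integers scaled by 2^b (the helper name `quantize_product` and the 4-bit values are hypothetical). The exact product carries 2b fraction bits; shifting right by b bits discards the extra bits (Python's `>>` floors, which matches twos-complement truncation), and adding half an LSB before the shift rounds instead.

```python
def quantize_product(x_int: int, c_int: int, b: int, mode: str = "round") -> int:
    """Multiply two b-bit fractions (integers scaled by 2**b) and requantize to b bits.

    The exact product is scaled by 2**(2*b); the result is scaled by 2**b again.
    """
    product = x_int * c_int                      # exact 2b-bit product
    if mode == "truncate":
        return product >> b                      # discard b LSBs (floor, as in twos complement)
    if mode == "round":
        return (product + (1 << (b - 1))) >> b   # add half an LSB, then discard
    raise ValueError("mode must be 'truncate' or 'round'")


b = 4
x_int, c_int = 13, 11                            # x = 13/16, c = 11/16
exact = x_int * c_int / 2 ** (2 * b)             # 143/256 = 0.55859375
for mode in ("truncate", "round"):
    q = quantize_product(x_int, c_int, b, mode)
    print(mode, q / 2 ** b, "error:", q / 2 ** b - exact)
```

With these values the truncated product is 0.5 and the rounded product is 0.5625, so the product quantization error is about -0.0586 and +0.0039 respectively.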
Coefficient quantization error: In theory, the filter coefficients are computed to infinite precision. Quantizing the filter coefficients disturbs the locations of the filter poles and zeros. Therefore, the frequency response of the resulting filter may differ from the desired response, and sometimes the filter may fail to meet the desired specifications. If the poles of the desired filter are close to the unit circle, then those of the filter with quantized coefficients may lie just outside the unit circle, leading to instability. This deterministic frequency response error is referred to as coefficient quantization error.
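The following sketch illustrates how coefficient quantization moves a pole toward the unit circle. It assumes a hypothetical second-order section H(z) = 1/(1 + a1 z^-1 + a2 z^-2) with complex poles at radius 0.97 and angle π/4, and rounds both coefficients to 3 fraction bits; the helper names are illustrative only.

```python
import cmath
import math


def quantize(value: float, frac_bits: int) -> float:
    """Round a coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 2 ** frac_bits
    return round(value * scale) / scale


def pole_radii(a1: float, a2: float) -> list:
    """Magnitudes of the roots of z**2 + a1*z + a2, i.e. the poles of
    H(z) = 1 / (1 + a1*z**-1 + a2*z**-2)."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return [abs((-a1 + disc) / 2), abs((-a1 - disc) / 2)]


# Desired complex-conjugate poles at radius r and angle theta.
r, theta = 0.97, math.pi / 4
a1 = -2 * r * math.cos(theta)
a2 = r * r

a1_q, a2_q = quantize(a1, 3), quantize(a2, 3)  # keep 3 fraction bits
print("ideal:    ", a1, a2, "pole radius:", max(pole_radii(a1, a2)))
print("quantized:", a1_q, a2_q, "pole radius:", max(pole_radii(a1_q, a2_q)))
```

In this example a2 = 0.9409 rounds to 1.0, so the quantized poles land on the unit circle and the filter is only marginally stable, even though the ideal design has a comfortable pole radius of 0.97.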
Representation of a Number
The number N can be represented to any desired accuracy by the following finite series

$$N = \sum_{i=-A}^{B} d_i\, r^{-i}$$

where r is the radix and $d_i$ is the ith digit of the number. For a binary number N, the digit $d_{-A}$ is the MSB and $d_B$ is the LSB.
r = 10: decimal number representation
r = 2: binary number representation
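A small sketch that evaluates the series $N = \sum_{i=-A}^{B} d_i r^{-i}$ exactly, using Python's `fractions.Fraction` to avoid binary floating-point rounding. The function name `radix_value` and the digit-list convention (MSB first, starting at $d_{-A}$) are assumptions made for illustration.

```python
from fractions import Fraction


def radix_value(digits, A, r):
    """Evaluate N = sum_{i=-A}^{B} d_i * r**(-i).

    digits lists d_{-A}, d_{-A+1}, ..., d_B from MSB to LSB, so the k-th entry
    (k = 0, 1, ...) is d_i with i = k - A and weight r**(A - k).
    """
    return sum(Fraction(d) * Fraction(r) ** (A - k) for k, d in enumerate(digits))


print(radix_value([9, 7, 8, 1, 2, 5], A=2, r=10))   # decimal 978.125
print(radix_value([0, 1, 1, 1], A=0, r=2))          # binary 0.111
```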
Fixed point representation: In this representation the number of digits allotted to the integer part and to the fraction part is fixed, so the position of the binary point is fixed. The bits to the right of the binary point represent the fractional part of the number and the bits to the left of the binary point represent the integer part.
Ex: 978.125 (978 is the integer part and .125 is the fractional part)
In fixed point representation there are three different formats for representing negative binary fraction numbers. They are
a. Sign-magnitude form
b. Ones-complement form
c. Twos-complement form
Sign-magnitude form: The most significant bit is set to 1 to represent the negative sign.
Ex:
(0.875)10 = (0.111000)2
(1.75)10 = (01.110000)2
(-0.875)10 = (1.111000)2
(-1.75)10 = (11.110000)2
In sign-magnitude form, the number 0 has two representations, i.e., 00.000000 or 10.000000. With b bits only $(2^b - 1)$ numbers can be represented.
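Ones-complement form: The positive number is represented as in the sign-magnitude form. The negative number is obtained by complementing all the bits of the positive number.
Ex:
(0.875)10 = (0.111000)2
(-0.875)10 = (1.000111)2
The magnitude of a negative number with fraction bits $C_1 C_2 \ldots C_b$ is given by

$$1 - \sum_{i=1}^{b} C_i\, 2^{-i} - 2^{-b}$$

For example, 1.000111 gives $1 - (2^{-4} + 2^{-5} + 2^{-6}) - 2^{-6} = 0.875$.
In this type of representation, with b bits only $(2^b - 1)$ numbers can be represented exactly.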
Twos-complement form: The positive numbers are represented as in the sign-magnitude and ones-complement forms. The negative number is obtained by complementing all the bits of the positive number and adding one to the LSB.
Ex:
(0.875)10 = (0.111000)2
(-0.875)10 = (1.001000)2
The magnitude of the negative number is given by

$$1 - \sum_{i=1}^{b} C_i\, 2^{-i}$$
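A sketch that produces the three encodings of a negative fraction and evaluates the magnitude formulas for the ones- and twos-complement forms. It assumes a word of one sign bit plus b fraction bits and values that are exact multiples of $2^{-b}$; the function names are hypothetical.

```python
def encode(value: float, b: int, form: str) -> str:
    """Encode a fraction in (-1, 1) as sign bit + b fraction bits.

    Assumes |value| is an exact multiple of 2**-b.
    form is 'sign-magnitude', 'ones', or 'twos'.
    """
    mag = round(abs(value) * 2 ** b)          # magnitude bits as an integer
    mask = (1 << (b + 1)) - 1                 # all b+1 bits set
    if value >= 0:
        word = mag                            # positive: same in all three forms
    elif form == "sign-magnitude":
        word = (1 << b) | mag                 # set the sign bit only
    elif form == "ones":
        word = mask ^ mag                     # complement every bit of +|value|
    elif form == "twos":
        word = ((mask ^ mag) + 1) & mask      # complement, then add one to the LSB
    else:
        raise ValueError("unknown form")
    return f"{word >> b}.{word & ((1 << b) - 1):0{b}b}"


def negative_magnitude(bits: str, form: str) -> float:
    """Magnitude of a negative ones/twos-complement fraction '1.C1C2...Cb'."""
    frac = bits.split(".")[1]
    b = len(frac)
    s = sum(int(c) * 2.0 ** -(i + 1) for i, c in enumerate(frac))  # sum of C_i * 2**-i
    if form == "ones":
        return 1 - s - 2.0 ** -b
    if form == "twos":
        return 1 - s
    raise ValueError("unknown form")


for form in ("sign-magnitude", "ones", "twos"):
    print(form, encode(-0.875, 6, form))
print(negative_magnitude("1.000111", "ones"), negative_magnitude("1.001000", "twos"))
```

The printed codes reproduce the examples 1.111000, 1.000111 and 1.001000 for -0.875, and both magnitude formulas return 0.875.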
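Comparison of fixed point and floating point arithmetic:
S.No | Fixed point arithmetic | Floating point arithmetic
1 | The accuracy of the result is less due to the small dynamic range. | The accuracy of the result is higher due to the larger dynamic range.
2 | Quantization error occurs only with multiplication. | Quantization error occurs with both addition and multiplication.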
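Truncation: Truncation is the process of reducing the size of a binary number by discarding all bits less significant than the least significant bit that is retained. Let x be a number with N fraction bits that is truncated to b bits (b < N), giving $x_T$. For positive numbers, truncation reduces the value in all three forms, so $-2^{-b} < x_T - x \le 0$. For negative numbers the result depends on the representation.

Twos-complement form: For a negative number,

$$x = -\left(1 - \sum_{i=1}^{N} C_i\, 2^{-i}\right), \qquad x_T = -\left(1 - \sum_{i=1}^{b} C_i\, 2^{-i}\right)$$

$$x_T - x = -\sum_{i=b+1}^{N} C_i\, 2^{-i} \le 0$$

Since the magnitude increases with truncation, the error is negative and satisfies the inequality

$$-2^{-b} < x_T - x \le 0$$

Ones-complement form: For a negative number,

$$x = -\left(1 - \sum_{i=1}^{N} C_i\, 2^{-i} - 2^{-N}\right), \qquad x_T = -\left(1 - \sum_{i=1}^{b} C_i\, 2^{-i} - 2^{-b}\right)$$

$$x_T - x = -\sum_{i=b+1}^{N} C_i\, 2^{-i} - 2^{-N} + 2^{-b} \ge 0$$

Since the magnitude decreases with truncation, the error is positive and satisfies the inequality

$$0 \le x_T - x < 2^{-b}$$

In sign-magnitude representation the magnitude also decreases with truncation, which implies that the error is positive. Therefore, the above inequality also holds for sign-magnitude representation.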
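A brute-force check of the truncation bounds for negative numbers. As a simplifying assumption, each representation is modelled by its effect on the value rather than by bit patterns: twos-complement truncation floors the value, while sign-magnitude and ones-complement truncation cut the magnitude (toward zero). The helper `truncate` is hypothetical; the check runs over every negative 8-bit fraction truncated to 4 bits.

```python
from fractions import Fraction


def truncate(x_int: int, N: int, b: int, form: str) -> Fraction:
    """Truncate an N-fraction-bit value (integer scaled by 2**N) to b bits.

    Twos-complement truncation floors the value; sign-magnitude and
    ones-complement truncation drop LSBs of the magnitude (toward zero).
    """
    shift = N - b
    if form == "twos":
        kept = x_int >> shift                      # floor division by 2**shift
    else:
        sign = -1 if x_int < 0 else 1
        kept = sign * (abs(x_int) >> shift)        # truncate the magnitude
    return Fraction(kept, 2 ** b)


N, b = 8, 4
for form in ("twos", "ones", "sign-magnitude"):
    errors = [truncate(k, N, b, form) - Fraction(k, 2 ** N) for k in range(-2 ** N + 1, 0)]
    print(form, "error range:", min(errors), "to", max(errors))
```

The twos-complement errors stay in $(-2^{-4}, 0]$ and the other two forms stay in $[0, 2^{-4})$, matching the inequalities for negative numbers.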
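For a floating-point number $x = 2^c M$ with $1/2 \le |M| < 1$, only the mantissa is truncated to b bits, so $x_T = 2^c M_T$. The relative error $\varepsilon$ is defined by

$$\varepsilon = \frac{x_T - x}{x} \qquad \text{or} \qquad e = x_T - x = \varepsilon x = \varepsilon\, 2^c M$$

Since truncation reduces the magnitude of the mantissa, for positive M

$$-2^{-b}\, 2^c < e \le 0$$

With M = 1/2, the maximum range of the relative error for positive M is

$$-2 \cdot 2^{-b} < \varepsilon \le 0$$

For negative M,

$$0 \le e < 2^{-b}\, 2^c$$

With M = -1/2, the maximum range of the relative error for negative M is again $-2 \cdot 2^{-b} < \varepsilon \le 0$, which is the same as for positive M.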
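A sketch that checks the relative-error range for floating-point truncation, assuming the mantissa is truncated toward zero (magnitude truncation) and using `math.frexp`, which returns a mantissa with $1/2 \le |M| < 1$. The helper name and the random test range are illustrative choices.

```python
import math
import random


def truncate_mantissa(x: float, b: int) -> float:
    """Truncate the mantissa M of x = 2**c * M (1/2 <= |M| < 1) to b bits."""
    M, c = math.frexp(x)                         # x == M * 2**c
    M_T = math.trunc(M * 2 ** b) / 2 ** b        # drop low bits of the magnitude
    return math.ldexp(M_T, c)


b = 8
random.seed(0)
rel_errors = []
for _ in range(10_000):
    x = random.uniform(-100.0, 100.0)
    if x != 0.0:
        rel_errors.append((truncate_mantissa(x, b) - x) / x)
print(min(rel_errors), max(rel_errors), "lower bound:", -2 * 2 ** -b)
```

Both positive and negative inputs give relative errors in $(-2 \cdot 2^{-b}, 0]$.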
Rounding : Rounding is the process of reducing the size of a binary number to finite word size of
b-bits such that the rounded b-bit number is closest to the original unquantized number.
The rounding process consists of Truncation and Addition. In rounding of a number to b-bits,
first the unquantized number is truncated to b-bits by retaining the most significant b-bits. Then a
zero or one is added to the least significant bit of the truncated number depending on the bit that is
next to the least significant bit that is retained.
If the bit next to the least significant bit that is retained is zero then zero is added to the least
significant bit of the truncated number. If the bit next to the least significant bit that is retained is one
then one is added to the least significant bit of the truncated number. (Here adding one is called
rounding up).
Rounding up or down will have negligible effect on accuracy of computation.
In fixed point arithmetic, rounding a number to b bits produces an error $x_R - x$ (where $x_R$ is the rounded value) that satisfies

$$-\frac{2^{-b}}{2} \le x_R - x \le \frac{2^{-b}}{2}$$

This is because with rounding, if the value lies halfway between two levels, it can be approximated either by the nearest higher level or by the nearest lower level. For fixed point numbers this inequality holds regardless of whether sign-magnitude, ones-complement or twos-complement form is used for negative numbers.
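A sketch of the rounding procedure described above (truncate to b bits, then add the bit next to the retained LSB) on twos-complement values stored as integers scaled by $2^N$. The name `round_to_b_bits` and the 8-bit to 4-bit sizes are hypothetical; the check confirms $|x_R - x| \le 2^{-b}/2$ over every 8-bit fraction in $[-1, 1)$ (overflow of the largest positive value to +1 is ignored here).

```python
from fractions import Fraction


def round_to_b_bits(x_int: int, N: int, b: int) -> Fraction:
    """Round an N-fraction-bit twos-complement value (integer scaled by 2**N)
    to b bits (requires N > b): truncate, then add the bit next to the retained LSB."""
    shift = N - b
    truncated = x_int >> shift                 # keep the most significant bits
    next_bit = (x_int >> (shift - 1)) & 1      # bit just below the retained LSB
    return Fraction(truncated + next_bit, 2 ** b)


N, b = 8, 4
errors = [round_to_b_bits(k, N, b) - Fraction(k, 2 ** N) for k in range(-2 ** N, 2 ** N)]
print("error range:", min(errors), "to", max(errors), "bound:", Fraction(1, 2 ** (b + 1)))
```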
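For a floating-point number $x = 2^c M$, only the mantissa is rounded to b bits, so $x_R = 2^c M_R$. The rounding error of the mantissa is bounded by half the quantization step q:

$$-\frac{q}{2} \le M_R - M \le \frac{q}{2} \qquad (1)$$

$$q = 2^{-b} \qquad (2)$$

Multiplying (1) by $2^c$ and using (2),

$$-2^c\,\frac{2^{-b}}{2} \le x_R - x \le 2^c\,\frac{2^{-b}}{2} \qquad (3)$$

Since $x_R - x = \varepsilon x$ and $x = 2^c M$, equation (3) can be written as

$$-2^c\,\frac{2^{-b}}{2} \le \varepsilon\, 2^c M \le 2^c\,\frac{2^{-b}}{2} \qquad (4)$$

With M = 1/2 (the smallest mantissa magnitude), equation (4) becomes

$$-2^{-b} \le \varepsilon \le 2^{-b}$$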
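A sketch that checks the relative-error range for floating-point rounding, $|\varepsilon| \le 2^{-b}$, by rounding the mantissa returned by `math.frexp` to b bits. Python's built-in `round` breaks ties to even, which is one valid choice for the halfway case and does not affect the bound; the helper name is hypothetical.

```python
import math
import random


def round_mantissa(x: float, b: int) -> float:
    """Round the mantissa M of x = 2**c * M (1/2 <= |M| < 1) to b bits."""
    M, c = math.frexp(x)                        # x == M * 2**c
    M_R = round(M * 2 ** b) / 2 ** b            # nearest multiple of q = 2**-b
    return math.ldexp(M_R, c)


b = 8
random.seed(0)
rel_errors = []
for _ in range(10_000):
    x = random.uniform(-100.0, 100.0)
    if x != 0.0:
        rel_errors.append((round_mantissa(x, b) - x) / x)
print(max(abs(e) for e in rel_errors), "bound:", 2 ** -b)
```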
Input Quantization Error: The input quantization error arises when a continuous signal is converted into a digital value. The A/D converter consists of a sampler and a quantizer.
The sampler samples the analog signal x(t) at regular intervals t = nT to produce a sequence of unquantized values x(n).
The quantizer quantizes the unquantized values x(n) and produces the corresponding binary codes.
If the ADC used to convert a sinusoidal signal employs (b+1) bits including sign, the number of levels available for quantizing x(n) is $2^{b+1}$. For a full-scale range of -1 to +1, the interval between successive levels is

$$q = \frac{2}{2^{b+1}} = 2^{-b}$$

where q is the quantization step size.
The errors produced by A/D conversion process are Quantization error and Saturation error.
Quantization error: It is due to the representation of the sampled signal by a fixed number of digits
Saturation error: It occurs when the analog signal exceeds the dynamic range of A/D converter.
In A/D converters, quantization can be performed by truncation or rounding. Quantization by rounding is preferred in A/D converters because the quantization error has zero mean and a lower mean-square value ($q^2/12$ instead of $q^2/3$) than with truncation.
The quantization error for rounding a number satisfies the relation

$$-\frac{q}{2} \le e(n) \le \frac{q}{2}$$

For truncation in twos-complement representation, the error is always negative and satisfies the inequality

$$-q < e(n) \le 0$$
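A sketch of an ideal quantizer, assuming a full-scale range of -1 to +1 and b + 1 = 8 bits including sign, so $q = 2/2^{b+1} = 2^{-7}$. The test signal (a 0.99-amplitude sinusoid) and the helper names are illustrative; the script checks that the rounding error stays within $\pm q/2$ and the twos-complement truncation error within $(-q, 0]$.

```python
import math


def step_size(full_scale_range: float, total_bits: int) -> float:
    """Quantization step q for an ADC with 2**total_bits levels."""
    return full_scale_range / 2 ** total_bits


def quantize_round(x: float, q: float) -> float:
    return math.floor(x / q + 0.5) * q          # nearest level


def quantize_truncate(x: float, q: float) -> float:
    return math.floor(x / q) * q                # twos-complement truncation (floor)


b = 7                                           # b + 1 = 8 bits including sign
q = step_size(2.0, b + 1)                       # range -1 to +1 gives q = 2**-b
samples = [0.99 * math.sin(2 * math.pi * 0.01 * n) for n in range(500)]
e_round = [quantize_round(x, q) - x for x in samples]
e_trunc = [quantize_truncate(x, q) - x for x in samples]
print("q =", q)
print("rounding error range:", min(e_round), max(e_round))
print("truncation error range:", min(e_trunc), max(e_trunc))
```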
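We assume that the A/D conversion error e(n) has the following properties:
1. e(n) is a sample sequence of a stationary random process.
2. e(n) is uncorrelated with the signal x(n).
3. e(n) is a white noise sequence, i.e., its samples are uncorrelated with each other.
4. e(n) is uniformly distributed over the range of the quantization error.

Case 1:
If rounding is used for quantization, then the quantization error e(n) = xq(n) - x(n) is bounded by

$$-\frac{q}{2} \le e(n) \le \frac{q}{2}$$

The error e(n) is equally likely to take any value between -q/2 and +q/2.
For a uniformly distributed random variable X in the interval (X1, X2), the expected value (mean value) and variance are given by

$$E[X] = \frac{1}{X_2 - X_1}\int_{X_1}^{X_2} X\, dX$$

$$\sigma^2 = E\big[(X - E[X])^2\big] = \frac{1}{X_2 - X_1}\int_{X_1}^{X_2} \big(X - E[X]\big)^2\, dX$$

For the rounding error, $X_1 = -q/2$, $X_2 = q/2$ and $X_2 - X_1 = q$:

$$E[e(n)] = \frac{1}{q}\int_{-q/2}^{q/2} e(n)\, de = \frac{1}{q}\left[\frac{e^2(n)}{2}\right]_{-q/2}^{q/2} = 0$$

$$\sigma_e^2 = \frac{1}{q}\int_{-q/2}^{q/2} e^2(n)\, de = \frac{1}{q}\left[\frac{e^3(n)}{3}\right]_{-q/2}^{q/2} = \frac{1}{q}\cdot\frac{2}{3}\left(\frac{q}{2}\right)^3 = \frac{q^2}{12}$$

Since $q = 2^{-b}$,

$$\sigma_e^2 = \frac{2^{-2b}}{12}$$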
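Case 2:
If truncation is used for quantization, then the quantization error e(n) = xq(n) - x(n) is bounded by

$$-q < e(n) \le 0$$

In twos-complement truncation the error e(n) lies between 0 and -q.

$$E[e(n)] = \frac{1}{q}\int_{-q}^{0} e(n)\, de = -\frac{q}{2}$$

$$\sigma_e^2 = \frac{1}{q}\int_{-q}^{0} \left(e(n) + \frac{q}{2}\right)^2 de = \frac{1}{q}\int_{-q}^{0} e^2(n)\, de - \frac{q^2}{4} = \frac{q^2}{3} - \frac{q^2}{4} = \frac{q^2}{12}$$

In both cases

$$\sigma_e^2 = \frac{q^2}{12} = \frac{2^{-2b}}{12}$$

which is also known as the steady state noise power due to input quantization.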
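A Monte Carlo sketch of the two cases, assuming an input uniformly distributed over (-1, 1) so that the quantization error is itself uniform. It estimates the mean and variance of the rounding and truncation errors for $q = 2^{-8}$; the expected results are a mean near 0 (rounding) or near $-q/2$ (truncation) and a variance near $q^2/12$ in both cases.

```python
import math
import random
import statistics

b = 8
q = 2 ** -b
random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(50_000)]

e_round = [math.floor(v / q + 0.5) * q - v for v in x]   # rounding error
e_trunc = [math.floor(v / q) * q - v for v in x]         # twos-complement truncation error

mean_r, var_r = statistics.fmean(e_round), statistics.pvariance(e_round)
mean_t, var_t = statistics.fmean(e_trunc), statistics.pvariance(e_trunc)
print("rounding:   mean/q =", mean_r / q, " var/(q^2/12) =", var_r / (q * q / 12))
print("truncation: mean/q =", mean_t / q, " var/(q^2/12) =", var_t / (q * q / 12))
```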
If the input signal is x(n) and its variance is $\sigma_x^2$, then the ratio of signal power to noise power is

$$\frac{\sigma_x^2}{\sigma_e^2} = \frac{\sigma_x^2}{2^{-2b}/12} = 12 \cdot 2^{2b}\, \sigma_x^2$$

which is known as the signal to noise ratio for rounding.
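Expressed in decibels, the ratio $12 \cdot 2^{2b} \sigma_x^2$ grows by $20\log_{10}2 \approx 6.02$ dB for every extra bit. The sketch below evaluates it; the signal variance $\sigma_x^2 = 1/3$ is an assumption (a signal uniformly distributed over the full range -1 to +1), and `snr_db` is a hypothetical helper name.

```python
import math


def snr_db(b: int, signal_variance: float) -> float:
    """SNR = sigma_x**2 / sigma_e**2 = 12 * 2**(2b) * sigma_x**2, in decibels."""
    return 10 * math.log10(12 * 2 ** (2 * b) * signal_variance)


sigma_x2 = 1 / 3          # assumed: input uniformly distributed over (-1, 1)
for b in (7, 8, 15):
    print(b, round(snr_db(b, sigma_x2), 2), "dB")
print("gain per extra bit:", snr_db(9, sigma_x2) - snr_db(8, sigma_x2), "dB")
```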
Steady State Output Noise (Variance) Power
The quantized input signal of a digital system can be represented as the sum of the unquantized signal x(n) and the error signal e(n):

$$x_q(n) = x(n) + e(n)$$

h(n) is the impulse response of the system. The response of the system is given by the convolution of the input and the impulse response:

$$y'(n) = x_q(n) * h(n) = x(n) * h(n) + e(n) * h(n) = y(n) + \epsilon(n)$$

where y(n) is the response of the system to the unquantized input and $\epsilon(n) = e(n) * h(n)$ is the response of the system due to the error signal.
The variance of the signal $\epsilon(n)$ is called the output noise power or steady state output noise power due to the quantization error signal.
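Since e(n) is white noise with variance $\sigma_e^2$, the output noise variance is

$$\sigma_\epsilon^2 = \sigma_e^2 \sum_{n=0}^{\infty} h^2(n)$$

Using Parseval's theorem, the steady state output noise variance due to the quantization error is given by

$$\sigma_\epsilon^2 = \sigma_e^2 \sum_{n=0}^{\infty} h^2(n) = \frac{\sigma_e^2}{2\pi j}\oint_c H(z)\, H(z^{-1})\, z^{-1}\, dz$$

where c is a closed contour in the common region of convergence of H(z) and $H(z^{-1})$ (the unit circle for a stable system).

Prove that

$$\sum_{n=0}^{\infty} h^2(n) = \frac{1}{2\pi j}\oint_c H(z)\, H(z^{-1})\, z^{-1}\, dz$$

Proof: The z-transform of h(n) is

$$H(z) = \sum_{n=0}^{\infty} h(n)\, z^{-n} \qquad (1)$$

The inverse z-transform formula gives

$$h(n) = Z^{-1}\{H(z)\} = \frac{1}{2\pi j}\oint_c H(z)\, z^{n-1}\, dz \qquad (2)$$

Now

$$\sum_{n=0}^{\infty} h^2(n) = \sum_{n=0}^{\infty} h(n)\, h(n) \qquad (3)$$

Substituting (2) for one factor h(n) in (3),

$$\sum_{n=0}^{\infty} h^2(n) = \sum_{n=0}^{\infty} h(n)\, \frac{1}{2\pi j}\oint_c H(z)\, z^{n-1}\, dz \qquad (4)$$

Interchanging the order of summation and integration,

$$\sum_{n=0}^{\infty} h^2(n) = \frac{1}{2\pi j}\oint_c H(z)\left[\sum_{n=0}^{\infty} h(n)\, z^{n}\right] z^{-1}\, dz \qquad (5)$$

From (1), with z replaced by $z^{-1}$,

$$\sum_{n=0}^{\infty} h(n)\, z^{n} = H(z^{-1}) \qquad (6)$$

Hence

$$\sum_{n=0}^{\infty} h^2(n) = \frac{1}{2\pi j}\oint_c H(z)\, H(z^{-1})\, z^{-1}\, dz \qquad (7)$$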
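A numerical sketch of the output noise formula $\sigma_\epsilon^2 = \sigma_e^2 \sum h^2(n)$ for a hypothetical first-order system y(n) = a y(n-1) + g x(n), whose impulse response is $h(n) = g a^n$. For this H(z) = g/(1 - a z^{-1}), the contour integral evaluates to $g^2/(1 - a^2)$, so the truncated sum should agree with that closed form. The values a = 0.5, g = 1 and b = 8 are illustrative only.

```python
def output_noise_variance(h, sigma_e2):
    """sigma_eps**2 = sigma_e**2 * sum of h(n)**2 over a (truncated) impulse response."""
    return sigma_e2 * sum(v * v for v in h)


b = 8
sigma_e2 = 2 ** (-2 * b) / 12          # input quantization noise power (rounding)
a, g = 0.5, 1.0                         # hypothetical system y(n) = a*y(n-1) + g*x(n)
h = [g * a ** n for n in range(200)]    # h(n) = g * a**n, truncated to 200 terms

numeric = output_noise_variance(h, sigma_e2)
closed_form = sigma_e2 * g * g / (1 - a * a)   # geometric series: g^2 / (1 - a^2)
print(numeric, closed_form)
```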
Limit cycles: When a stable IIR filter is excited by a finite input sequence, the output will ideally
decay to zero. However, the nonlinearities due to finite precision arithmetic operations cause periodic
oscillations in the output. These oscillations are called limit cycles. (OR)
In recursive systems, the nonlinearities due to finite-precision arithmetic operations often cause periodic oscillations to occur in the output, even when the input sequence is zero or some nonzero constant value. Such oscillations in recursive systems are called limit cycles and are directly attributable to round-off errors in multiplications and overflow errors in additions.
The limit cycles occur as a result of the quantization effects in multiplications.
Types of limit cycles: Zero input limit cycles and overflow limit cycles.
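Zero input limit cycle: Consider the first-order recursive system

$$y(n) = \alpha\, y(n-1) + x(n)$$

With the product rounded to the register length, the system actually computes

$$y(n) = Q[\alpha\, y(n-1)] + x(n)$$

Let us assume α = 1/2 and the data register length is 3 bits plus a sign bit. The input is

$$x(n) = \begin{cases} 0.875, & n = 0 \\ 0, & \text{otherwise} \end{cases}$$

n | x(n) | y(n-1) | αy(n-1) | Q[αy(n-1)] | y(n) = x(n) + Q[αy(n-1)]
0 | 0.875 | 0.0 | 0.0 | 0.000 | 7/8
1 | 0 | 7/8 | 7/16 | 0.100 | 1/2
2 | 0 | 1/2 | 1/4 | 0.010 | 1/4
3 | 0 | 1/4 | 1/8 | 0.001 | 1/8
4 | 0 | 1/8 | 1/16 | 0.001 | 1/8
5 | 0 | 1/8 | 1/16 | 0.001 | 1/8

The rounding is applied after the arithmetic operation. For n ≥ 3 the output remains constant and gives 1/8 as a steady output, causing limit cycle behaviour.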
From the table it can be observed that for zero input, the unquantized y(n) decays exponentially to
zero with increasing n. However, the rounded-off (quantized) output y(n) gets stuck at a value of 1/8
and never decays further. Thus output is finite even when no input is applied. This is referred to as
Zero input limit cycle effect.
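Let us assume α = -1/2.

n | x(n) | y(n-1) | αy(n-1) | Q[αy(n-1)] | y(n) = x(n) + Q[αy(n-1)]
0 | 0.875 | 0.0 | 0.0 | 0.000 | 7/8
1 | 0 | 7/8 | -7/16 | 1.100 | -1/2
2 | 0 | -1/2 | 1/4 | 0.010 | 1/4
3 | 0 | 1/4 | -1/8 | 1.001 | -1/8
4 | 0 | -1/8 | 1/16 | 0.001 | 1/8
5 | 0 | 1/8 | -1/16 | 1.001 | -1/8
6 | 0 | -1/8 | 1/16 | 0.001 | 1/8

Here the output does not decay to zero; from n = 3 onwards it oscillates between -1/8 and +1/8 even though the input is zero.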
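A sketch that reproduces both tables by simulating $y(n) = Q[\alpha y(n-1)] + x(n)$ with exact fractions. It assumes that Q rounds the magnitude to 3 fraction bits with halves rounded away from zero, which is what makes 1/16 round up to 1/8 in the tables; `Q` and `simulate` are hypothetical helper names.

```python
import math
from fractions import Fraction


def Q(v: Fraction, b: int) -> Fraction:
    """Round v to b fraction bits, halves rounded away from zero (magnitude rounding)."""
    mag = math.floor(abs(v) * 2 ** b + Fraction(1, 2))
    return Fraction(mag if v >= 0 else -mag, 2 ** b)


def simulate(alpha: Fraction, b: int, x0: Fraction, steps: int) -> list:
    """y(n) = Q[alpha * y(n-1)] + x(n) with x(n) = x0 * delta(n) and y(-1) = 0."""
    y_prev, out = Fraction(0), []
    for n in range(steps):
        x = x0 if n == 0 else Fraction(0)
        y = Q(alpha * y_prev, b) + x
        out.append(y)
        y_prev = y
    return out


for alpha in (Fraction(1, 2), Fraction(-1, 2)):
    print(alpha, [str(y) for y in simulate(alpha, 3, Fraction(7, 8), 7)])
```

The α = 1/2 run settles at 1/8 and the α = -1/2 run alternates between -1/8 and +1/8, matching the two tables.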
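Dead Band: The amplitude of the output during a limit cycle is confined to a range of values, and this range of values is called the dead band.
Let us consider a single pole IIR filter whose difference equation is given by

$$y(n) = \alpha\, y(n-1) + x(n)$$

With the product rounded to b bits, the filter computes

$$y(n) = Q[\alpha\, y(n-1)] + x(n)$$

During a zero input limit cycle the output magnitude no longer changes, so

$$\big|Q[\alpha\, y(n-1)]\big| = |y(n-1)|$$

Rounding to b bits changes a value by at most half an LSB:

$$\big|Q[\alpha\, y(n-1)] - \alpha\, y(n-1)\big| \le \frac{2^{-b}}{2}$$

Combining the two relations gives $|y(n-1)| - |\alpha|\,|y(n-1)| \le \frac{2^{-b}}{2}$, that is

$$|y(n-1)| \le \frac{2^{-b}}{2\,(1 - |\alpha|)}$$

The dead band is therefore the range $\pm\frac{2^{-b}}{2(1 - |\alpha|)}$. For b = 3 and α = ±1/2 this gives ±1/8, which is the limit cycle amplitude found in the zero input limit cycle examples.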
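A sketch that evaluates the dead band $2^{-b}/(2(1 - |\alpha|))$ and compares it with the amplitude at which a simulated zero-input limit cycle settles. It assumes 3 fraction bits, magnitude rounding with halves away from zero, and a starting value y(0) = 7/8; the chosen α values and the helper names are illustrative.

```python
import math
from fractions import Fraction


def dead_band(alpha: Fraction, b: int) -> Fraction:
    """Upper limit 2**-b / (2 * (1 - |alpha|)) on the limit-cycle amplitude."""
    return Fraction(1, 2 ** b) / (2 * (1 - abs(alpha)))


def Q(v: Fraction, b: int) -> Fraction:
    """Round to b fraction bits, halves rounded away from zero."""
    mag = math.floor(abs(v) * 2 ** b + Fraction(1, 2))
    return Fraction(mag if v >= 0 else -mag, 2 ** b)


b = 3
finals = {}
for alpha in (Fraction(1, 2), Fraction(3, 4), Fraction(-3, 4), Fraction(9, 10)):
    y = Fraction(7, 8)                       # y(0) produced by the impulse input
    for _ in range(50):                      # zero input from n = 1 onwards
        y = Q(alpha * y, b)
    finals[alpha] = abs(y)
    print(alpha, "final |y| =", abs(y), " dead band =", dead_band(alpha, b))
```

In each of these runs the output gets stuck at a nonzero value no larger than the dead band limit.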
Overflow Limit Cycle Oscillations: In fixed point addition, overflow occurs when the sum exceeds the finite word length of the register used to store the sum. The overflow in addition makes the output oscillate between maximum and minimum amplitudes. Such limit cycles are called overflow limit cycle oscillations.
The overflow in addition of two or more binary numbers occurs when the sum exceeds the word size available in the digital implementation of the system.
The overflow occurs when the sum exceeds the dynamic range of the number system. When the twos-complement binary fraction format is used for computing, the dynamic range is [-1, 1).
Let us consider two positive numbers +3/8 and +5/8 in twos-complement addition:
(+3/8) + (+5/8) → 0.011 + 0.101 = 1.000 = -8/8 = -1
The actual sum is +1, but due to overflow the sum is wrongly interpreted as a negative number.
The overflow limit cycle oscillations can be eliminated if saturation arithmetic is performed. In saturation arithmetic, when an overflow is sensed, the output is set equal to the maximum allowable value, and when an underflow is sensed, the output is set equal to the minimum allowable value.
Saturation arithmetic causes undesirable signal distortion due to the nonlinearity of the clipper.
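A sketch contrasting ordinary twos-complement (wraparound) addition with saturation arithmetic, using words of one sign bit plus 3 fraction bits stored as integers scaled by $2^3$. The helper names are hypothetical; the example is the +3/8 + 5/8 sum from the text.

```python
def wraparound_add(x: int, y: int, frac_bits: int) -> int:
    """Twos-complement addition of (frac_bits + 1)-bit words: overflow wraps around."""
    size = 1 << (frac_bits + 1)
    half = size >> 1
    return (x + y + half) % size - half


def saturating_add(x: int, y: int, frac_bits: int) -> int:
    """Saturation arithmetic: clip the sum to the largest/smallest representable value."""
    hi, lo = (1 << frac_bits) - 1, -(1 << frac_bits)
    return max(lo, min(hi, x + y))


bits = 3                                   # sign bit + 3 fraction bits
s_wrap = wraparound_add(3, 5, bits)        # +3/8 + +5/8
s_sat = saturating_add(3, 5, bits)
print("wraparound:", s_wrap / 2 ** bits)   # -1.0
print("saturation:", s_sat / 2 ** bits)    # 0.875
```

Wraparound turns the true sum +1 into -1, while saturation clips it to the largest positive value 7/8 at the cost of a small, bounded distortion.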
How overflow limit cycles can be eliminated:
The overflow limit cycles can be eliminated either by using saturation arithmetic or by scaling the
input signal to the adder.
The study of limit cycle oscillations is important for two reasons.
1. In a communication environment, when no signal is transmitted, limit cycles can occur which
are extremely undesirable.
Example: In a telephone no one would like to hear unwanted noise when no signal is put in
from the other end. Consequently, when digital filters are used in telephone exchanges, care
must be taken regarding this problem.
2. The limit cycle effect can be used effectively in digital waveform generators. By producing desirable limit cycles in a reliable manner, these limit cycles can be used as a signal source in digital signal processing.
The transfer function of a discrete-time filter is

$$H(z) = \frac{0.245 + 0.245 z^{-1}}{1 - 0.509 z^{-1}}$$

If it is realized using the direct form-II structure, find the scaling factor to avoid overflow in the first adder of the realization.

An LTI system is characterized by the difference equation y(n) = 0.68y(n-1) + 0.15x(n). The input signal x(n) has a range of -5 V to +5 V, represented by 8 bits. Find the quantization step size, the variance of the error signal and the variance of the quantization noise at the output.